Hi everyone,

I would like to address a particular scenario that has recently come to my 
attention regarding the use of the PutAzureBlobStorage processor with the 
FileResourceService.

When the PutAzureBlobStorage processor is used with the FileResourceService, it 
currently uploads a file from the user's local filesystem to Azure, but it does 
not create a FlowFile. Instead, it uses the incoming FlowFile solely to 
send a provenance event. As a result, the size recorded in the provenance 
event is the incoming FlowFile's size rather than the size of the uploaded file.

There are potential solutions to address this issue and ensure that the 
provenance events are handled effectively. Two main options have been proposed:


  *   Create a Mock FlowFile: A mock FlowFile with a size matching that of the 
local file being uploaded could be generated. This mock FlowFile would serve as 
the basis for the provenance event, even though its reported size would not 
correspond to any content it actually holds.

  *   Modify the ProvenanceReporter Interface: Alternatively, we could 
introduce a new method in the ProvenanceReporter interface that doesn't require 
a FlowFile but instead accepts a "size" parameter as an argument. This would 
eliminate the need for a mock FlowFile.
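
To make the two options easier to compare, here is a minimal, self-contained 
Java sketch. It does not use NiFi's real classes: SimpleFlowFile, 
SketchReporter, SendEvent, and the example transit URI are hypothetical 
stand-ins I made up for illustration, and the size-based send overload is only 
one possible shape for the proposed method, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

public class ProvenanceSizeSketch {

    /** Hypothetical stand-in for a FlowFile; only the size matters here. */
    interface SimpleFlowFile {
        long getSize();
    }

    /** Hypothetical record of a SEND event: where the data went and how many bytes. */
    static final class SendEvent {
        final String transitUri;
        final long size;

        SendEvent(String transitUri, long size) {
            this.transitUri = transitUri;
            this.size = size;
        }
    }

    /** Hypothetical reporter that supports both approaches. */
    static class SketchReporter {
        private final List<SendEvent> events = new ArrayList<>();

        // FlowFile-based style: the event size is taken from the FlowFile.
        void send(SimpleFlowFile flowFile, String transitUri) {
            events.add(new SendEvent(transitUri, flowFile.getSize()));
        }

        // Option 2 (assumed signature): the caller passes the size directly.
        void send(String transitUri, long size) {
            events.add(new SendEvent(transitUri, size));
        }

        List<SendEvent> events() {
            return events;
        }
    }

    public static void main(String[] args) {
        String uri = "https://example.invalid/container/blob"; // placeholder URI
        long localFileSize = 2048L; // size of the file read through the service
        SketchReporter reporter = new SketchReporter();

        // Current behaviour: a small incoming FlowFile determines the event size.
        SimpleFlowFile incoming = () -> 12L;
        reporter.send(incoming, uri);

        // Option 1: a mock FlowFile whose size matches the local file.
        SimpleFlowFile mock = () -> localFileSize;
        reporter.send(mock, uri);

        // Option 2: no FlowFile at all, only the size.
        reporter.send(uri, localFileSize);

        for (SendEvent event : reporter.events()) {
            System.out.println(event.transitUri + " size=" + event.size);
        }
    }
}
```

In this sketch both options record the same size for the upload. The 
difference is that option 1 needs an object that pretends to be a FlowFile, 
while option 2 changes the reporter's interface.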

Because no FlowFile carries the uploaded content in this situation, reporting 
is a distinct challenge: provenance events are typically tied to FlowFiles. 
Still, it's important to record the data transmission for monitoring and tracking.

While the idea of a "size" parameter for the provenance event seems preferable, 
we need to carefully consider its feasibility, potential complexities, and 
community acceptance. The FileResourceService already deviates from NiFi's 
concept of using FlowFiles to hold payload data, and we must avoid further 
complicating the framework unless absolutely necessary.

If you have any insights or suggestions, please feel free to reply to this 
email or join the discussion.

Best Regards,
Lehel
