Lehel,

I don’t believe we should be trying to create a “Mock FlowFile.” I am ok with 
an update to the ProvenanceReporter interface. But I don’t think it should 
accept a “size” parameter. Rather, I think this is a completely different type 
of event that is occurring. This is not a “send” in that it’s not sending the 
contents of the FlowFile to a remote system. Rather, I’d say it’s an 
UPLOAD_FILE event. So I’d lean more toward an uploadFile() method on 
ProvenanceReporter that takes as an argument a `File` (as well as a FlowFile). 
The size would come from the File itself, and the event would convey the 
information about the local file that was uploaded - probably in the Event 
Details. 
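
To make the shape of that suggestion concrete, here is a rough sketch. It is 
only an illustration under stated assumptions: the names `FlowFileRef`, 
`UploadReporter`, `UploadEvent`, and `InMemoryUploadReporter` are hypothetical 
stand-ins, not existing NiFi types, and the transit URI argument and the 
wording of the details string are guesses at what such a method might carry. 
The point it shows is that the event size is read from the `File`, not from 
the FlowFile.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class UploadFileProvenanceSketch {

    /** Hypothetical stand-in for a FlowFile; only what this sketch needs. */
    interface FlowFileRef {
        String getId();
        long getSize();
    }

    /** Hypothetical event record; a real provenance event carries far more. */
    static final class UploadEvent {
        final String eventType;
        final String flowFileId;
        final String transitUri;
        final long size;
        final String details;

        UploadEvent(String eventType, String flowFileId, String transitUri,
                    long size, String details) {
            this.eventType = eventType;
            this.flowFileId = flowFileId;
            this.transitUri = transitUri;
            this.size = size;
            this.details = details;
        }
    }

    /** Hypothetical method shape; not part of the current reporter interface. */
    interface UploadReporter {
        void uploadFile(FlowFileRef flowFile, File file, String transitUri);
    }

    /** Collects events in memory so the sketch can be run on its own. */
    static class InMemoryUploadReporter implements UploadReporter {
        final List<UploadEvent> events = new ArrayList<>();

        @Override
        public void uploadFile(FlowFileRef flowFile, File file, String transitUri) {
            // The size comes from the local file, not from the FlowFile.
            long size = file.length();
            // Information about the local file goes into the event details.
            String details = "Uploaded local file " + file.getAbsolutePath();
            events.add(new UploadEvent("UPLOAD_FILE", flowFile.getId(),
                    transitUri, size, details));
        }
    }

    public static void main(String[] args) throws IOException {
        // A 2048-byte local file standing in for the file being uploaded.
        File local = File.createTempFile("upload-sketch", ".bin");
        local.deleteOnExit();
        Files.write(local.toPath(), new byte[2048]);

        // The incoming FlowFile that only triggers the upload (size 10).
        FlowFileRef trigger = new FlowFileRef() {
            @Override public String getId() { return "flowfile-1"; }
            @Override public long getSize() { return 10L; }
        };

        InMemoryUploadReporter reporter = new InMemoryUploadReporter();
        // Placeholder URI, not a real endpoint.
        reporter.uploadFile(trigger, local, "https://example.invalid/container/blob");

        UploadEvent event = reporter.events.get(0);
        System.out.println(event.eventType + " size=" + event.size);
    }
}
```

With this shape the caller never has to pass a size, and there is no mock 
FlowFile: the incoming FlowFile only ties the event to the flow.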

Thanks
-Mark


> On Oct 26, 2023, at 10:36 AM, Lehel Boér <[email protected]> wrote:
> 
> Hi everyone,
> 
> I would like to address a particular scenario that has recently come to my 
> attention regarding the use of the PutAzureBlobStorage processor with the 
> FileResourceService.
> 
> When the PutAzureBlobStorage processor is used with the FileResourceService, 
> it currently uploads a file from the user's local filesystem to Azure, but it 
> does not create a FlowFile. Instead, it utilizes the incoming FlowFile solely 
> to send a provenance event. In this case, the size reported in the provenance 
> event is the incoming FlowFile's size rather than the size of the uploaded file.
> 
> There are potential solutions to address this issue and ensure that the 
> provenance events are handled effectively. Two main options have been 
> proposed:
> 
> 
>  *   Create a Mock FlowFile: A mock FlowFile with a size matching that of the 
> local file being uploaded could be generated. This mock FlowFile would serve 
> as the basis for the provenance event, even though its size might not reflect 
> the actual content.
> 
>  *   Modify the ProvenanceReporter Interface: Alternatively, we could 
> introduce a new method in the ProvenanceReporter interface that doesn't 
> require a FlowFile but instead accepts a "size" parameter as an argument. 
> This would eliminate the need for a mock FlowFile.
> 
> The lack of a FlowFile operation in this situation creates a distinct 
> challenge because provenance events are typically tied to FlowFiles. Still, 
> it's important to indicate data transmission for monitoring and tracking.
> 
> While the idea of a "size" parameter for the provenance event seems 
> preferable, we need to carefully consider its feasibility, potential 
> complexities, and community acceptance. The FileResourceService already 
> deviates from NiFi's concept of using FlowFiles to hold payload data, and we 
> must avoid further complicating the framework unless absolutely necessary.
> 
> If you have any insights or suggestions, please feel free to reply to this 
> email or join the discussion.
> 
> Best Regards,
> Lehel
