Lehel, I don’t believe we should be trying to create a “Mock FlowFile.” I am OK with an update to the ProvenanceReporter interface, but I don’t think it should accept a “size” parameter. Rather, I think this is a completely different type of event that is occurring. This is not a “send,” in that it’s not sending the contents of the FlowFile to a remote system. Rather, I’d say it’s an UPLOAD_FILE event. So I’d lean more toward an uploadFile() method on ProvenanceReporter that takes as arguments a `File` and a FlowFile. The size would come from the File itself, and the event would convey the information about the local file that was uploaded, probably in the Event Details.
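
To make the idea a bit more concrete, here is a rough sketch of what such a method might look like. It is only an illustration under stated assumptions: the `FlowFile` and `ProvenanceReporter` types below are simplified stand-ins rather than NiFi's actual interfaces, and the `transitUri` parameter, the `UploadEvent` class, the `RecordingReporter` implementation, and the wording of the details string are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UploadFileSketch {

    /** Simplified stand-in for NiFi's FlowFile (not the real interface). */
    interface FlowFile {
        String getId();
        long getSize();
    }

    /** Hypothetical event holder; the real framework would build its own provenance record. */
    static final class UploadEvent {
        final String eventType = "UPLOAD_FILE";
        final String flowFileId;
        final String transitUri;
        final long fileSize;   // taken from the local File, not from the FlowFile
        final String details;  // describes the local file that was uploaded

        UploadEvent(String flowFileId, String transitUri, long fileSize, String details) {
            this.flowFileId = flowFileId;
            this.transitUri = transitUri;
            this.fileSize = fileSize;
            this.details = details;
        }
    }

    /** Stand-in reporter showing only the proposed method; transitUri is an assumed parameter. */
    interface ProvenanceReporter {
        void uploadFile(FlowFile flowFile, File file, String transitUri);
    }

    /** Hypothetical implementation that just collects events in memory. */
    static final class RecordingReporter implements ProvenanceReporter {
        final List<UploadEvent> events = new ArrayList<>();

        @Override
        public void uploadFile(FlowFile flowFile, File file, String transitUri) {
            long size = file.length(); // size comes from the File itself
            String details = "Uploaded local file " + file.getAbsolutePath()
                    + " (" + size + " bytes)";
            events.add(new UploadEvent(flowFile.getId(), transitUri, size, details));
        }
    }

    /** Helper that creates a minimal FlowFile stand-in for the demo. */
    static FlowFile flowFile(String id, long size) {
        return new FlowFile() {
            @Override public String getId() { return id; }
            @Override public long getSize() { return size; }
        };
    }

    public static void main(String[] args) throws IOException {
        Path local = Files.createTempFile("upload-sketch", ".bin");
        Files.write(local, new byte[2048]);

        RecordingReporter reporter = new RecordingReporter();
        // The incoming FlowFile is empty here; the event still reports the local file's size.
        reporter.uploadFile(flowFile("ff-1", 0L), local.toFile(),
                "https://example.invalid/container/blob");

        UploadEvent event = reporter.events.get(0);
        System.out.println(event.eventType + " size=" + event.fileSize
                + " details=" + event.details);
        Files.delete(local);
    }
}
```

The point of the sketch is that the reported size is derived from the `File`, so no mock FlowFile and no caller-supplied "size" argument is needed, while the FlowFile still ties the event to the flow's lineage.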

Thanks
-Mark

> On Oct 26, 2023, at 10:36 AM, Lehel Boér <[email protected]> wrote:
>
> Hi everyone,
>
> I would like to address a particular scenario that has recently come to my
> attention regarding the use of the PutAzureBlobStorage processor with the
> FileResourceService.
>
> When the PutAzureBlobStorage processor is used with the FileResourceService,
> it currently uploads a file from the user's local filesystem to Azure, but it
> does not create a FlowFile. Instead, it utilizes the incoming FlowFile solely
> to send a provenance event. In this case the size of the provenance event is
> the incoming FlowFile's size instead of the uploaded one.
>
> There are potential solutions to address this issue and ensure that the
> provenance events are handled effectively. Two main options have been
> proposed:
>
> * Create a Mock FlowFile: A mock FlowFile with a size matching that of the
> local file being uploaded could be generated. This mock FlowFile would serve
> as the basis for the provenance event, even though its size might not reflect
> the actual content.
>
> * Modify the ProvenanceReporter Interface: Alternatively, we could
> introduce a new method in the ProvenanceReporter interface that doesn't
> require a FlowFile but instead accepts a "size" parameter as an argument.
> This would eliminate the need for a mock FlowFile.
>
> The lack of a FlowFile operation in this situation creates a distinct
> challenge because provenance events are typically tied to FlowFiles. Still,
> it's important to indicate data transmission for monitoring and tracking.
>
> While the idea of a "size" parameter for the provenance event seems
> preferable, we need to carefully consider its feasibility, potential
> complexities, and community acceptance. The FileResourceService already
> deviates from NiFi's concept of using FlowFiles to hold payload data, and we
> must avoid further complicating the framework unless absolutely necessary.
>
> If you have any insights or suggestions, please feel free to reply to this
> email or join the discussion.
>
> Best Regards,
> Lehel
