>
> Rather, I’d say it's an UPLOAD_FILE event. So I’d lean more toward an
> uploadFile() method on ProvenanceReporter that takes as an argument a
> `File` (as well as a FlowFile). The size would come from the File itself,
> and the event would convey the information about the local file that was
> uploaded - probably in the Event Details.
>

Would that mean that for the "bytes transferred" graph in Status History,
we would combine SEND and UPLOAD_FILE events? I ask because, right now, the
graph shows nothing for this case, which is confusing.
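
To make the question concrete, here is a rough sketch of what "combining"
could mean. It is not NiFi code: the Event class and the way bytes are summed
are assumptions made up for illustration, and UPLOAD_FILE is only the event
type proposed in this thread.

```java
import java.util.Arrays;
import java.util.List;

final class BytesTransferredSketch {

    /** Hypothetical, simplified event: only a type name and a byte count. */
    static final class Event {
        final String type;
        final long bytes;

        Event(String type, long bytes) {
            this.type = type;
            this.bytes = bytes;
        }
    }

    /** Sums the bytes of SEND and (proposed) UPLOAD_FILE events, ignoring all other types. */
    static long bytesTransferred(List<Event> events) {
        long total = 0L;
        for (Event e : events) {
            if ("SEND".equals(e.type) || "UPLOAD_FILE".equals(e.type)) {
                total += e.bytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("SEND", 100L),
                new Event("UPLOAD_FILE", 2048L),
                new Event("ATTRIBUTES_MODIFIED", 0L));
        System.out.println(bytesTransferred(events)); // prints 2148
    }
}
```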

Also, I'm not sure about the 'File' object. While the local file system is
the only option today for the File Resource Service, I'd expect additional
implementations, such as implementations for cloud service providers (CSPs).
So we could have the case where PutAzureBlobStorage is used with a
FileResourceService for Google Cloud Storage, for example (in order to
improve the efficiency of data movement between cloud providers), and in that
case I'm not sure we would have a 'File' object. Unless you're talking about
a more generic File object here, and not the object for the local file system.
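
As a minimal sketch of what I mean by a more generic object: the names below
(ResourceInfo, UploadEvent, SketchReporter and its uploadFile() signature) are
all hypothetical and do not exist in NiFi. The only point is that a location
plus a size might be enough, without requiring a java.io.File.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UploadProvenanceSketch {

    /** Hypothetical storage-agnostic stand-in for java.io.File: a location and a size. */
    static final class ResourceInfo {
        final String location; // e.g. a local path or a cloud object URI
        final long sizeBytes;

        ResourceInfo(String location, long sizeBytes) {
            this.location = location;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical, simplified event; real provenance events carry many more fields. */
    static final class UploadEvent {
        final String flowFileUuid;
        final String transitUri;
        final long bytes;
        final String details;

        UploadEvent(String flowFileUuid, String transitUri, long bytes, String details) {
            this.flowFileUuid = flowFileUuid;
            this.transitUri = transitUri;
            this.bytes = bytes;
            this.details = details;
        }
    }

    /** Hypothetical reporter: the size comes from the resource, not from the FlowFile. */
    static final class SketchReporter {
        private final List<UploadEvent> events = new ArrayList<>();

        void uploadFile(String flowFileUuid, ResourceInfo resource, String transitUri) {
            events.add(new UploadEvent(flowFileUuid, transitUri, resource.sizeBytes,
                    "Uploaded from " + resource.location));
        }

        List<UploadEvent> events() {
            return Collections.unmodifiableList(events);
        }
    }

    public static void main(String[] args) {
        SketchReporter reporter = new SketchReporter();
        // Made-up source and destination, standing in for a GCS-to-Azure copy.
        ResourceInfo source = new ResourceInfo("gs://example-bucket/data.bin", 4096L);
        reporter.uploadFile("flowfile-1", source, "https://example.invalid/container/data.bin");
        UploadEvent e = reporter.events().get(0);
        System.out.println(e.bytes + " bytes, " + e.details);
    }
}
```

With something like this, the source could be a local path or a cloud object
URI, and the event details would still describe where the data came from.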

On Thu, Oct 26, 2023 at 09:16, Matt Burgess <mattyb...@apache.org> wrote:

> AFAIK it is fine and appropriate to issue multiple provenance events
> for a single FlowFile. In the case for PutAzureBlobStorage uploading a
> file to Azure, it is the incoming FlowFile that triggers the upload.
> Before reporting a provenance event, attributes are added to the
> FlowFile, so that "version" of the FlowFile can be the one used to
> report a SEND event. I have done this to said processor as part of a
> large refactor/improvement of the provenance capability:
>
> session.getProvenanceReporter().send(flowFile,
> blob.getSnapshotQualifiedUri().toString(), transferMillis,
> REL_SUCCESS);
>
> Having said that, to Mark's point, it's probably better to have a
> separate UPLOAD_FILE event; I can change that in my code.
>
> I added a couple like this to similar processors, such as
> TriggerHiveMetastoreEvent:
>
> session.getProvenanceReporter().invokeRemoteProcess(flowFile,
> hiveMetastoreUrl, REL_SUCCESS);
>
> I am still working on this; I need to write up a Jira with a thorough
> treatment of the material and eventually get a PR up for review.
>
> Regards,
> Matt
>
> On Thu, Oct 26, 2023 at 12:02 PM Mark Payne <marka...@hotmail.com> wrote:
> >
> > Lehel,
> >
> > I don’t believe we should be trying to create a “Mock FlowFile.” I am ok
> > with an update to the ProvenanceReporter interface. But I don’t think it
> > should accept a “size” parameter. Rather, I think this is a completely
> > different type of event that is occurring. This is not a “send” in that
> > it’s not sending the contents of the FlowFile to a remote system. Rather,
> > I’d say it's an UPLOAD_FILE event. So I’d lean more toward an uploadFile()
> > method on ProvenanceReporter that takes as an argument a `File` (as well as
> > a FlowFile). The size would come from the File itself, and the event would
> > convey the information about the local file that was uploaded - probably in
> > the Event Details.
> >
> > Thanks
> > -Mark
> >
> >
> > > On Oct 26, 2023, at 10:36 AM, Lehel Boér <lehe...@hotmail.com> wrote:
> > >
> > > Hi everyone,
> > >
> > > I would like to address a particular scenario that has recently come
> > > to my attention regarding the use of the PutAzureBlobStorage processor with
> > > the FileResourceService.
> > >
> > > When the PutAzureBlobStorage processor is used with the
> > > FileResourceService, it currently uploads a file from the user's local
> > > filesystem to Azure, but it does not create a FlowFile. Instead, it
> > > uses the incoming FlowFile solely to send a provenance event. In this
> > > case, the size reported in the provenance event is the incoming
> > > FlowFile's size rather than the size of the uploaded file.
> > >
> > > There are potential solutions to address this issue and ensure that
> > > the provenance events are handled effectively. Two main options have been
> > > proposed:
> > >
> > >
> > >  *   Create a Mock FlowFile: A mock FlowFile with a size matching that
> > >      of the local file being uploaded could be generated. This mock FlowFile
> > >      would serve as the basis for the provenance event, even though its size
> > >      might not reflect the actual content.
> > >
> > >  *   Modify the ProvenanceReporter Interface: Alternatively, we could
> > >      introduce a new method in the ProvenanceReporter interface that doesn't
> > >      require a FlowFile but instead accepts a "size" parameter as an argument.
> > >      This would eliminate the need for a mock FlowFile.
> > >
> > > The lack of a FlowFile operation in this situation creates a distinct
> > > challenge because provenance events are typically tied to FlowFiles. Still,
> > > it's important to indicate data transmission for monitoring and tracking.
> > >
> > > While the idea of a "size" parameter for the provenance event seems
> > > preferable, we need to carefully consider its feasibility, potential
> > > complexities, and community acceptance. The FileResourceService already
> > > deviates from NiFi's concept of using FlowFiles to hold payload data, and
> > > we must avoid further complicating the framework unless absolutely
> > > necessary.
> > >
> > > If you have any insights or suggestions, please feel free to reply to
> > > this email or join the discussion.
> > >
> > > Best Regards,
> > > Lehel
> >
>
