[
https://issues.apache.org/jira/browse/NIFI-14896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018444#comment-18018444
]
Mark Bean commented on NIFI-14896:
----------------------------------
It is not possible to create a RECEIVE provenance event followed by a FORK
event within ListenHTTP as originally proposed. The reason is that the received
FFv3 package is never made into a FlowFile; ListenHTTP unpacks the contained
FlowFiles on the fly. Creating a FlowFile out of the received bytes would
require writing the data to the content repo twice: first the FFv3 package,
then the individual FlowFiles. (The FFv3 package might also have to be read
back from disk, another operation that is bad for performance.)
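To make the performance argument concrete, here is a rough Java sketch of content-repo I/O for one received package under both approaches. It is a hypothetical accounting model, not NiFi code: the class and method names are made up, the package size (children plus FFv3 framing) is an assumed input, and the "materialize" path assumes the package is read back from disk, which the comment only flags as a possibility.

```java
// Hypothetical back-of-the-envelope model of content-repo I/O for one received
// FFv3 package. Not NiFi code; all sizes are supplied by the caller.
final class ContentRepoIoSketch {

    record IoCost(long bytesWritten, long bytesRead) {}

    // Streaming unpack: each contained FlowFile's content is written exactly once,
    // and the package itself is never stored.
    static IoCost streamingUnpack(long[] childSizes) {
        long written = 0;
        for (long size : childSizes) {
            written += size;
        }
        return new IoCost(written, 0);
    }

    // Rejected alternative: write the whole package as its own FlowFile first, then
    // (assumed here) read it back to write each contained FlowFile.
    static IoCost materializeThenUnpack(long packageSize, long[] childSizes) {
        IoCost children = streamingUnpack(childSizes);
        return new IoCost(packageSize + children.bytesWritten(), packageSize);
    }

    public static void main(String[] args) {
        long[] children = {1024, 2048};
        long packageSize = 3200; // hypothetical: children plus FFv3 framing and attributes
        System.out.println(streamingUnpack(children));                    // IoCost[bytesWritten=3072, bytesRead=0]
        System.out.println(materializeThenUnpack(packageSize, children)); // IoCost[bytesWritten=6272, bytesRead=3200]
    }
}
```

With two contained FlowFiles of 1,024 and 2,048 bytes and an assumed 3,200-byte package, materializing the package roughly doubles the bytes written (6,272 versus 3,072) and adds a 3,200-byte read.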
Instead, it is acceptable to continue to record only a RECEIVE event for each
contained FlowFile in the package and no FORK events. However, the "Source
FlowFile Id" should reflect the UUID of the FFv3 package, not the UUID of the
original, unpackaged FlowFile. By doing so, the linkage from the source
system's FFv3 package FlowFile to the destination system's unpacked FlowFile is
retained.
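A minimal Java sketch of the accepted behaviour, modelling only the provenance records: one RECEIVE event per contained FlowFile, no FORK events, and the package UUID used as the source system FlowFile identifier. The record types and method names are hypothetical stand-ins rather than NiFi classes, and how ListenHTTP learns the package UUID is assumed, not shown. In NiFi itself, the `ProvenanceReporter.receive(...)` overloads take a `sourceSystemFlowFileIdentifier` argument, which is presumably where the package UUID would be passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical model of the accepted approach; these are NOT NiFi classes.
final class ReceiveOnlySketch {

    // A FlowFile contained in the received FFv3 package, with its UUID on the sending system.
    record PackagedEntry(String originalUuid, long contentSize) {}

    // Simplified provenance record (stand-in for a real NiFi provenance event).
    record ProvEvent(String type, String flowFileUuid, String sourceSystemFlowFileId, long size) {}

    // Walks the package entries once (mirroring on-the-fly unpacking), assigns a new
    // FlowFile UUID to each entry, and records one RECEIVE event per entry.
    static List<ProvEvent> receiveEvents(String packageUuid, List<PackagedEntry> entries) {
        List<ProvEvent> events = new ArrayList<>();
        for (PackagedEntry entry : entries) {
            String newUuid = UUID.randomUUID().toString();
            // The source id is the package UUID rather than entry.originalUuid(),
            // preserving the link to the sender's FFv3 package FlowFile.
            events.add(new ProvEvent("RECEIVE", newUuid, packageUuid, entry.contentSize()));
        }
        return events; // no FORK events are produced
    }

    public static void main(String[] args) {
        List<PackagedEntry> entries = List.of(
                new PackagedEntry("child-a-uuid", 1024),
                new PackagedEntry("child-b-uuid", 2048));
        for (ProvEvent event : receiveEvents("package-uuid", entries)) {
            System.out.println(event);
        }
    }
}
```

Each RECEIVE event still reports the real size of its own contained FlowFile, which avoids the zero-byte problem of a placeholder parent.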
Additional details: in the original proposal for RECEIVE followed by FORK, the
details of the RECEIVE event would be fully accurate only if a FlowFile were
actually created for the FFv3 package. Bypassing its creation (for efficiency)
and instead creating a zero-content FlowFile purely as a placeholder for
generating a provenance RECEIVE event would result in the event inaccurately
reporting the file size as zero bytes.
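For contrast, a hypothetical Java model of the original proposal with a zero-content placeholder standing in for the package. Because a RECEIVE event can only report the size of the FlowFile it is attached to, it reports 0 bytes even though the two children carried 3,072 bytes. All types and names here are illustrative assumptions, not NiFi's provenance API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the ORIGINAL proposal (RECEIVE on the package, then FORK
// to the contained FlowFiles) using a zero-content placeholder for the package.
// These are illustrative stand-ins, NOT NiFi classes.
final class PlaceholderForkSketch {

    record FlowFileStub(String uuid, long contentSize) {}

    record StubEvent(String type, String flowFileUuid, long reportedSize, List<String> childUuids) {}

    static List<StubEvent> receiveThenFork(FlowFileStub placeholder, List<FlowFileStub> children) {
        List<StubEvent> events = new ArrayList<>();
        // RECEIVE can only report the size of the FlowFile it is attached to;
        // the placeholder has no content, so this reports 0 bytes.
        events.add(new StubEvent("RECEIVE", placeholder.uuid(), placeholder.contentSize(), List.of()));
        List<String> childIds = new ArrayList<>();
        for (FlowFileStub child : children) {
            childIds.add(child.uuid());
        }
        // FORK links the placeholder parent to the unpacked children.
        events.add(new StubEvent("FORK", placeholder.uuid(), placeholder.contentSize(), List.copyOf(childIds)));
        return events;
    }

    // Bytes that actually arrived for the contained FlowFiles (FFv3 framing ignored).
    static long payloadSize(List<FlowFileStub> children) {
        long total = 0;
        for (FlowFileStub child : children) {
            total += child.contentSize();
        }
        return total;
    }

    public static void main(String[] args) {
        FlowFileStub placeholder = new FlowFileStub("package-placeholder", 0);
        List<FlowFileStub> children = List.of(
                new FlowFileStub("child-1", 1024),
                new FlowFileStub("child-2", 2048));
        for (StubEvent event : receiveThenFork(placeholder, children)) {
            System.out.println(event);
        }
        System.out.println("bytes actually received: " + payloadSize(children)); // 3072
    }
}
```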
> Provide more accurate Provenance events in ListenHTTP
> -----------------------------------------------------
>
> Key: NIFI-14896
> URL: https://issues.apache.org/jira/browse/NIFI-14896
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Mark Bean
> Priority: Major
>
> ListenHTTP will automatically unpack a received packaged FlowFile. As a
> result, one or more FlowFiles are created based on the package contents. For
> example, if the received packaged FlowFile contains two FlowFiles, ListenHTTP
> will create two FlowFiles.
> Currently, ListenHTTP generates a RECEIVE event for each of the created
> FlowFiles. This is incorrect because it loses the association between the
> child FlowFiles and the original FlowFile package (parent). In the case
> described above, it should generate a single RECEIVE event for the packaged
> FlowFile and an additional FORK event for the unpacked FlowFiles contained
> within the package.