Hi Lars,

In short, depending on how a FlowFile is duplicated, the content shouldn't be duplicated as well.
In general, content is only duplicated when it is deemed to have changed (copy-on-write semantics). For the most part (unless it has a large number of attributes), a FlowFile is actually quite small, so the waste is minimal; that is why FlowFiles can be held in memory and passed through a flow.

The best way to branch/clone a FlowFile is to add another outgoing connection from the processor whose output you want to log; the framework that surrounds a processor will handle the rest. This does create a duplicate FlowFile but doesn't create a copy of the content. In the provenance repository this is marked as a CLONE event for the original FlowFile, and the new FlowFile is treated as its own unique FlowFile with a reference to the original content.

This is quite a short explanation; a more in-depth one, which I think covers all the scenarios you're thinking about, can be found here: https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

Edward

On Wed, Jul 31, 2019 at 11:47 AM Lars Winderling <lars.winderl...@posteo.de> wrote:

> Dear NiFi community,
>
> I often face the use case where I import flow files with content of order
> O(1 GB) or O(10 GB), already compressed.
> Let's say I need to branch off of a flow where the actual flow file should
> be processed further, and on some side branch I just want to do some kind
> of logging or whatever without accessing the flow file's contents. Thus
> it's clearly wasteful to duplicate the flow file including content.
> For this case I wrote a processor defining 2 relationships: "original" and
> "attributes only", so the flow file attributes can be accessed separately
> from the content.
> I will gladly prepare a PR if anyone finds that worth incorporating into
> NiFi.
> The only remaining question for me would be: use an individual processor to
> that end, or add it to e.g. the DuplicateFlowFile processor. The former
> seems cleaner to me.
> Proposed names would be something like ForkProcessor
> (no better idea yet).
>
> Thanks in advance!
> Best,
> Lars
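
To make the copy-on-write point from the reply concrete, here is a toy Python model of it. This is not NiFi code and does not use NiFi's API: `ContentRepository`, `FlowFile`, `Session`, and all of their methods are hypothetical stand-ins, written under the assumption that a FlowFile is just attributes plus a reference ("claim") to content stored elsewhere. It only illustrates why cloning is cheap and why a new copy of the content appears only on write.

```python
from dataclasses import dataclass, replace
from itertools import count


class ContentRepository:
    """Toy content store (hypothetical): claim id -> bytes, with reference counts."""

    def __init__(self):
        self._blobs = {}
        self._refs = {}
        self._claim_ids = count(1)

    def store(self, data):
        # New content always gets a fresh claim with one reference.
        claim = next(self._claim_ids)
        self._blobs[claim] = data
        self._refs[claim] = 1
        return claim

    def retain(self, claim):
        self._refs[claim] += 1

    def release(self, claim):
        # Drop the content once no FlowFile references it any more.
        self._refs[claim] -= 1
        if self._refs[claim] == 0:
            del self._refs[claim]
            del self._blobs[claim]

    def read(self, claim):
        return self._blobs[claim]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


@dataclass(frozen=True)
class FlowFile:
    """Toy FlowFile (hypothetical): small metadata plus a pointer to content."""
    uuid: int
    attributes: dict
    claim: int


class Session:
    """Toy stand-in for the framework that surrounds a processor."""

    def __init__(self, repo):
        self.repo = repo
        self.provenance = []
        self._uuids = count(1)

    def create(self, data, attributes):
        return FlowFile(next(self._uuids), dict(attributes), self.repo.store(data))

    def clone(self, flowfile):
        # Copy the attributes, share the content claim, record a CLONE event.
        self.repo.retain(flowfile.claim)
        child = replace(flowfile, uuid=next(self._uuids),
                        attributes=dict(flowfile.attributes))
        self.provenance.append(("CLONE", flowfile.uuid, child.uuid))
        return child

    def write(self, flowfile, data):
        # Copy-on-write: changed content goes to a new claim, the old one is untouched.
        new_claim = self.repo.store(data)
        self.repo.release(flowfile.claim)
        return replace(flowfile, claim=new_claim)


repo = ContentRepository()
session = Session(repo)
original = session.create(b"x" * 1_000_000, {"filename": "big.gz"})
side_branch = session.clone(original)  # shares original.claim, no content copied
```

In this sketch, `side_branch` and `original` point at the same claim, so the repository still holds a single copy of the content after the clone. Only a call such as `session.write(side_branch, ...)` allocates new content, and the original's content stays as it was.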