Lars,

If you are worried about it, using ReplaceText will have the same effect as your custom solution. When ReplaceText has its `Replacement Strategy` set to `Always Replace`, it doesn't read the contents of the FlowFile and simply writes out the Replacement Value, which in your case could be an empty string.
Thanks,
Peter

From: Lars Winderling <lars.winderl...@posteo.de>
Sent: Wednesday, July 31, 2019 11:02 AM
To: dev@nifi.apache.org
Subject: [EXT] Re: Duplicate flow files *without* their content

Hi Edward,

thank you for your input. I didn't know about the copy-on-write semantics; that's really useful. I'll check out the in-depth guide for sure! In my case, the content of the flow file changes heavily from one processor to the next, so I doubt copy-on-write would help here.

Best,
Lars

On Wed, 2019-07-31 at 12:13 +0100, Edward Armes wrote:

Hi Lars,

In short, depending on how a FlowFile is duplicated, the content shouldn't be duplicated as well. In general, content is only duplicated when it has been deemed to have changed (copy-on-write semantics). For the most part (unless a FlowFile has a large number of attributes) a FlowFile is actually quite small, so the waste is minimal, which is why FlowFiles can be held in memory and passed through a flow.

The best way to branch/clone a flow file is to add another output from the processor you want to log the output from, and the framework that surrounds a processor will handle the rest. This does create a duplicate FlowFile but doesn't create a copy of the content. In the provenance repository this is marked as a CLONE event for the original FlowFile, and the new FlowFile is treated as its own unique FlowFile with a reference to the original content.

This is quite a short explanation; a better and more in-depth explanation, which I think covers all the scenarios you're thinking about, can be found here: https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
Edward

On Wed, Jul 31, 2019 at 11:47 AM Lars Winderling <lars.winderl...@posteo.de> wrote:

Dear NiFi community,

I often face the use case where I import flow files with content of order O(1 GB) or O(10 GB), already compressed. Let's say I need to branch off of a flow where the actual flow file should be processed further, and on some side branch I just want to do some kind of logging or whatever without accessing the flow file's contents. Thus it's clearly wasteful to duplicate the flow file including its content. For this case I wrote a processor defining 2 relationships: "original" and "attributes only", so the flow file attributes can be accessed separately from the content.

I will gladly prepare a PR if anyone finds that worth incorporating into NiFi. The only remaining question for me would be: use an individual processor to that end, or add it to e.g. the DuplicateFlowFile processor? The former seems cleaner to me. Proposed names would be something like ForkProcessor (no better idea yet).

Thanks in advance!

Best,
Lars
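
For readers of this thread, the copy-on-write behaviour Edward describes can be illustrated with a small toy model. This is only a sketch and not NiFi code: `ContentClaim`, `ContentRepository`, `FlowFile`, `clone` and `write` are hypothetical names invented for the example, and the real framework is considerably more involved. The point is that a clone copies only the attributes and a reference to the content, and that new content is stored only when a branch actually writes.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Dict

_claim_ids = itertools.count(1)


@dataclass(frozen=True)
class ContentClaim:
    """Immutable handle to a blob in the toy content repository (hypothetical)."""
    claim_id: int


class ContentRepository:
    """Toy content store: every write creates a new blob, nothing is overwritten."""

    def __init__(self) -> None:
        self._blobs: Dict[int, bytes] = {}

    def create(self, data: bytes) -> ContentClaim:
        claim = ContentClaim(next(_claim_ids))
        self._blobs[claim.claim_id] = data
        return claim

    def read(self, claim: ContentClaim) -> bytes:
        return self._blobs[claim.claim_id]

    def blob_count(self) -> int:
        return len(self._blobs)


@dataclass(frozen=True)
class FlowFile:
    """Small record: attributes plus a reference to the content, not the content itself."""
    attributes: dict
    claim: ContentClaim


def clone(flow_file: FlowFile) -> FlowFile:
    # Copy the attributes, share the content claim: no content bytes are duplicated.
    return FlowFile(dict(flow_file.attributes), flow_file.claim)


def write(repo: ContentRepository, flow_file: FlowFile, data: bytes) -> FlowFile:
    # Copy-on-write: new content gets a new claim, the old blob stays untouched.
    return replace(flow_file, claim=repo.create(data))


repo = ContentRepository()
original = FlowFile({"filename": "big.gz"}, repo.create(b"x" * 1024))

side_branch = clone(original)                 # shares the same claim as the original
attributes_only = write(repo, side_branch, b"")  # side branch now points at empty content
```

In this model the clone costs one small record, and the side branch only gets its own (empty) content once it is written, which loosely mirrors replacing the content with an empty string as Peter suggests.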