[ https://issues.apache.org/jira/browse/NIFI-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936853#comment-14936853 ]
Mark Payne commented on NIFI-1008:
----------------------------------

Removed Fix Version of 0.4.0.

Looking into the implementation details, this could be extremely complex. This is because the FlowFile Repository must be updated atomically with all FlowFiles that are processed in a single session. In order to do this, we need all of the FlowFiles to be passed as a single Collection. Otherwise, if we restart in the middle of a session commit, some of the updates will have taken place but not all of them. This could cause some really odd behavior.

One possible solution is to modify the FlowFile Repository's definition to allow an Iterator to be passed instead of a Collection, and then we can implement an iterator that deserializes the objects as needed.

> NiFi should swap out FlowFiles to disk even before the session is committed
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-1008
>                 URL: https://issues.apache.org/jira/browse/NIFI-1008
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>
> Currently, NiFi will swap out FlowFiles if there are a large number in a
> FlowFile Queue. This is done to avoid running out of JVM heap space. However,
> if we have a simple flow like GetFile -> SplitText and GetFile pulls in a
> large file, SplitText can quickly cause OutOfMemoryError. This is not because
> it buffers the content of the FlowFile in memory but rather because it holds
> the millions of FlowFile objects in memory. We can do better.
> When we call session.transfer for the FlowFiles, once we hit a magical
> threshold (say 10,000), we should swap those FlowFiles to disk and the
> session should transfer them to the queue as "swapped out" FlowFiles, rather
> than having to buffer all of these in memory and then swapping them out once
> they land in the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
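
For illustration only, the iterator idea from the comment above could be sketched roughly as follows. Nothing here is NiFi code: SwappedRecord, LazyRecordIterator, IteratorRepositorySketch, and the on-disk layout (a record count followed by id/filename pairs) are all assumptions made for the example. The sketch shows only the lazy deserialization, so that a consumer accepting an Iterator holds one record in memory at a time; it does not address how the repository would keep the update atomic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a swapped-out FlowFile record (not a NiFi class).
final class SwappedRecord {
    final long id;
    final String filename;

    SwappedRecord(long id, String filename) {
        this.id = id;
        this.filename = filename;
    }
}

// Deserializes one record per call to next(), instead of loading a whole Collection.
final class LazyRecordIterator implements Iterator<SwappedRecord> {
    private final DataInputStream in;
    private int remaining;

    LazyRecordIterator(DataInputStream in) {
        this.in = in;
        try {
            this.remaining = in.readInt(); // assumed header: number of records
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience factory for the example: iterate over an in-memory byte array.
    static LazyRecordIterator over(byte[] data) {
        return new LazyRecordIterator(new DataInputStream(new ByteArrayInputStream(data)));
    }

    // Writes records in the assumed layout: count, then (id, filename) pairs.
    static byte[] serialize(List<SwappedRecord> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (SwappedRecord r : records) {
                out.writeLong(r.id);
                out.writeUTF(r.filename);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @Override
    public SwappedRecord next() {
        if (remaining <= 0) {
            throw new NoSuchElementException();
        }
        try {
            long id = in.readLong();
            String filename = in.readUTF();
            remaining--;
            return new SwappedRecord(id, filename);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical consumer that accepts an Iterator rather than a Collection.
final class IteratorRepositorySketch {
    // Walks the iterator and returns how many records were seen; a real
    // repository would write each update to its journal here.
    static int applyAll(Iterator<SwappedRecord> updates) {
        int applied = 0;
        while (updates.hasNext()) {
            updates.next();
            applied++;
        }
        return applied;
    }
}
```

A caller would build the iterator over the swap file's stream and hand it to the consumer, for example IteratorRepositorySketch.applyAll(LazyRecordIterator.over(bytes)). In a real implementation the stream would come from disk rather than a byte array.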