Emilio Setiadarma created NIFI-12700:
----------------------------------------

             Summary: PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
                 Key: NIFI-12700
                 URL: https://issues.apache.org/jira/browse/NIFI-12700
             Project: Apache NiFi
          Issue Type: Improvement
            Reporter: Emilio Setiadarma
            Assignee: Emilio Setiadarma


The PutKudu processor's existing implementation uses a Map of KuduOperation -> 
FlowFile to keep track of which FlowFile was being processed when each 
KuduOperation was created. This mapping is later used to associate FlowFiles 
with any RowError that occurs. That association is necessary for routing 
FlowFiles to the success/failure relationships and for logging failures, among 
other things.
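A minimal sketch of the map-based tracking described above, in Java. `FlowFile`, `Operation`, and `RowError` here are hypothetical stand-ins for NiFi's `FlowFile` and Kudu's `Operation`/`RowError` (the real Kudu `RowError` exposes the failed operation through `getOperation()`); this is an illustration of the pattern, not the actual PutKudu code. One map entry is kept per applied operation so that a row error can be traced back to its FlowFile.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MapTrackingSketch {
    // Hypothetical stand-ins. They do not override equals/hashCode, so the map
    // below is keyed on object identity: one entry per applied operation.
    static final class FlowFile {
        final String name;
        FlowFile(String name) { this.name = name; }
    }

    static final class Operation { }

    static final class RowError {
        final Operation operation; // the operation that failed
        RowError(Operation operation) { this.operation = operation; }
    }

    // Grows by one entry for every row applied, until errors are resolved.
    final Map<Operation, FlowFile> operationToFlowFile = new HashMap<>();

    /** Creates an operation for one row of the given FlowFile and remembers its origin. */
    Operation apply(FlowFile flowFile) {
        Operation operation = new Operation();
        operationToFlowFile.put(operation, flowFile);
        return operation;
    }

    /** Maps row errors back to the distinct FlowFiles that produced them. */
    List<FlowFile> flowFilesWithErrors(List<RowError> rowErrors) {
        List<FlowFile> failed = new ArrayList<>();
        for (RowError error : rowErrors) {
            FlowFile flowFile = operationToFlowFile.get(error.operation);
            if (flowFile != null && !failed.contains(flowFile)) {
                failed.add(flowFile);
            }
        }
        return failed;
    }
}
```

Because the map holds one entry per row rather than per FlowFile, its size scales with the total number of rows applied before errors are resolved.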

For very large inputs, the map of KuduOperation objects can grow very large. 
This is not a memory leak, but it can still cause OutOfMemory issues on very 
large input data. It may be possible to avoid the KuduOperation -> FlowFile map 
entirely for unbatched flush modes, e.g. the AUTO_FLUSH_SYNC flush mode, where 
KuduSession.apply() has already flushed the buffer before returning (see 
https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
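A sketch of the unbatched alternative, assuming the session runs in AUTO_FLUSH_SYNC mode, where Kudu's `KuduSession.apply()` returns an `OperationResponse` for the already-flushed write and `hasRowError()` reports a failure. `SyncSession`, `Response`, and `FlowFile` below are hypothetical stand-ins, and `failedFlowFiles` is an illustrative helper, not a PutKudu method. Because each error is observed while its FlowFile is still in scope, no KuduOperation -> FlowFile map is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class SyncFlushSketch {
    // Hypothetical stand-in for Kudu's OperationResponse.
    static final class Response {
        final String rowError; // null when the row was written successfully
        Response(String rowError) { this.rowError = rowError; }
        boolean hasRowError() { return rowError != null; }
    }

    // Hypothetical stand-in for a KuduSession configured with AUTO_FLUSH_SYNC:
    // apply() returns only after the row has been flushed.
    interface SyncSession {
        Response apply(String row);
    }

    // Hypothetical stand-in for a FlowFile already parsed into rows.
    static final class FlowFile {
        final String name;
        final List<String> rows;
        FlowFile(String name, List<String> rows) { this.name = name; this.rows = rows; }
    }

    /** Returns the names of FlowFiles with at least one row error; no operation map is kept. */
    static List<String> failedFlowFiles(SyncSession session, List<FlowFile> flowFiles) {
        List<String> failed = new ArrayList<>();
        for (FlowFile flowFile : flowFiles) {
            boolean hadError = false;
            for (String row : flowFile.rows) {
                // The current FlowFile is in scope, so the error needs no lookup.
                if (session.apply(row).hasRowError()) {
                    hadError = true; // sketch choice: keep applying the remaining rows
                }
            }
            if (hadError) {
                failed.add(flowFile.name);
            }
        }
        return failed;
    }
}
```

Memory for error tracking then depends on the number of failed FlowFiles rather than the number of applied rows. Batched modes (MANUAL_FLUSH, AUTO_FLUSH_BACKGROUND) report errors only at a later flush, so they would presumably still need some operation-to-FlowFile lookup.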

This Jira captures the effort to refactor the PutKudu processor to make it more 
memory efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)