[ https://issues.apache.org/jira/browse/NIFI-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Payne updated NIFI-4794:
-----------------------------
    Status: Patch Available  (was: Open)

> Improve Garbage Collection required by Provenance Repository
> ------------------------------------------------------------
>
>                 Key: NIFI-4794
>                 URL: https://issues.apache.org/jira/browse/NIFI-4794
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> The EventIdFirstSchemaRecordWriter that is used by the provenance repository
> has a writeRecord(ProvenanceEventRecord) method. Within this method, it
> serializes the given record into a byte array by writing to a
> ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once
> this is done, it calls toByteArray() on that BAOS so that it can write the
> byte[] directly to another OutputStream.
>
> This can create a rather large amount of garbage to be collected. We can
> improve this significantly:
> # Instead of creating a new ByteArrayOutputStream each time, create a pool
> of them. This avoids constantly having to garbage collect them.
> # If a pooled BAOS grows beyond a certain size, we should not return it to
> the pool, because we don't want it to keep a large footprint on the heap.
> # Instead of wrapping the BAOS in a new DataOutputStream, the
> DataOutputStream should be pooled/recycled as well. Since it must create an
> internal byte[] for the writeUTF method, this can save a significant amount
> of garbage.
> # Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use
> ByteArrayOutputStream.writeTo(OutputStream). This avoids allocating a new
> array and copying the data into it, as well as the resulting GC overhead.
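
The four points above can be sketched roughly as follows. This is only an
illustration of the idea, not the patch attached to the issue: the names
BufferPoolSketch, PooledBuffer and maxRetainedBytes are hypothetical, and a
String payload stands in for the ProvenanceEventRecord serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; none of these names come from the NiFi code base.
class BufferPoolSketch {

    /** A reusable buffer that carries its own reusable DataOutputStream. */
    static final class PooledBuffer extends ByteArrayOutputStream {
        // Created once per buffer, so the scratch array that writeUTF
        // allocates inside DataOutputStream is reused as well.
        final DataOutputStream data = new DataOutputStream(this);

        int capacity() {
            // 'buf' is the protected backing array of ByteArrayOutputStream;
            // size() would only report the bytes written since reset().
            return buf.length;
        }
    }

    private final BlockingQueue<PooledBuffer> pool;
    private final int maxRetainedBytes;

    BufferPoolSketch(int poolSize, int maxRetainedBytes) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        this.maxRetainedBytes = maxRetainedBytes;
    }

    /** Takes a buffer from the pool, or creates one if the pool is empty. */
    PooledBuffer borrow() {
        PooledBuffer buffer = pool.poll();
        return buffer != null ? buffer : new PooledBuffer();
    }

    /** Returns true if the buffer was kept, false if it was dropped for GC. */
    boolean release(PooledBuffer buffer) {
        if (buffer.capacity() > maxRetainedBytes) {
            return false; // grown too large: do not pin it on the heap
        }
        buffer.reset();   // discards content, keeps the backing array
        return pool.offer(buffer); // false if the pool is already full
    }

    /** Serializes the payload into a pooled buffer and copies it to 'out'. */
    void writeRecord(String payload, OutputStream out) throws IOException {
        PooledBuffer buffer = borrow();
        try {
            buffer.data.writeUTF(payload);
            buffer.data.flush();
            buffer.writeTo(out); // no intermediate toByteArray() copy
        } finally {
            release(buffer);
        }
    }
}
```

Note that reset() does not shrink the backing array, which is why the size
check on release looks at the array length rather than at size().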