[ https://issues.apache.org/jira/browse/NIFI-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352823#comment-16352823 ]
ASF GitHub Bot commented on NIFI-4794:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2437#discussion_r166090089

    --- Diff: nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/EncryptedSchemaRecordReader.java ---
    @@ -23,42 +23,29 @@ import java.io.InputStream;
     import java.util.Collection;
     import java.util.Optional;
    -import java.util.concurrent.TimeUnit;
    +
     import org.apache.nifi.provenance.schema.LookupTableEventRecord;
     import org.apache.nifi.provenance.toc.TocReader;
     import org.apache.nifi.repository.schema.Record;
     import org.apache.nifi.stream.io.LimitingInputStream;
     import org.apache.nifi.stream.io.StreamUtils;
    -import org.apache.nifi.util.timebuffer.LongEntityAccess;
    -import org.apache.nifi.util.timebuffer.TimedBuffer;
    -import org.apache.nifi.util.timebuffer.TimestampedLong;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;

     public class EncryptedSchemaRecordReader extends EventIdFirstSchemaRecordReader {
         private static final Logger logger = LoggerFactory.getLogger(EncryptedSchemaRecordReader.class);

    -    private static final int DEFAULT_DEBUG_FREQUENCY = 1_000_000;
    --- End diff --

    Are changes to this file part of the PR? Doesn't seem like it. Or is it additional cleanup, or should it be restored?


> Improve Garbage Collection required by Provenance Repository
> ------------------------------------------------------------
>
>                 Key: NIFI-4794
>                 URL: https://issues.apache.org/jira/browse/NIFI-4794
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.6.0
>
>
> The EventIdFirstSchemaRecordWriter that is used by the provenance repository has a writeRecord(ProvenanceEventRecord) method.
> Within this method, it serializes the given record into a byte array by serializing to a ByteArrayOutputStream (after wrapping the BAOS in a DataOutputStream). Once this is done, it calls toByteArray() on that BAOS so that it can write the byte[] directly to another OutputStream.
> This can create a rather large amount of garbage to be collected. We can improve this significantly:
> # Instead of creating a new ByteArrayOutputStream each time, create a pool of them. This avoids constantly having to garbage collect them.
> # If said BAOS grows beyond a certain size, we should not return it to the pool because we don't want to keep a huge impact on the heap.
> # Instead of wrapping the BAOS in a new DataOutputStream, the DataOutputStream should be pooled/recycled as well. Since it must create an internal byte[] for the writeUTF method, this can save a significant amount of garbage.
> # Avoid calling ByteArrayOutputStream.toByteArray(). We can instead just use ByteArrayOutputStream.writeTo(OutputStream). This avoids both allocating that new array/copying the data, and the GC overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
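
The four improvements quoted above can be sketched as a small pooling helper. This is a minimal illustration, not NiFi's actual implementation; the `StreamPool` and `Entry` names, the pool size, and the 1 MiB cap are all hypothetical choices made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the pooling approach described in the issue.
public class StreamPool {
    private static final int MAX_POOL_SIZE = 16;
    private static final int MAX_BUFFER_SIZE = 1 << 20; // 1 MiB cap (point 2)

    // Each pooled entry pairs a reusable BAOS with the DataOutputStream
    // wrapping it, so the DataOutputStream's internal writeUTF buffer is
    // recycled along with the byte buffer (point 3).
    public static final class Entry {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream(8192);
        final DataOutputStream dos = new DataOutputStream(baos);
    }

    private final BlockingQueue<Entry> pool = new LinkedBlockingQueue<>(MAX_POOL_SIZE);

    // Point 1: reuse a pooled entry instead of allocating a fresh
    // BAOS + DataOutputStream pair for every record.
    public Entry borrow() {
        final Entry entry = pool.poll();
        return entry != null ? entry : new Entry();
    }

    // Point 2: never retain an oversized buffer on the heap; such an
    // entry is simply dropped and left for the garbage collector.
    public void release(final Entry entry) {
        if (entry.baos.size() <= MAX_BUFFER_SIZE) {
            entry.baos.reset();
            pool.offer(entry); // silently dropped if the pool is already full
        }
    }

    // Point 4: copy the buffered bytes straight to the destination via
    // writeTo(), avoiding the intermediate byte[] that toByteArray()
    // would allocate and copy.
    public void writeTo(final Entry entry, final OutputStream out) throws IOException {
        entry.baos.writeTo(out);
    }
}
```

Under this sketch, a writer would `borrow()` an entry, serialize the event through `entry.dos`, stream the result out with `writeTo(entry, out)`, and finally `release(entry)` back to the pool.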