[ https://issues.apache.org/jira/browse/HBASE-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547553#comment-14547553 ]
Lars Hofhansl commented on HBASE-13703:
---------------------------------------

The allocation happens once every time we ship a _chunk_ of edits to the sinks. That is after we:
# made the {{entries}} ArrayList
# read through the WALs and added all the edits as references to {{entries}}

In ReplicationEndpoint we then:
# copy all {{entries}} to a new WAL.Entry[]
# ship the WAL.Entry[] to the sinks
# apply the edits at the sink

So that one object won't be a problem. :)

But what will likely be a problem is the fact that all the WAL entries we read and added to the {{entries}} array now cannot be GC'd until we replicate something successfully the next time.

We could achieve the same by calling {{replicateContext.setEntries(null);}} in a finally block, after we have successfully replicated. But frankly, avoiding the allocation here looks like a premature optimization.

> ReplicateContext should not be a member of ReplicationSource
> ------------------------------------------------------------
>
>                 Key: HBASE-13703
>                 URL: https://issues.apache.org/jira/browse/HBASE-13703
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 13703.txt
>
>
> The ReplicateContext object is created once per ReplicationSource and then
> reused when we have something to ship to the sinks.
> This is a misguided optimization. ReplicateContext is very lightweight
> (definitely compared to all the work and copying the ReplicationSource is
> doing) and, crucially, it prevents the entries array from being collected
> after it was successfully copied to the sink, potentially wasting a lot of
> heap.
> The entries array itself holds references to WAL entries on the heap, which now
> also cannot be collected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
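
The retention problem discussed in the comment above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the HBase code: {{Entry}}, {{Context}}, {{readBatch}}, {{replicate}} and the three {{*Source}} classes are hypothetical stand-ins for WAL entries, ReplicateContext and ReplicationSource. It contrasts a context kept as a member, a context created per shipment, and a reused context that is cleared in a finally block.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicateContextSketch {

    /** Hypothetical stand-in for a WAL entry; the payload simulates heap usage. */
    static final class Entry {
        final byte[] payload;
        Entry(int size) { this.payload = new byte[size]; }
    }

    /** Hypothetical stand-in for ReplicateContext: it only holds the current batch. */
    static final class Context {
        private List<Entry> entries;
        Context setEntries(List<Entry> entries) { this.entries = entries; return this; }
        List<Entry> getEntries() { return entries; }
    }

    /** Simulates reading a chunk of edits from the WALs. */
    static List<Entry> readBatch(int count) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            entries.add(new Entry(1024));
        }
        return entries;
    }

    /** Simulates the endpoint: copies the batch to an array and "ships" it. */
    static int replicate(Context context) {
        Entry[] copy = context.getEntries().toArray(new Entry[0]);
        return copy.length; // number of entries shipped
    }

    /** Context is a member: the last batch stays reachable after shipping. */
    static final class ReusingSource {
        final Context context = new Context();
        int ship(List<Entry> batch) {
            context.setEntries(batch);
            return replicate(context);
        }
    }

    /** Context is a local: nothing in the source references the batch afterwards. */
    static final class FreshContextSource {
        int ship(List<Entry> batch) {
            Context context = new Context().setEntries(batch);
            return replicate(context);
        }
    }

    /** Context is a member, but the reference is dropped in a finally block. */
    static final class ClearingSource {
        final Context context = new Context();
        int ship(List<Entry> batch) {
            context.setEntries(batch);
            try {
                return replicate(context);
            } finally {
                context.setEntries(null);
            }
        }
    }

    public static void main(String[] args) {
        ReusingSource reusing = new ReusingSource();
        reusing.ship(readBatch(100));
        System.out.println("reusing source still holds batch: "
            + (reusing.context.getEntries() != null));

        ClearingSource clearing = new ClearingSource();
        clearing.ship(readBatch(100));
        System.out.println("clearing source still holds batch: "
            + (clearing.context.getEntries() != null));

        new FreshContextSource().ship(readBatch(100));
    }
}
```

In this sketch, {{FreshContextSource}} pays for one small extra object per shipment, while {{ReusingSource}} keeps the whole last batch reachable until the next call to {{ship}}.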