[ https://issues.apache.org/jira/browse/HBASE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-27778:
-----------------------------
    Description:
When we read a new WAL entry in {{ReplicationSourceWALReader.readWALEntries}}, we increase {{ReplicationSourceWALReader.totalBufferUsed}} by the size of the new entry in {{ReplicationSourceWALReader.addEntryToBatch}}. However, the whole {{WALEntryBatch}} may never be put into {{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g. an exception thrown by {{WALEntryFilter.filter}} for a following WAL entry), and {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased in this case. Because {{ReplicationSourceWALReader.totalBufferUsed}} is actually scoped to {{ReplicationSourceManager}}, after a long run, replication to all peers may hang.

  (was: When we read a new WAL Entry in {{ReplicationSourceWALReader.readWALEntries}}, we add {{ReplicationSourceWALReader.totalBufferUsed}} by the size of new entry in {{ReplicationSourceWALReader.addEntryToBatch}}, but the whole {{WALEntryBatch}} may not be put to the {{ReplicationSourceWALReader.entryBatchQueue}} because of exception (e.g. exception thrown by {{WALEntryFilter.filter}} for following WAL Entry), but the {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased and because the {{ReplicationSourceWALReader.totalBufferUsed}} is scoped to {{ReplicationSourceManager}}, after a long run, replication to all peers may hang up.)

> Incorrect ReplicationSourceWALReader.totalBufferUsed may cause replication hang up
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-27778
>                 URL: https://issues.apache.org/jira/browse/HBASE-27778
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.6.0, 3.0.0-alpha-3
>            Reporter: chenglei
>            Priority: Major
>
> When we read a new WAL entry in
> {{ReplicationSourceWALReader.readWALEntries}}, we increase
> {{ReplicationSourceWALReader.totalBufferUsed}} by the size of the new entry in
> {{ReplicationSourceWALReader.addEntryToBatch}}. However, the whole
> {{WALEntryBatch}} may never be put into
> {{ReplicationSourceWALReader.entryBatchQueue}} because of an exception (e.g.
> an exception thrown by {{WALEntryFilter.filter}} for a following WAL entry),
> and {{ReplicationSourceWALReader.totalBufferUsed}} is not decreased in this
> case. Because {{ReplicationSourceWALReader.totalBufferUsed}} is
> actually scoped to {{ReplicationSourceManager}}, after a long run,
> replication to all peers may hang.
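
The accounting problem described above can be illustrated with a small standalone sketch. This is not HBase code and not necessarily the fix applied for this issue: the class {{BufferQuotaSketch}}, the methods {{filter}}, {{readBatchLeaky}} and {{readBatchWithRelease}}, and the use of plain {{long}} sizes in place of WAL entries are all hypothetical. Only the name {{totalBufferUsed}} is borrowed from the description, and it stands in for the counter shared across all replication sources. The sketch assumes Java 9 or later for {{List.of}}.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class BufferQuotaSketch {
    // Stand-in for the counter shared by every replication source.
    static final AtomicLong totalBufferUsed = new AtomicLong();

    // Hypothetical filter: a negative size simulates a filter that throws.
    static long filter(long entrySize) {
        if (entrySize < 0) {
            throw new IllegalStateException("filter failed");
        }
        return entrySize;
    }

    // Leaky variant: quota acquired for earlier entries is never returned
    // when a later entry makes the filter throw and the batch is dropped.
    static boolean readBatchLeaky(List<Long> entrySizes) {
        try {
            for (long size : entrySizes) {
                totalBufferUsed.addAndGet(filter(size));
            }
            return true; // the batch would be queued for shipping here
        } catch (IllegalStateException e) {
            return false; // batch dropped, counter stays inflated
        }
    }

    // One possible remedy: remember what this batch acquired and give it
    // back if the batch never reaches the queue.
    static boolean readBatchWithRelease(List<Long> entrySizes) {
        long acquired = 0;
        try {
            for (long size : entrySizes) {
                long accepted = filter(size);
                totalBufferUsed.addAndGet(accepted);
                acquired += accepted;
            }
            return true;
        } catch (IllegalStateException e) {
            totalBufferUsed.addAndGet(-acquired);
            return false;
        }
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(100L, 200L, -1L); // third entry fails

        readBatchLeaky(sizes);
        System.out.println("leaky: " + totalBufferUsed.get()); // leaky: 300

        totalBufferUsed.set(0);
        readBatchWithRelease(sizes);
        System.out.println("released: " + totalBufferUsed.get()); // released: 0
    }
}
```

In the leaky variant each failed batch leaves its bytes counted forever, so a counter that is shared by all peers drifts upward until it reaches whatever quota guards it, which matches the hang described in this issue.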