Longping Jie created HBASE-29786:
------------------------------------
Summary: The replication source totalBufferUsed fails to be released, causing replication blocking
Key: HBASE-29786
URL: https://issues.apache.org/jira/browse/HBASE-29786
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 2.6.2
Reporter: Longping Jie
Cluster A has replication to cluster B enabled. To throttle replication, the
ReplicationSourceManager class keeps an atomic counter, totalBufferUsed, and
provides the acquireBufferQuota and releaseBufferQuota methods to add to and
subtract from it. Some of the amounts added to totalBufferUsed are never
subtracted again, so totalBufferUsed stays above totalBufferLimit permanently
and the WAL reader loops forever waiting for quota. The thread stacks are as
follows:
"RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.shipperhbase-3%2C16020%2C1754317255615,hbaseOnline" #738204104 daemon prio=5 os_prio=0 tid=0x0000000049d84800 nid=0x14ce2 waiting on condition [0x00007f01feceb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f17f0679610> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.poll(ReplicationSourceWALReader.java:313)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.poll(SerialReplicationSourceWALReader.java:35)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:109)

"RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.wal-reader.hbase-3%2C16020%2C1754317255615,hbaseOnline" #738204105 daemon prio=5 os_prio=0 tid=0x0000000049df0000 nid=0x14ce1 waiting on condition [0x00007f024f6f7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:125)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.checkBufferQuota(ReplicationSourceWALReader.java:279)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:149)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:35)
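
Read together, the two stacks describe a standstill. The WAL reader is sleeping in checkBufferQuota because the shared counter is over the limit, so it enqueues no new batches. Meanwhile the shipper is parked in a timed poll on the reader's batch queue, finds nothing to ship, and therefore has nothing to release either. The following is a hypothetical, single-threaded model of that interaction, not HBase code: the class name StuckReplicationSim, the batch size, and the 1 ms poll timeout are illustrative assumptions, and only the limit value comes from the error log.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical single-threaded simulation of the reader/shipper stacks (not HBase code).
 * Each round, the "reader" enqueues a batch only if the shared counter is within the
 * limit, and the "shipper" polls the batch queue with a short timeout.
 */
class StuckReplicationSim {
    // Limit value taken from the error log (256 MiB).
    static final long LIMIT = 268_435_456L;
    // Hypothetical batch size used by the simulated reader.
    static final long BATCH_SIZE = 1_024L;

    /** Returns the number of batches shipped in the given number of rounds. */
    static int simulate(long initialUsed, long limit, int rounds) {
        AtomicLong totalBufferUsed = new AtomicLong(initialUsed);
        LinkedBlockingQueue<Long> batchQueue = new LinkedBlockingQueue<>();
        int shipped = 0;
        for (int i = 0; i < rounds; i++) {
            // Reader: acquire quota and enqueue a batch only while usage is within the limit;
            // otherwise it just backs off (the real reader sleeps in checkBufferQuota).
            if (totalBufferUsed.get() <= limit) {
                totalBufferUsed.addAndGet(BATCH_SIZE);
                batchQueue.offer(BATCH_SIZE);
            }
            // Shipper: timed poll, mirroring the TIMED_WAITING (parking) stack.
            Long batch;
            try {
                batch = batchQueue.poll(1, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop
                break;
            }
            if (batch != null) {
                shipped++;
                totalBufferUsed.addAndGet(-batch); // release quota after "shipping"
            }
        }
        return shipped;
    }

    public static void main(String[] args) {
        // Healthy counter: every round ships one batch.
        System.out.println("healthy shipped=" + simulate(0L, LIMIT, 5));
        // Counter stuck at the value from the log: nothing is ever shipped.
        System.out.println("stuck shipped=" + simulate(268_445_954L, LIMIT, 5));
    }
}
```

Running main prints `healthy shipped=5` and then `stuck shipped=0`. In the stuck case, neither side can make progress on its own: the reader waits for a release that only the shipper would perform, and the shipper waits for a batch that only the reader would produce.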

Error log:
2025-12-18T15:43:21,817 WARN [RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.wal-reader.hbase-3%2C16020%2C1754317255615,hbaseOnline] regionserver.ReplicationSourceManager: peer=hbaseOnline, can't read more edits from WAL as buffer usage 268445954B exceeds limit 268435456B
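
The log puts the counter at 268,445,954 B against a limit of 268,435,456 B (exactly 256 x 2^20 B = 256 MiB), which means it is 10,498 B over. Assuming the reader resumes only once usage is back at or below the limit, releases totalling at least 10,498 B must happen before reading can continue. The following is a minimal sketch of acquire/release accounting on an AtomicLong. It uses hypothetical names (BufferQuotaSketch, canReadMore, used) and a toy limit of 100 B rather than HBase's actual ReplicationSourceManager code. It shows how a single skipped release keeps the counter over the limit even when no batch is in flight.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified model of a shared replication buffer quota.
 * Not HBase's ReplicationSourceManager; names and limit are illustrative.
 */
class BufferQuotaSketch {
    private final AtomicLong totalBufferUsed = new AtomicLong();
    private final long totalBufferLimit;

    BufferQuotaSketch(long totalBufferLimit) {
        this.totalBufferLimit = totalBufferLimit;
    }

    /** Adds size to the counter; returns true if usage is now above the limit. */
    public boolean acquireBufferQuota(long size) {
        return totalBufferUsed.addAndGet(size) > totalBufferLimit;
    }

    /** Subtracts size from the counter once a batch is no longer buffered. */
    public void releaseBufferQuota(long size) {
        totalBufferUsed.addAndGet(-size);
    }

    /** The reader may read more edits only while usage is at or below the limit. */
    public boolean canReadMore() {
        return totalBufferUsed.get() <= totalBufferLimit;
    }

    public long used() {
        return totalBufferUsed.get();
    }

    public static void main(String[] args) {
        BufferQuotaSketch quota = new BufferQuotaSketch(100);
        // Batch 1: acquired and released correctly, so it leaves no trace.
        quota.acquireBufferQuota(60);
        quota.releaseBufferQuota(60);
        // Batch 2: acquired, but its release is skipped (the leak).
        quota.acquireBufferQuota(110);
        // Nothing is in flight any more, yet usage stays above the limit.
        System.out.println("used=" + quota.used() + " canReadMore=" + quota.canReadMore());
    }
}
```

Running main prints `used=110 canReadMore=false`. Because nothing else will ever subtract the leaked 110 B, the reader in this model stays blocked indefinitely. This sketch suggests that every acquire needs a matching release on every code path, including error and discard paths.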
--
This message was sent by Atlassian Jira
(v8.20.10#820010)