Longping Jie created HBASE-29786:
------------------------------------

             Summary: The replication source totalBufferUsed fails to be 
released, causing replication blocking
                 Key: HBASE-29786
                 URL: https://issues.apache.org/jira/browse/HBASE-29786
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 2.6.2
            Reporter: Longping Jie


 Cluster A has replication to cluster B enabled. To control the replication 
rate, the ReplicationSourceManager class keeps an atomic counter, 
totalBufferUsed, and provides the acquireBufferQuota and releaseBufferQuota 
methods to add to and subtract from it. In this case, an amount added to 
totalBufferUsed is never subtracted again, so totalBufferUsed stays above 
totalBufferLimit. The WAL reader then loops indefinitely in checkBufferQuota 
and replication is blocked. The thread stacks are as follows:

"RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.shipperhbase-3%2C16020%2C1754317255615,hbaseOnline"
 #738204104 daemon prio=5 os_prio=0 tid=0x0000000049d84800 nid=0x14ce2 waiting 
on condition [0x00007f01feceb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00007f17f0679610> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.poll(ReplicationSourceWALReader.java:313)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.poll(SerialReplicationSourceWALReader.java:35)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:109)

"RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.wal-reader.hbase-3%2C16020%2C1754317255615,hbaseOnline"
 #738204105 daemon prio=5 os_prio=0 tid=0x0000000049df0000 nid=0x14ce1 waiting 
on condition [0x00007f024f6f7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:125)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.checkBufferQuota(ReplicationSourceWALReader.java:279)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:149)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:35)

Error log:
2025-12-18T15:43:21,817 WARN  [RS_REFRESH_PEER-regionserver/hbase-3:16020-0.replicationSource,hbaseOnline.replicationSource.wal-reader.hbase-3%2C16020%2C1754317255615,hbaseOnline] regionserver.ReplicationSourceManager: peer=hbaseOnline, can't read more edits from WAL as buffer usage 268445954B exceeds limit 268435456B
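
To make the accounting problem concrete, here is a minimal sketch of a shared 
buffer quota. It is a simplified model, not the HBase source: the class name 
BufferQuotaSketch, the used() helper and the exact comparisons are assumptions 
made for illustration. Only the names totalBufferUsed, totalBufferLimit, 
acquireBufferQuota, releaseBufferQuota and checkBufferQuota come from the 
report above. The two byte values are the ones in the log line (the reported 
usage is 10498 bytes over the 256 MiB limit).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified model of a shared replication buffer quota; not the HBase code. */
class BufferQuotaSketch {
    private final AtomicLong totalBufferUsed = new AtomicLong();
    private final long totalBufferLimit;

    BufferQuotaSketch(long totalBufferLimit) {
        this.totalBufferLimit = totalBufferLimit;
    }

    /** Adds size to the counter; returns true once the limit is reached. */
    boolean acquireBufferQuota(long size) {
        return totalBufferUsed.addAndGet(size) >= totalBufferLimit;
    }

    /** Subtracts size; every acquire needs exactly one matching release. */
    void releaseBufferQuota(long size) {
        totalBufferUsed.addAndGet(-size);
    }

    /** Assumed check: the reader may proceed only while usage is within the limit. */
    boolean checkBufferQuota() {
        return totalBufferUsed.get() <= totalBufferLimit;
    }

    /** Hypothetical helper for inspecting the counter. */
    long used() {
        return totalBufferUsed.get();
    }

    public static void main(String[] args) {
        long limit = 268435456L;      // limit from the log line (256 MiB)
        long leakedBytes = 268445954L; // usage from the log line

        BufferQuotaSketch quota = new BufferQuotaSketch(limit);

        // Bytes are acquired, but the path that should release them is skipped.
        quota.acquireBufferQuota(leakedBytes);

        // With nothing in flight to release, the check can never pass again.
        System.out.println("reader may proceed: " + quota.checkBufferQuota());
        System.out.println("bytes over limit:   " + (quota.used() - limit));

        // Only a matching release unblocks the reader.
        quota.releaseBufferQuota(leakedBytes);
        System.out.println("after release:      " + quota.checkBufferQuota());
    }
}
```

In this model, any code path that drops a batch without calling 
releaseBufferQuota for the bytes it acquired leaves the counter permanently 
inflated, which would match the reader thread sleeping in checkBufferQuota 
while the shipper thread waits on an empty queue.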



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
