Rushabh Shah created HBASE-28932:
------------------------------------

             Summary: Abort RS if unable to sync internal markers.
                 Key: HBASE-28932
                 URL: https://issues.apache.org/jira/browse/HBASE-28932
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 2.5.8
            Reporter: Rushabh Shah


The RS kept running even though it was unable to write a replication marker. This
issue is not specific to replication markers; it also applies to compaction
markers and region event markers (e.g. open, close).
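A minimal sketch of the requested behaviour, assuming every internal marker (replication, compaction, region event) is synced through one shared helper. All names below (MarkerType, MarkerSyncer, Aborter, writeMarkerOrAbort) are hypothetical and are not the WALUtil API; Aborter only mirrors the shape of HBase's Abortable#abort(String, Throwable). The idea is that a failed sync leads to an abort rather than a logged error and an RS that keeps running.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class MarkerAbortSketch {
  // Hypothetical: the internal marker kinds named in this issue.
  enum MarkerType { REPLICATION, COMPACTION, REGION_EVENT }

  // Hypothetical stand-in for "append a marker to the WAL and sync it".
  interface MarkerSyncer {
    void appendAndSync(MarkerType type) throws IOException;
  }

  // Hypothetical stand-in shaped like HBase's Abortable#abort(String, Throwable).
  interface Aborter {
    void abort(String why, Throwable cause);
  }

  // Writes a marker; on any sync failure, aborts instead of logging and continuing.
  // Returns true if the marker was synced, false if an abort was requested.
  static boolean writeMarkerOrAbort(MarkerType type, MarkerSyncer syncer, Aborter aborter) {
    try {
      syncer.appendAndSync(type);
      return true;
    } catch (IOException e) {
      aborter.abort("Unable to sync " + type + " marker", e);
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> aborts = new ArrayList<>();
    // Simulates a stuck WAL: every sync attempt fails.
    MarkerSyncer stuckWal = type -> {
      throw new IOException("Failed to get sync result, WAL system stuck?");
    };
    // Records abort reasons instead of actually stopping a process.
    Aborter recorder = (why, cause) -> aborts.add(why);
    for (MarkerType type : MarkerType.values()) {
      writeMarkerOrAbort(type, stuckWal, recorder);
    }
    System.out.println(aborts.size()); // one abort request per marker type: 3
  }
}
```

Routing every marker type through one helper keeps the abort policy from drifting between the replication chore, the compaction path and the region open/close path.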
Sample exception trace:
{noformat}
2024-10-09 10:12:21,659 ERROR [regionserver/regionserver-33:60020.Chore.3] regionserver.ReplicationMarkerChore - Exception while sync'ing replication tracker edit
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=15030132, WAL system stuck?
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:171)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:876)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:802)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doSync(FSHLog.java:836)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$sync$3(AbstractFSWAL.java:602)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.sync(AbstractFSWAL.java:602)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.sync(AbstractFSWAL.java:592)
        at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullMarkerAppendTransaction(WALUtil.java:169)
        at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:146)
        at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeReplicationMarkerAndSync(WALUtil.java:230)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationMarkerChore.chore(ReplicationMarkerChore.java:99)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:161)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{noformat}
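The trace shows the chore blocking on a sync future until a 300000 ms timeout fires. A rough sketch of that bounded wait, using a plain CompletableFuture as a hypothetical stand-in for HBase's SyncFuture and a local SyncTimeoutIOException in place of TimeoutIOException, illustrates where the failure surfaces to the marker writer. The method and class names are illustrative only.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockOnSyncSketch {
  // Hypothetical stand-in for HBase's TimeoutIOException.
  static class SyncTimeoutIOException extends IOException {
    SyncTimeoutIOException(String msg) {
      super(msg);
    }
  }

  // Waits at most timeoutMs for the sync of txid to complete, converting
  // every failure mode into an IOException for the caller.
  static void blockOnSync(CompletableFuture<Void> syncFuture, long txid, long timeoutMs)
      throws IOException {
    try {
      syncFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      throw new SyncTimeoutIOException("Failed to get sync result after " + timeoutMs
          + " ms for txid=" + txid + ", WAL system stuck?");
    } catch (ExecutionException e) {
      throw new IOException("Sync failed for txid=" + txid, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      throw new IOException("Interrupted waiting for sync of txid=" + txid, e);
    }
  }

  public static void main(String[] args) {
    // A future that never completes simulates a stuck WAL.
    CompletableFuture<Void> neverCompletes = new CompletableFuture<>();
    try {
      blockOnSync(neverCompletes, 15030132L, 50);
    } catch (IOException e) {
      // Per this issue, this is where the RS should abort instead of continuing.
      System.out.println(e.getMessage());
    }
  }
}
```

Whatever the caller does with the resulting IOException decides whether the RS keeps running; this issue proposes aborting at that point for every internal marker type.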




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
