Rushabh Shah created HBASE-28932:
------------------------------------
Summary: Abort RS if unable to sync internal markers.
Key: HBASE-28932
URL: https://issues.apache.org/jira/browse/HBASE-28932
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 2.5.8
Reporter: Rushabh Shah
The RegionServer (RS) kept running even though it was unable to write the
replication marker. This issue is not specific to the replication marker: it
also applies to the compaction marker and to region event markers (such as
open and close).
Sample exception trace:
{noformat}
2024-10-09 10:12:21,659 ERROR [regionserver/regionserver-33:60020.Chore.3] regionserver.ReplicationMarkerChore - Exception while sync'ing replication tracker edit
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=15030132, WAL system stuck?
at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:171)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:876)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:802)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doSync(FSHLog.java:836)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$sync$3(AbstractFSWAL.java:602)
at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.sync(AbstractFSWAL.java:602)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.sync(AbstractFSWAL.java:592)
at org.apache.hadoop.hbase.regionserver.wal.WALUtil.doFullMarkerAppendTransaction(WALUtil.java:169)
at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:146)
at org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeReplicationMarkerAndSync(WALUtil.java:230)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationMarkerChore.chore(ReplicationMarkerChore.java:99)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:161)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{noformat}
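A minimal sketch of the behavior the summary asks for, assuming the fix takes the form "abort when the marker sync throws" rather than "log and continue". Everything here is hypothetical and self-contained: `Abortable`, `MarkerWriter`, `RecordingAbortable`, and `writeMarkerOrAbort` are stand-ins invented for illustration, not the HBase classes named in the trace, and the actual patch may look quite different.

```java
import java.io.IOException;

public class MarkerSyncAbortSketch {

  /** Hypothetical stand-in for whatever component can abort the region server. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Hypothetical stand-in for an append-and-sync of one internal WAL marker. */
  interface MarkerWriter {
    void writeAndSync() throws IOException;
  }

  /** Test double: records the abort request instead of stopping a process. */
  static final class RecordingAbortable implements Abortable {
    boolean aborted;
    String reason;
    Throwable cause;

    @Override
    public void abort(String why, Throwable cause) {
      this.aborted = true;
      this.reason = why;
      this.cause = cause;
    }
  }

  /**
   * Writes a marker and escalates any sync failure to an abort.
   * Returns true if the marker was synced, false if an abort was requested.
   */
  static boolean writeMarkerOrAbort(String markerType, MarkerWriter writer, Abortable server) {
    try {
      writer.writeAndSync();
      return true;
    } catch (IOException e) {
      // Assumption: a failed marker sync means the WAL is unusable, so the
      // server should stop rather than keep running with an unsynced marker.
      server.abort("Failed to sync " + markerType + " marker to the WAL", e);
      return false;
    }
  }

  public static void main(String[] args) {
    RecordingAbortable server = new RecordingAbortable();
    // Simulate the timeout from the trace above with a plain IOException.
    boolean synced = writeMarkerOrAbort(
        "replication",
        () -> { throw new IOException("Failed to get sync result"); },
        server);
    System.out.println("synced=" + synced + ", aborted=" + server.aborted);
  }
}
```

The same wrapper would apply to the compaction marker and the region event markers, since the sketch treats the marker type only as a label for the abort reason.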