Duo Zhang created HBASE-26866:
---------------------------------

             Summary: Shutdown WAL may abort region server
                 Key: HBASE-26866
                 URL: https://issues.apache.org/jira/browse/HBASE-26866
             Project: HBase
          Issue Type: Bug
            Reporter: Duo Zhang


https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt

TestSyncReplicationActive is flaky because we may abort the region server
when shutting down the WAL.
{noformat}
2022-03-18T04:50:37,205 WARN  [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 reported a fatal error:
***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: Log rolling failed *****
Cause:
java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
        at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
        at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
{noformat}

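The problem is that the removal of old WAL files is asynchronous. When shutting
down the WAL, we shut down the thread pool that performs the removal, so if a log
roll then tries to submit a cleanup task (AbstractFSWAL.cleanOldLogs in the trace
above), the pool throws a RejectedExecutionException and the region server aborts.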
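To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use HBase classes: the class WalShutdownRaceSketch, its logCleaner pool, the closed flag and the methods cleanOldLogsUnguarded/cleanOldLogsGuarded are hypothetical stand-ins for AbstractFSWAL's internals. It relies only on standard JDK behaviour: an executor created by Executors.newSingleThreadExecutor() (a DelegatedExecutorService wrapping a ThreadPoolExecutor, as in the trace) rejects new tasks with RejectedExecutionException, via the default AbortPolicy, once shutdown() has been called. The guarded variant is only one possible mitigation, assumed here for illustration; it is not the fix committed for HBASE-26866.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class WalShutdownRaceSketch {

  // Hypothetical stand-in for the WAL's background pool that deletes old WAL files.
  private final ExecutorService logCleaner = Executors.newSingleThreadExecutor();
  // Hypothetical "WAL is closing" flag; volatile so the roller thread sees it.
  private volatile boolean closed = false;

  // Mirrors the failing path: a log roll hands old-WAL cleanup to the pool.
  // After shutdown(), execute() throws RejectedExecutionException (AbortPolicy).
  public void cleanOldLogsUnguarded(Runnable cleanupTask) {
    logCleaner.execute(cleanupTask);
  }

  // Assumed mitigation, not the actual HBase fix: if the WAL is already
  // closing, treat the rejection as benign and skip the cleanup.
  // Returns true if the task was submitted, false if it was skipped.
  public boolean cleanOldLogsGuarded(Runnable cleanupTask) {
    try {
      logCleaner.execute(cleanupTask);
      return true;
    } catch (RejectedExecutionException e) {
      if (closed) {
        return false; // shutting down: do not escalate to a region server abort
      }
      throw e; // unexpected rejection while the WAL is still open
    }
  }

  public void shutdown() {
    closed = true; // publish "closing" before the pool starts rejecting tasks
    logCleaner.shutdown();
    try {
      logCleaner.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }

  public static void main(String[] args) {
    WalShutdownRaceSketch wal = new WalShutdownRaceSketch();
    wal.shutdown(); // simulate the WAL closing before the roller's next roll
    try {
      wal.cleanOldLogsUnguarded(() -> {});
      System.out.println("unexpected: task accepted");
    } catch (RejectedExecutionException e) {
      System.out.println("unguarded: rejected -> region server would abort");
    }
    System.out.println("guarded: submitted=" + wal.cleanOldLogsGuarded(() -> {}));
  }
}
```

Running main prints "unguarded: rejected -> region server would abort" followed by "guarded: submitted=false". The sketch sets closed before calling shutdown(), so any task the pool rejects is guaranteed to observe closed == true; with the opposite order, a roll racing with shutdown could still hit a rejection while the flag is false.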



