[ https://issues.apache.org/jira/browse/HBASE-26866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-26866. ------------------------------- Fix Version/s: 3.0.0-alpha-3 Hadoop Flags: Reviewed Resolution: Fixed Merged to master. Thanks [~Xiaolin Ha] for reviewing. > Shutdown WAL may abort region server > ------------------------------------ > > Key: HBASE-26866 > URL: https://issues.apache.org/jira/browse/HBASE-26866 > Project: HBase > Issue Type: Bug > Components: wal > Reporter: Duo Zhang > Assignee: Duo Zhang > Priority: Major > Fix For: 3.0.0-alpha-3 > > > https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt > TestSyncReplicationAcive is flaky because of we may abort the region server > when shutting down WAL. > {noformat} > 2022-03-18T04:50:37,205 WARN > [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] > master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 > reported a fatal error: > ***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: > Log rolling failed ***** > Cause: > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 > rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting > down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = > 0] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) > at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773) > at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935) > at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953) > at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196) > at > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953) > at > org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316) > at > org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214) > {noformat} > The problem here is that, the removal of WAL is async, when shuttting down > the WAL, we will close the thread pool so it will throw reject execution > exception and cause region server abort. -- This message was sent by Atlassian Jira (v8.20.1#820001)