Duo Zhang created HBASE-26866:
---------------------------------
Summary: Shutdown WAL may abort region server
Key: HBASE-26866
URL: https://issues.apache.org/jira/browse/HBASE-26866
Project: HBase
Issue Type: Bug
Reporter: Duo Zhang
https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt
TestSyncReplicationAcive is flaky because of we may abort the region server
when shutting down WAL.
{noformat}
2022-03-18T04:50:37,205 WARN
[RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877]
master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859
reported a fatal error:
***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859:
Log rolling failed *****
Cause:
java.util.concurrent.RejectedExecutionException: Task
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753
rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down,
pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at
java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
at
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
at
org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
at
org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
{noformat}
The problem here is that, the removal of WAL is async, when shuttting down the
WAL, we will close the thread pool so it will throw reject execution exception
and cause region server abort.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)