[ 
https://issues.apache.org/jira/browse/HBASE-26866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-26866.
-------------------------------
    Fix Version/s: 3.0.0-alpha-3
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to master.

Thanks [~Xiaolin Ha] for reviewing.

> Shutdown WAL may abort region server
> ------------------------------------
>
>                 Key: HBASE-26866
>                 URL: https://issues.apache.org/jira/browse/HBASE-26866
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-3
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt
> TestSyncReplicationActive is flaky because we may abort the region server 
> when shutting down the WAL.
> {noformat}
> 2022-03-18T04:50:37,205 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] 
> master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 
> reported a fatal error:
> ***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: 
> Log rolling failed *****
> Cause:
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753
>  rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting 
> down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 
> 0]
>       at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>       at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>       at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>       at 
> java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
>       at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
>       at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
>       at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
> {noformat}
> The problem here is that the removal of old WAL files is asynchronous. When 
> shutting down the WAL, we shut down the thread pool, so a later cleanup 
> submission throws RejectedExecutionException and causes the region server 
> to abort.
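
The failure mode in the stack trace can be reproduced with the plain JDK. The sketch below is not HBase code and is not necessarily how the merged patch fixes the issue. The class name ShutdownRejectSketch and the helper submitCleanup are hypothetical, introduced only for illustration. It shows that execute() on an executor that has already been shut down throws RejectedExecutionException under the default AbortPolicy, and one possible way a caller could tolerate that during shutdown instead of treating it as fatal.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownRejectSketch {

  // Hypothetical stand-in for an asynchronous log-cleanup submission.
  // Returns true if the task was accepted, false if the pool rejected it.
  static boolean submitCleanup(ExecutorService pool, Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      // Assumption for this sketch: a rejection here means the pool is
      // shutting down, so the cleanup is skipped rather than propagated.
      return false;
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    pool.shutdown(); // mimics closing the cleanup pool during WAL shutdown

    // Unguarded submission: the default AbortPolicy throws.
    boolean threw = false;
    try {
      pool.execute(() -> { });
    } catch (RejectedExecutionException e) {
      threw = true;
    }
    System.out.println("unguarded execute rejected: " + threw);

    // Guarded submission: the rejection is reported, not thrown.
    System.out.println("guarded submit accepted: " + submitCleanup(pool, () -> { }));
  }
}
```

Whether swallowing the rejection is appropriate depends on the caller. In this sketch it is only reasonable because skipping a cleanup task during shutdown loses no data.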



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
