LiangJun He created HBASE-27029: ----------------------------------- Summary: When HMaster is stopped, the HMaster local region cache cannot be flushed normally Key: HBASE-27029 URL: https://issues.apache.org/jira/browse/HBASE-27029 Project: HBase Issue Type: Bug Components: master Affects Versions: 3.0.0-alpha-3 Reporter: LiangJun He Assignee: LiangJun He Fix For: 3.0.0-alpha-3
After HBASE-26951, HMaster can be stoped gracefully. For example, the internal threads of HMaster can be closed normally, but I found that the local region of HMaster still cannot be closed normally. The following is my test error message: {code:java} Wed May 11 14:48:56 CST 2022 Terminating master 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@4f4c789f 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] master.HMaster: ***** STOPPING master 'emr-header-1.cluster-xxxxx,16000,1652240899395' ***** 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] master.HMaster: STOPPED: Shutdown hook ...... ...... 2022-05-11 14:48:57,367 ERROR [KeepAlivePEWorker-41] assignment.RegionStateStore: FAILED persisting region=23a692981e91e944d380a8bdf4b50c7e state=OPEN org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=emr-worker-1.cluster-xxxxx:16020 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException at java.lang.Thread.getStackTrace(Thread.java:1559) at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130) at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149) at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:172) at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) at --------Future.get--------(Unknown Source) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:210) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:388) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:422) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:417) at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:443) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$300(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$RpcChannelImplementation.callMethod(AbstractRpcClient.java:614) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$Stub.mutate(ClientProtos.java:46147) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$mutate$0(RawAsyncTableImpl.java:175) at org.apache.hadoop.hbase.client.ConnectionUtils.call(ConnectionUtils.java:616) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.mutate(RawAsyncTableImpl.java:174) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.voidMutate(RawAsyncTableImpl.java:181) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$null$8(RawAsyncTableImpl.java:249) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.call(AsyncSingleRequestRpcRetryingCaller.java:82) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.lambda$doCall$7(AsyncSingleRequestRpcRetryingCaller.java:115) at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778) at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140) at org.apache.hadoop.hbase.util.FutureUtils.addListener(FutureUtils.java:61) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.doCall(AsyncSingleRequestRpcRetryingCaller.java:106) at org.apache.hadoop.hbase.client.AsyncRpcRetryingCaller.lambda$tryScheduleRetry$0(AsyncRpcRetryingCaller.java:142) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.hbase.ipc.StoppedRpcClientException at org.apache.hadoop.hbase.ipc.AbstractRpcClient.getConnection(AbstractRpcClient.java:360) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:440) ... 23 more 2022-05-11 14:48:57,367 WARN [KeepAlivePEWorker-41] assignment.RegionRemoteProcedureBase: Failed updating meta, suspend 1secs pid=1793261, ppid=1791968, state=RUNNABLE, hasLock=true; OpenRegionProcedure 23a692981e91e944d380a8bdf4b50c7e, server=emr-worker-2.cluster-xxxx,16020,1652240898881, retry=org.apache.hadoop.hbase.util.RetryCounter@2f42033b; state=OPEN, location=emr-worker-2.cluster-18941,16020,1652240898881, table=usertable, region=23a692981e91e944d380a8bdf4b50c7e; org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=emr-worker-1.cluster-18941:16020 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException at java.lang.Thread.getStackTrace(Thread.java:1559) at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130) at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149) at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:172) at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) at --------Future.get--------(Unknown Source) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:210) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:388) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:422) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:417) at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:443) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$300(AbstractRpcClient.java:92) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$RpcChannelImplementation.callMethod(AbstractRpcClient.java:614) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$Stub.mutate(ClientProtos.java:46147) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$mutate$0(RawAsyncTableImpl.java:175) at org.apache.hadoop.hbase.client.ConnectionUtils.call(ConnectionUtils.java:616) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.mutate(RawAsyncTableImpl.java:174) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.voidMutate(RawAsyncTableImpl.java:181) at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$null$8(RawAsyncTableImpl.java:249) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.call(AsyncSingleRequestRpcRetryingCaller.java:82) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.lambda$doCall$7(AsyncSingleRequestRpcRetryingCaller.java:115) at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778) at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140) at org.apache.hadoop.hbase.util.FutureUtils.addListener(FutureUtils.java:61) at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.doCall(AsyncSingleRequestRpcRetryingCaller.java:106) at org.apache.hadoop.hbase.client.AsyncRpcRetryingCaller.lambda$tryScheduleRetry$0(AsyncRpcRetryingCaller.java:142) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.hadoop.hbase.ipc.StoppedRpcClientException at org.apache.hadoop.hbase.ipc.AbstractRpcClient.getConnection(AbstractRpcClient.java:360) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:440) ... 23 more ..... ..... 2022-05-11 14:49:13,232 INFO [master/emr-header-1:16000] region.MasterRegion: Closing local region {ENCODED => 1595e783b53d99cd5eef43b6debb2682, NAME => 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY => ''}, isAbort=true 2022-05-11 14:49:13,237 INFO [master/emr-header-1:16000] regionserver.HRegion: Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682. 2022-05-11 14:49:13,243 ERROR [master/emr-header-1:16000] regionserver.HRegion: Memstore data size is 139232 in region master:store,,1.1595e783b53d99cd5eef43b6debb2682. 2022-05-11 14:49:13,243 INFO [master/emr-header-1:16000] regionserver.HRegion: Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682. 2022-05-11 14:49:13,377 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: HMaster Aborted at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:251) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:144) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:144) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3209) {code} Finally HMaster exit due to abort, the local region cache cannot be flushed normally successfully. I also found another error that also causes the same problem: {code:java} 2022-05-11 23:57:03,228 INFO [master/emr-header-1:16000] hbase.HBaseServerBase: Shutdown executor service 2022-05-11 23:57:03,229 INFO [master/emr-header-1:16000] procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher 2022-05-11 23:57:03,229 INFO [master/emr-header-1:16000] procedure2.ProcedureExecutor: Stopping 2022-05-11 23:57:05,481 WARN [master/emr-header-1:16000] procedure2.StoppableThread: Waiting termination of thread PEWorker-7, 2.2510 sec; sending interrupt 2022-05-11 23:57:05,481 ERROR [PEWorker-7] assignment.RegionStateStore: FAILED persisting region=7ff63515f9ce9d876618e3f1bee7dc03 state=OPEN java.io.InterruptedIOException at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:170) at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) Caused by: java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:168) ... 11 more 2022-05-11 23:57:05,482 WARN [PEWorker-7] assignment.RegionRemoteProcedureBase: Failed updating meta, suspend 1secs pid=1845211, ppid=1844964, state=RUNNABLE, hasLock=true; OpenRegionProcedure 7ff63515f9 ce9d876618e3f1bee7dc03, server=emr-worker-2.cluster-18941,16020,1652263346995, retry=org.apache.hadoop.hbase.util.RetryCounter@4a8e233; state=OPEN, location=emr-worker-2.cluster-18941,16020,1652263346995, table=usertable, region=7ff63515f9ce9d876618e3f1bee7dc03; java.io.InterruptedIOException at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:170) at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259) at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297) at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981) Caused by: java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:168) ... 11 more {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)