[ https://issues.apache.org/jira/browse/HBASE-27029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang updated HBASE-27029:
------------------------------
Fix Version/s: 4.0.0-alpha-1
(was: 3.0.0-beta-2)
> When HMaster is stopped, the local region cannot be flushed normally
> --------------------------------------------------------------------
>
> Key: HBASE-27029
> URL: https://issues.apache.org/jira/browse/HBASE-27029
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 3.0.0-alpha-3
> Reporter: Liangjun He
> Assignee: Liangjun He
> Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> After HBASE-26951, HMaster can be stopped gracefully. For example, the
> internal threads of HMaster now shut down normally, but I found that the
> local region of HMaster still cannot be closed normally.
> The following error output is from my test (exception 1):
> {code:java}
> Wed May 11 14:48:56 CST 2022 Terminating master
> 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@4f4c789f
> 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] master.HMaster: ***** STOPPING master 'emr-header-1.cluster-xxxxx,16000,1652240899395' *****
> 2022-05-11 14:48:56,382 INFO [shutdown-hook-0] master.HMaster: STOPPED: Shutdown hook
> ......
> ......
> 2022-05-11 14:48:57,367 ERROR [KeepAlivePEWorker-41] assignment.RegionStateStore: FAILED persisting region=23a692981e91e944d380a8bdf4b50c7e state=OPEN
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=emr-worker-1.cluster-xxxxx:16020 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException
> at java.lang.Thread.getStackTrace(Thread.java:1559)
> at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
> at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
> at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:172)
> at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034)
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297)
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> at --------Future.get--------(Unknown Source)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:210)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:388)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:422)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:417)
> at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
> at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:443)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$300(AbstractRpcClient.java:92)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$RpcChannelImplementation.callMethod(AbstractRpcClient.java:614)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$Stub.mutate(ClientProtos.java:46147)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$mutate$0(RawAsyncTableImpl.java:175)
> at org.apache.hadoop.hbase.client.ConnectionUtils.call(ConnectionUtils.java:616)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.mutate(RawAsyncTableImpl.java:174)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.voidMutate(RawAsyncTableImpl.java:181)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$null$8(RawAsyncTableImpl.java:249)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.call(AsyncSingleRequestRpcRetryingCaller.java:82)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.lambda$doCall$7(AsyncSingleRequestRpcRetryingCaller.java:115)
> at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
> at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
> at org.apache.hadoop.hbase.util.FutureUtils.addListener(FutureUtils.java:61)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.doCall(AsyncSingleRequestRpcRetryingCaller.java:106)
> at org.apache.hadoop.hbase.client.AsyncRpcRetryingCaller.lambda$tryScheduleRetry$0(AsyncRpcRetryingCaller.java:142)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.ipc.StoppedRpcClientException
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.getConnection(AbstractRpcClient.java:360)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:440)
> ... 23 more
> 2022-05-11 14:48:57,367 WARN [KeepAlivePEWorker-41] assignment.RegionRemoteProcedureBase: Failed updating meta, suspend 1secs pid=1793261, ppid=1791968, state=RUNNABLE, hasLock=true; OpenRegionProcedure 23a692981e91e944d380a8bdf4b50c7e, server=emr-worker-2.cluster-xxxx,16020,1652240898881, retry=org.apache.hadoop.hbase.util.RetryCounter@2f42033b; state=OPEN, location=emr-worker-2.cluster-18941,16020,1652240898881, table=usertable, region=23a692981e91e944d380a8bdf4b50c7e;
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to address=emr-worker-1.cluster-18941:16020 failed on local exception: org.apache.hadoop.hbase.ipc.StoppedRpcClientException
> at java.lang.Thread.getStackTrace(Thread.java:1559)
> at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
> at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
> at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:172)
> at org.apache.hadoop.hbase.client.TableOverAsyncTable.put(TableOverAsyncTable.java:214)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:259)
> at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:224)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:2034)
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:297)
> at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:57)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
> at --------Future.get--------(Unknown Source)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:210)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:388)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:422)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:417)
> at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
> at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:443)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$300(AbstractRpcClient.java:92)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient$RpcChannelImplementation.callMethod(AbstractRpcClient.java:614)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$Stub.mutate(ClientProtos.java:46147)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$mutate$0(RawAsyncTableImpl.java:175)
> at org.apache.hadoop.hbase.client.ConnectionUtils.call(ConnectionUtils.java:616)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.mutate(RawAsyncTableImpl.java:174)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.voidMutate(RawAsyncTableImpl.java:181)
> at org.apache.hadoop.hbase.client.RawAsyncTableImpl.lambda$null$8(RawAsyncTableImpl.java:249)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.call(AsyncSingleRequestRpcRetryingCaller.java:82)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.lambda$doCall$7(AsyncSingleRequestRpcRetryingCaller.java:115)
> at org.apache.hadoop.hbase.util.FutureUtils.lambda$addListener$0(FutureUtils.java:68)
> at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
> at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
> at org.apache.hadoop.hbase.util.FutureUtils.addListener(FutureUtils.java:61)
> at org.apache.hadoop.hbase.client.AsyncSingleRequestRpcRetryingCaller.doCall(AsyncSingleRequestRpcRetryingCaller.java:106)
> at org.apache.hadoop.hbase.client.AsyncRpcRetryingCaller.lambda$tryScheduleRetry$0(AsyncRpcRetryingCaller.java:142)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
> at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
> at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.ipc.StoppedRpcClientException
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.getConnection(AbstractRpcClient.java:360)
> at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:440)
> ... 23 more
> .....
> .....
> 2022-05-11 14:49:13,232 INFO [master/emr-header-1:16000] region.MasterRegion: Closing local region {ENCODED => 1595e783b53d99cd5eef43b6debb2682, NAME => 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY => ''}, isAbort=true
> 2022-05-11 14:49:13,237 INFO [master/emr-header-1:16000] regionserver.HRegion: Closing region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2022-05-11 14:49:13,243 ERROR [master/emr-header-1:16000] regionserver.HRegion: Memstore data size is 139232 in region master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2022-05-11 14:49:13,243 INFO [master/emr-header-1:16000] regionserver.HRegion: Closed master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2022-05-11 14:49:13,377 ERROR [main] master.HMasterCommandLine: Master exiting
> java.lang.RuntimeException: HMaster Aborted
> at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:251)
> at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:144)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:144)
> at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3209)
> {code}
>
> Finally, HMaster exits due to an abort, so the local region cache cannot be
> flushed normally.
> I also found an NPE when HMaster is stopped (exception 2):
> {code:java}
> 2022-06-01 00:14:13,180 INFO [master/emr-header-1:16000] region.RegionProcedureStore: Stopping the Region Procedure Store, isAbort=false
> 2022-06-01 00:14:13,181 WARN [master/emr-header-1:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
> 2022-06-01 00:14:13,184 ERROR [RpcServer.priority.RWQ.Fifo.write.handler=0,queue=0,port=16000] ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportTransition(AssignmentManager.java:1186)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.updateRegionTransition(AssignmentManager.java:1156)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportRegionStateTransition(AssignmentManager.java:1072)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportRegionStateTransition(AssignmentManager.java:1117)
> at org.apache.hadoop.hbase.master.MasterRpcServices.reportRegionStateTransition(MasterRpcServices.java:1772)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:17755)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104)
> at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84)
> {code}