[ https://issues.apache.org/jira/browse/HBASE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134099#comment-17134099 ]
Guanghao Zhang commented on HBASE-24548:
----------------------------------------

I think the issue title is not right. This is not related to SCP.

> improvement for HBase SCP
> -------------------------
>
>                 Key: HBASE-24548
>                 URL: https://issues.apache.org/jira/browse/HBASE-24548
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Major
>
> In our internal HBase, based on the community branch-2.1, we find that the
> master only detects a stopped regionserver as dead about 30 s after it
> stops, when its ephemeral node is deleted in ZK. During this time, the
> regions on this server are unavailable and make no progress. The log is as
> follows:
> {code:java}
> [2020-06-12 15:51:41.888 ActorThreadPool-consumer-processor-talos-set-alias-55-1 ERROR c.x.xmpush.hbase.utils.HBaseHelper] [get data hbase failed, tableName = mipush:app_alias_new]
> com.xiaomi.infra.hbase.client.HException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
> Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> Fri Jun 12 15:50:44 CST 2020, org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>     at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> The logs in master:
> {code:java}
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [c3-hadoop-srv-st639.bj,13700,1591932264018]
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.ServerManager: Processing expiration of c3-hadoop-srv-st639.bj,13700,1591932264018 on c3-hadoop-miui-zk05.bj,13600,1591927126881
> 2020-06-12,15:51:12,109 INFO [RegionServerTracker-0] org.apache.hadoop.hbase.master.assignment.AssignmentManager: Added c3-hadoop-srv-st639.bj,13700,1591932264018 to dead servers which carryingMeta=false, submitted ServerCrashProcedure pid=97428
> 2020-06-12,15:51:12,109 INFO [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-c3-hadoop-miui-zk05.bj,13600,1591927126881] org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread: Updating default servers.
> 2020-06-12,15:51:12,111 INFO [PEWorker-11] org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=97428, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=c3-hadoop-srv-st639.bj,13700,1591932264018, splitWal=true, meta=false
> {code}
> After discussing with [~zghao] offline, we think this process could be
> accelerated by having the regionserver notify the master, or delete its own
> ephemeral node, before it stops.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
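
The proposal in the quoted description (remove the ephemeral node before stopping instead of waiting for the session to expire) can be illustrated with a small in-memory simulation. This is only a sketch: it does not use HBase or ZooKeeper APIs, and `Registry`, `detectionDelayMs`, the server name `rs-1`, and the 30,000 ms timeout are all hypothetical. The timeout is chosen to mirror the roughly 30 s gap in the report and is not a claim about the cluster's configured session timeout.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of ephemeral-node liveness tracking; not HBase or ZooKeeper code. */
public class EphemeralNodeSketch {

    /** Hypothetical stand-in for a coordination service holding ephemeral nodes. */
    static final class Registry {
        private final long sessionTimeoutMs;
        // server name -> simulated time (ms) of its last heartbeat
        private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
        // simulated time at which the watcher (the "master") saw a node vanish; -1 if never
        private long lastNotificationMs = -1;

        Registry(long sessionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
        }

        void heartbeat(String server, long nowMs) {
            lastHeartbeatMs.put(server, nowMs);
        }

        /** Explicit delete: the node disappears and the watcher is notified at once. */
        void delete(String server, long nowMs) {
            if (lastHeartbeatMs.remove(server) != null) {
                lastNotificationMs = nowMs;
            }
        }

        /** Advance the simulated clock and expire nodes whose session has timed out. */
        void tick(long nowMs) {
            Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (nowMs - entry.getValue() >= sessionTimeoutMs) {
                    it.remove();
                    lastNotificationMs = nowMs;
                }
            }
        }

        long notifiedAtMs() {
            return lastNotificationMs;
        }
    }

    /** Simulated delay between the server stopping and the master being notified. */
    static long detectionDelayMs(boolean deleteBeforeStop, long sessionTimeoutMs) {
        final String server = "rs-1";   // hypothetical server name
        final long stopAtMs = 5_000;    // arbitrary simulated stop time
        Registry registry = new Registry(sessionTimeoutMs);
        registry.heartbeat(server, stopAtMs); // last heartbeat just before stopping

        if (deleteBeforeStop) {
            registry.delete(server, stopAtMs);
        }
        long nowMs = stopAtMs;
        // Without an explicit delete, the only signal is session expiry.
        while (registry.notifiedAtMs() < 0) {
            nowMs += 1_000;
            registry.tick(nowMs);
        }
        return registry.notifiedAtMs() - stopAtMs;
    }

    public static void main(String[] args) {
        System.out.println("wait for session expiry: " + detectionDelayMs(false, 30_000) + " ms");
        System.out.println("delete node before stop: " + detectionDelayMs(true, 30_000) + " ms");
    }
}
```

In this model the delay equals the full session timeout when the server just stops, and drops to zero when it deletes its node first. A real regionserver would also have to handle failures of the delete itself, which the sketch ignores.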