[ 
https://issues.apache.org/jira/browse/HBASE-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134099#comment-17134099
 ] 

Guanghao Zhang commented on HBASE-24548:
----------------------------------------

I think the issue title is not right. This is not related to SCP.

> improvement for HBase SCP
> -------------------------
>
>                 Key: HBASE-24548
>                 URL: https://issues.apache.org/jira/browse/HBASE-24548
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Junhong Xu
>            Assignee: Junhong Xu
>            Priority: Major
>
> In our internal HBase, based on the community branch-2.1, we find that about 
> 30 s after a regionserver is stopped, the master finally finds it dead when 
> its ephemeral node is deleted in ZK. During this time, the regions on this 
> server are unavailable and make no progress. The log is as follows:
> {code:java}
> [2020-06-12 15:51:41.888 
> ActorThreadPool-consumer-processor-talos-set-alias-55-1 ERROR 
> c.x.xmpush.hbase.utils.HBaseHelper] [get data hbase failed, tableName = 
> mipush:app_alias_new]
> com.xiaomi.infra.hbase.client.HException: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=10, exceptions:
> Fri Jun 12 15:50:44 CST 2020, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
> c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> Fri Jun 12 15:50:44 CST 2020, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@2dc1865, 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 
> c3-hadoop-srv-st639.bj,13700,1591932264018 stopping
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1551)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2565)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> The logs in master:
> {code:java}
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] 
> org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral 
> node deleted, processing expiration 
> [c3-hadoop-srv-st639.bj,13700,1591932264018]
> 2020-06-12,15:51:12,003 INFO [RegionServerTracker-0] 
> org.apache.hadoop.hbase.master.ServerManager: Processing expiration of 
> c3-hadoop-srv-st639.bj,13700,1591932264018 on 
> c3-hadoop-miui-zk05.bj,13600,1591927126881
> 2020-06-12,15:51:12,109 INFO [RegionServerTracker-0] 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: Added 
> c3-hadoop-srv-st639.bj,13700,1591932264018 to dead servers which 
> carryingMeta=false, submitted ServerCrashProcedure pid=97428
> 2020-06-12,15:51:12,109 INFO 
> [org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread-c3-hadoop-miui-zk05.bj,13600,1591927126881]
>  
> org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl$ServerEventsListenerThread:
>  Updating default servers.
> 2020-06-12,15:51:12,111 INFO [PEWorker-11] 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start 
> pid=97428, state=RUNNABLE:SERVER_CRASH_START, locked=true; 
> ServerCrashProcedure server=c3-hadoop-srv-st639.bj,13700,1591932264018, 
> splitWal=true, meta=false
> {code}
> After an offline discussion with [~zghao], we think this process could be 
> accelerated by having the regionserver notify the master, or delete its own 
> ephemeral node, before it stops.
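
To make the ordering in the quoted proposal concrete, here is a minimal sketch of the "delete the ephemeral node before stop" option. It is a toy in-memory model, not HBase or ZooKeeper code: NodeRegistry, simulateStop and the server name "rs-1" are all hypothetical names invented for illustration, and the single "shutdown work finished" event stands in for whatever takes the roughly 30 s seen in the logs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class EarlyNodeDeleteSketch {

    /** Hypothetical in-memory stand-in for the ephemeral nodes the master watches. */
    static class NodeRegistry {
        private final Set<String> nodes = new HashSet<>();
        private final List<Consumer<String>> deletionListeners = new ArrayList<>();

        void create(String serverName) {
            nodes.add(serverName);
        }

        void onDelete(Consumer<String> listener) {
            deletionListeners.add(listener);
        }

        /** Removes the node and notifies listeners; does nothing if it is already gone. */
        void delete(String serverName) {
            if (nodes.remove(serverName)) {
                for (Consumer<String> listener : deletionListeners) {
                    listener.accept(serverName);
                }
            }
        }
    }

    /** Runs one simulated stop sequence and returns the events in the order they happened. */
    static List<String> simulateStop(boolean deleteNodeFirst) {
        List<String> events = new ArrayList<>();
        NodeRegistry registry = new NodeRegistry();
        String server = "rs-1"; // placeholder server name (assumption)

        registry.create(server);
        // The "master" reacts as soon as it sees the node disappear.
        registry.onDelete(s -> events.add("master: start crash handling for " + s));

        events.add("rs: stop requested, rejecting client calls");
        if (deleteNodeFirst) {
            registry.delete(server); // proposed: remove the node before the slow part
        }
        events.add("rs: shutdown work finished");
        registry.delete(server); // behaviour described in the issue: node goes away last
        return events;
    }

    public static void main(String[] args) {
        System.out.println("node deleted last:  " + simulateStop(false));
        System.out.println("node deleted first: " + simulateStop(true));
    }
}
```

In this model the master's reaction moves from after the shutdown work to before it when deleteNodeFirst is true. How much time that saves in a real cluster depends on what the regionserver actually does between the stop request and the node deletion, which this sketch does not model.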



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
