[ 
https://issues.apache.org/jira/browse/HBASE-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999818#comment-16999818
 ] 

Lijin Bin edited comment on HBASE-23595 at 12/19/19 7:36 AM:
-------------------------------------------------------------

Yes, i think we need to give high priority to meta assign procedure and 
ServerCrashProcedure(carry meta), so the meta will assign quick.
{code}
2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] 
assignment.TransitRegionStateProcedure: Starting pid=23568, ppid=23567, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; 
openRegion rit=OPEN, location=100.107.165.61,60020,1576553057082; 
loc=100.107.165.61,60020,1576553057082
2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] 
zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in 
ZooKeeper as 100.107.165.61,60020,1576553057082
2019-12-18 16:14:43,515 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Finished subprocedure pid=23569, resume processing parent pid=23568, 
ppid=23567, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2019-12-18 16:14:43,518 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Finished pid=23569, ppid=23568, state=SUCCESS; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.5970sec
2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: 
Finished subprocedure pid=23568, resume processing parent pid=23567, 
state=RUNNABLE:SERVER_CRASH_GET_REGIONS, locked=true; ServerCrashProcedure 
server=100.107.165.22,60020,1576553019781, splitWal=true, meta=true
2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: 
Finished pid=23568, ppid=23567, state=SUCCESS; TransitRegionStateProcedure 
table=hbase:meta, region=1588230740, ASSIGN in 6.4630sec


2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] 
procedure2.ProcedureExecutor: Stored pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] 
procedure.MasterProcedureScheduler: Add 
ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 
size=1) to run queue because: the exclusive lock is not held by anyone when 
adding pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:15:07,212 INFO  [RegionServerTracker-0] 
assignment.AssignmentManager: Scheduled SCP pid=69619 for 
100.107.165.61,60020,1576553057082 (carryingMeta=true) 
100.107.165.61,60020,1576553057082/CRASHED/regionCount=13026/lock=java.util.concurrent.locks.ReentrantReadWriteLock@68f2ee72[Write
 locks = 1, Read locks = 0], oldState=ONLINE.
2019-12-18 16:15:21,629 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=959,queue=191,port=60000] 
master.DeadServer: Removed 100.107.165.61,60020,1576553057082, processing=true, 
numProcessing=0



2019-12-18 16:16:14,779 DEBUG [qtp1688526221-1038] 
client.ConnectionImplementation: locateRegionInMeta parentTable='hbase:meta', 
attempt=0 of 31 failed; retrying after sleep of 31
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=31, exceptions:
Wed Dec 18 16:16:14 CST 2019, null, java.net.SocketTimeoutException: 
callTimeout=60000, callDuration=81362: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3300)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
 row 'hbase:rsgroup,,99999999999999' on table 'hbase:meta' at 
region=hbase:meta,,1.1588230740, hostname=100.107.165.61,60020,1576553057082, 
seqNum=-1




2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: Master server 
abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint]
2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: ***** ABORTING 
master 100.107.176.225,60000,1576656778460: FAILED persisting 
region=38d18fd824890c80cff972cbf2e4c174 state=OPENING *****
java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1286005: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
 row 'table1w,user22322,1575810347321.38d18fd824890c80cff972cbf2e4c174.' on 
table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=100.107.165.61,60020,1576553057082, seqNum=-1
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1632)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpening(AssignmentManager.java:1647)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.openRegion(TransitRegionStateProcedure.java:207)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:339)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:102)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:319)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:102)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

        at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown 
Source)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:99)
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:89)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:282)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:269)
        at 
org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:129)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
        ... 17 more
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
 org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:97)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:423)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
        at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
        at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:162)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:192)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)



2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: 
Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 
size=0) from run queue because: queue is empty after polling out pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: 
Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=true (69619) 
sharedLock=0 size=0) from run queue because: pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true held 
exclusive lock


2019-12-18 16:33:06,353 INFO  [PEWorker-13] 
assignment.TransitRegionStateProcedure: Starting pid=69626, ppid=69619, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; 
openRegion rit=OPEN, location=100.107.165.22,60020,1576656877749; 
loc=100.107.165.22,60020,1576656877749
2019-12-18 16:33:06,353 INFO  [PEWorker-13] zookeeper.MetaTableLocator: Setting 
hbase:meta (replicaId=0) location in ZooKeeper as 
100.107.165.22,60020,1576656877749
{code}
Looks at this logs, we can see the ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082(carry meta) queued for quit a while.


was (Author: aoxiang):
Yes, i think we need to give high priority to meta assign procedure and 
ServerCrashProcedure(carry meta), so the meta will assign quick.
{code}
2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] 
assignment.TransitRegionStateProcedure: Starting pid=23568, ppid=23567, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; 
openRegion rit=OPEN, location=100.107.165.61,60020,1576553057082; 
loc=100.107.165.61,60020,1576553057082
2019-12-18 16:14:41,698 INFO  [KeepAlivePEWorker-18] 
zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in 
ZooKeeper as 100.107.165.61,60020,1576553057082
2019-12-18 16:14:43,515 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Finished subprocedure pid=23569, resume processing parent pid=23568, 
ppid=23567, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2019-12-18 16:14:43,518 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
Finished pid=23569, ppid=23568, state=SUCCESS; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.5970sec
2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: 
Finished subprocedure pid=23568, resume processing parent pid=23567, 
state=RUNNABLE:SERVER_CRASH_GET_REGIONS, locked=true; ServerCrashProcedure 
server=100.107.165.22,60020,1576553019781, splitWal=true, meta=true
2019-12-18 16:14:43,522 INFO  [PEWorker-9] procedure2.ProcedureExecutor: 
Finished pid=23568, ppid=23567, state=SUCCESS; TransitRegionStateProcedure 
table=hbase:meta, region=1588230740, ASSIGN in 6.4630sec


2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] 
procedure2.ProcedureExecutor: Stored pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:15:07,212 DEBUG [RegionServerTracker-0] 
procedure.MasterProcedureScheduler: Add 
ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 
size=1) to run queue because: the exclusive lock is not held by anyone when 
adding pid=69619, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:15:07,212 INFO  [RegionServerTracker-0] 
assignment.AssignmentManager: Scheduled SCP pid=69619 for 
100.107.165.61,60020,1576553057082 (carryingMeta=true) 
100.107.165.61,60020,1576553057082/CRASHED/regionCount=13026/lock=java.util.concurrent.locks.ReentrantReadWriteLock@68f2ee72[Write
 locks = 1, Read locks = 0], oldState=ONLINE.
2019-12-18 16:15:21,629 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=959,queue=191,port=60000] 
master.DeadServer: Removed 100.107.165.61,60020,1576553057082, processing=true, 
numProcessing=0



2019-12-18 16:16:14,779 DEBUG [qtp1688526221-1038] 
client.ConnectionImplementation: locateRegionInMeta parentTable='hbase:meta', 
attempt=0 of 31 failed; retrying after sleep of 31
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=31, exceptions:
Wed Dec 18 16:16:14 CST 2019, null, java.net.SocketTimeoutException: 
callTimeout=60000, callDuration=81362: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.newRegionScanner(RSRpcServices.java:2967)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3300)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42190)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
 row 'hbase:rsgroup,,99999999999999' on table 'hbase:meta' at 
region=hbase:meta,,1.1588230740, hostname=100.107.165.61,60020,1576553057082, 
seqNum=-1




2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: Master server 
abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint]
2019-12-18 16:33:04,715 ERROR [PEWorker-15] master.HMaster: ***** ABORTING 
master 100.107.176.225,60000,1576656778460: FAILED persisting 
region=38d18fd824890c80cff972cbf2e4c174 state=OPENING *****
java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1286005: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
 row 'table1w,user22322,1575810347321.38d18fd824890c80cff972cbf2e4c174.' on 
table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=100.107.165.61,60020,1576553057082, seqNum=-1
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203)
        at 
org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.transitStateAndUpdate(AssignmentManager.java:1632)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.regionOpening(AssignmentManager.java:1647)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.openRegion(TransitRegionStateProcedure.java:207)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:339)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.executeFromState(TransitRegionStateProcedure.java:102)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:319)
        at 
org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.execute(TransitRegionStateProcedure.java:102)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
Caused by: org.apache.hadoop.hbase.NotServingRegionException: 
org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

        at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown 
Source)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:99)
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:89)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:282)
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:269)
        at 
org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:129)
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
        ... 17 more
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
 org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not online 
on 100.107.165.61,60020,1576656916048
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3349)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3326)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1439)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2772)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42188)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)

        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:389)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:97)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:423)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
        at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
        at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:162)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:192)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)



2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: 
Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=false sharedLock=0 
size=0) from run queue because: queue is empty after polling out pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true
2019-12-18 16:33:04,728 DEBUG [PEWorker-15] procedure.MasterProcedureScheduler: 
Remove ServerQueue(100.107.165.61,60020,1576553057082, xlock=true (69619) 
sharedLock=0 size=0) from run queue because: pid=69619, 
state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082, splitWal=true, meta=true held 
exclusive lock


2019-12-18 16:33:06,353 INFO  [PEWorker-13] 
assignment.TransitRegionStateProcedure: Starting pid=69626, ppid=69619, 
state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; 
openRegion rit=OPEN, location=100.107.165.22,60020,1576656877749; 
loc=100.107.165.22,60020,1576656877749
2019-12-18 16:33:06,353 INFO  [PEWorker-13] zookeeper.MetaTableLocator: Setting 
hbase:meta (replicaId=0) location in ZooKeeper as 
100.107.165.22,60020,1576656877749
{code}
Looks at this logs, we can see the ServerCrashProcedure 
server=100.107.165.61,60020,1576553057082 queued for quit a while.

> HMaster abort when write to meta failed
> ---------------------------------------
>
>                 Key: HBASE-23595
>                 URL: https://issues.apache.org/jira/browse/HBASE-23595
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lijin Bin
>            Priority: Major
>
> RegionStateStore
> {code}
>   private void updateRegionLocation(RegionInfo regionInfo, State state, Put 
> put)
>       throws IOException {
>     try (Table table = 
> master.getConnection().getTable(TableName.META_TABLE_NAME)) {
>       table.put(put);
>     } catch (IOException e) {
>       // TODO: Revist!!!! Means that if a server is loaded, then we will 
> abort our host!
>       // In tests we abort the Master!
>       String msg = String.format("FAILED persisting region=%s state=%s",
>         regionInfo.getShortNameToLog(), state);
>       LOG.error(msg, e);
>       master.abort(msg, e);
>       throw e;
>     }
>   }
> {code}
> When regionserver (carry meta) stop or crash, if the ServerCrashProcedure 
> have not start process, write to meta will fail and abort master.   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to