[ 
https://issues.apache.org/jira/browse/HBASE-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282374#comment-16282374
 ] 

Yi Liang commented on HBASE-19287:
----------------------------------

See the log below:
{code}
2017-12-07 19:01:45,218 INFO  [ProcExecWrkr-1] procedure.RecoverMetaProcedure: 
pid=17, state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure 
failedMetaServer=null, splitWal=true; Retaining meta assignment to 
server=hadoop-slave1.hadoop,16020,1512673261766
2017-12-07 19:01:45,227 INFO  [ProcExecWrkr-1] procedure2.ProcedureExecutor: 
Initialized subprocedures=[{pid=18, ppid=17, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
region=1588230740, target=hadoop-slave1.hadoop,16020,1512673261766}]
2017-12-07 19:01:45,261 INFO  [ProcExecWrkr-3] 
procedure.MasterProcedureScheduler: pid=18, ppid=17, 
state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
region=1588230740, target=hadoop-slave1.hadoop,16020,1512673261766 hbase:meta 
hbase:meta,,1.1588230740
2017-12-07 19:01:45,266 INFO  [ProcExecWrkr-3] assignment.AssignProcedure: 
Start pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=hbase:meta, region=1588230740, 
target=hadoop-slave1.hadoop,16020,1512673261766; rit=OFFLINE, 
location=hadoop-slave1.hadoop,16020,1512673261766; forceNewPlan=false, 
retain=false
2017-12-07 19:01:45,419 INFO  [ProcExecWrkr-2] zookeeper.MetaTableLocator: 
Setting hbase:meta (replicaId=0) location in ZooKeeper as 
hadoop-slave2.hadoop,16020,1512673268932
2017-12-07 19:01:45,426 INFO  [ProcExecWrkr-2] 
assignment.RegionTransitionProcedure: Dispatch pid=18, ppid=17, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
region=1588230740, target=hadoop-slave1.hadoop,16020,1512673261766; 
rit=OPENING, location=hadoop-slave2.hadoop,16020,1512673268932
2017-12-07 19:01:45,580 INFO  [ProcedureDispatcherTimeoutThread] 
procedure.RSProcedureDispatcher: Using procedure batch rpc execution for 
serverName=hadoop-slave2.hadoop,16020,1512673268932 version=2097152
2017-12-07 19:01:46,793 INFO  [main-EventThread] zookeeper.RegionServerTracker: 
RegionServer ephemeral node deleted, processing expiration 
[hadoop-slave2.hadoop,16020,1512673268932]
2017-12-07 19:01:46,793 INFO  [main-EventThread] master.ServerManager: Master 
doesn't enable ServerShutdownHandler during initialization, delay expiring 
server hadoop-slave2.hadoop,16020,1512673268932
{code}

*Usually the Master hangs at the point shown in the log above, and the assign 
procedure becomes 'dead'.
The patch notices this and wakes the meta assign procedure, which then becomes 
active again and runs as shown below:*

{code}
2017-12-07 19:01:46,794 INFO  [main-EventThread] master.ServerManager: Meta has 
been assigned to crashed server: hadoop-slave2.hadoop,16020,1512673268932; will 
do re-assign
2017-12-07 19:01:46,794 WARN  [main-EventThread] 
assignment.RegionTransitionProcedure: Remote call failed pid=18, ppid=17, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
region=1588230740, target=hadoop-slave1.hadoop,16020,1512673261766; 
rit=OPENING, location=hadoop-slave2.hadoop,16020,1512673268932; 
exception=ServerCrashProcedure pid=18, 
server=hadoop-slave2.hadoop,16020,1512673268932
2017-12-07 19:01:46,797 INFO  [main-EventThread] assignment.AssignProcedure: 
Retry=1 of max=10; pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
AssignProcedure table=hbase:meta, region=1588230740, 
target=hadoop-slave1.hadoop,16020,1512673261766; rit=OPENING, 
location=hadoop-slave2.hadoop,16020,1512673268932
2017-12-07 19:01:46,798 INFO  [ProcExecWrkr-4] assignment.AssignProcedure: 
Start pid=18, ppid=17, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure 
table=hbase:meta, region=1588230740; rit=OFFLINE, location=null; 
forceNewPlan=true, retain=false
{code}
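
To make the idea concrete, here is a rough standalone sketch of the behaviour the logs show. It is not the attached patch and it does not use real HBase classes: PendingMetaAssign, ServerManagerSketch and their methods are hypothetical names made up for illustration. It only models the assumption that, while the master is still initializing and server expiration is delayed, an expired server that is the dispatch location of the pending meta assign should fail that remote call so the procedure retries with a new plan.

```java
import java.util.ArrayList;
import java.util.List;

public class MetaAssignWakeSketch {

    /** Hypothetical stand-in for the pending meta AssignProcedure (not an HBase class). */
    static class PendingMetaAssign {
        String location;           // server the open request was dispatched to
        String rit = "OPENING";    // region-in-transition state
        boolean forceNewPlan = false;
        int retries = 0;

        PendingMetaAssign(String location) {
            this.location = location;
        }

        /** Models the "Remote call failed" path: go back to OFFLINE and pick a new server. */
        void remoteCallFailed() {
            retries++;
            rit = "OFFLINE";
            location = null;
            forceNewPlan = true;
        }
    }

    /** Hypothetical stand-in for the master's server-expiration handling. */
    static class ServerManagerSketch {
        boolean initialized = false;   // master is still starting up
        final List<String> delayedExpirations = new ArrayList<>();
        PendingMetaAssign metaAssign;  // null when no meta assign is in flight

        void expireServer(String serverName) {
            if (!initialized) {
                // Expiration is still delayed, as in the first log.
                delayedExpirations.add(serverName);
                // Assumed extra step: if meta was dispatched to the crashed
                // server, wake the assign instead of leaving it waiting forever.
                if (metaAssign != null && serverName.equals(metaAssign.location)) {
                    metaAssign.remoteCallFailed();
                }
                return;
            }
            // Normal (post-initialization) crash handling is omitted here.
        }
    }

    public static void main(String[] args) {
        String crashed = "hadoop-slave2.hadoop,16020,1512673268932";
        ServerManagerSketch sm = new ServerManagerSketch();
        sm.metaAssign = new PendingMetaAssign(crashed);
        sm.expireServer(crashed);
        System.out.println("rit=" + sm.metaAssign.rit
            + ", location=" + sm.metaAssign.location
            + ", forceNewPlan=" + sm.metaAssign.forceNewPlan);
    }
}
```

In this sketch the expiration is still recorded as delayed; only the meta assign is woken early, which matches the "rit=OFFLINE, location=null; forceNewPlan=true" state in the last log line above.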

> master hangs forever if RecoverMeta send assign meta region request to target 
> server fail
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-19287
>                 URL: https://issues.apache.org/jira/browse/HBASE-19287
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0
>            Reporter: Yi Liang
>            Assignee: Yi Liang
>         Attachments: master.patch
>
>
> 2017-11-10 19:26:56,019 INFO  [ProcExecWrkr-1] 
> procedure.RecoverMetaProcedure: pid=138, 
> state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure 
> failedMetaServer=null, splitWal=true; Retaining meta assignment to 
> server=hadoop-slave1.hadoop,16020,1510341981454
> 2017-11-10 19:26:56,029 INFO  [ProcExecWrkr-1] procedure2.ProcedureExecutor: 
> Initialized subprocedures=[{pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454}]
> 2017-11-10 19:26:56,067 INFO  [ProcExecWrkr-2] 
> procedure.MasterProcedureScheduler: pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454 hbase:meta 
> hbase:meta,,1.1588230740
> 2017-11-10 19:26:56,071 INFO  [ProcExecWrkr-2] assignment.AssignProcedure: 
> Start pid=139, ppid=138, state=RUNNABLE:REGION_TRANSITION_QUEUE; 
> AssignProcedure table=hbase:meta, region=1588230740, 
> target=hadoop-slave1.hadoop,16020,1510341981454; rit=OFFLINE, 
> location=hadoop-slave1.hadoop,16020,1510341981454; forceNewPlan=false, 
> retain=false
> 2017-11-10 19:26:56,224 INFO  [ProcExecWrkr-4] zookeeper.MetaTableLocator: 
> Setting hbase:meta (replicaId=0) location in ZooKeeper as 
> hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,230 INFO  [ProcExecWrkr-4] 
> assignment.RegionTransitionProcedure: Dispatch pid=139, ppid=138, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
> region=1588230740, target=hadoop-slave1.hadoop,16020,1510341981454; 
> rit=OPENING, location=hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:56,382 INFO  [ProcedureDispatcherTimeoutThread] 
> procedure.RSProcedureDispatcher: Using procedure batch rpc execution for 
> serverName=hadoop-slave2.hadoop,16020,1510341988652 version=2097152
> 2017-11-10 19:26:57,542 INFO  [main-EventThread] 
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, 
> processing expiration [hadoop-slave2.hadoop,16020,1510341988652]
> 2017-11-10 19:26:57,543 INFO  [main-EventThread] master.ServerManager: Master 
> doesn't enable ServerShutdownHandler during initialization, delay expiring 
> server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:26:58,875 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Registering 
> server=hadoop-slave1.hadoop,16020,1510342016106
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Registering 
> server=hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Triggering server recovery; existingServer 
> hadoop-slave2.hadoop,16020,1510341988652 looks stale, new 
> server:hadoop-slave2.hadoop,16020,1510342023184
> 2017-11-10 19:27:05,832 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> master.ServerManager: Master doesn't enable ServerShutdownHandler during 
> initialization, delay expiring server hadoop-slave2.hadoop,16020,1510341988652
> 2017-11-10 19:27:49,815 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] 
> client.RpcRetryingCallerImpl: started=38594 ms ago, cancelled=false, 
> msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:meta,,1 is not 
> online on hadoop-slave2.hadoop,16020,1510342023184
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3290)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1370)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2401)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41544)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>  row 'hbase:namespace' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=hadoop-slave2.hadoop,16020,1510341988652, seqNum=0



