[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575920#comment-16575920
 ] 

Allan Yang commented on HBASE-20976:
------------------------------------

[~stack], [~Apache9]: Sorry to reopen this again, but I have found another 
case in which an SCP can be scheduled multiple times. It is the second case 
in the issue's description.

1. The RS expired, and an SCP was submitted for it
{code}
2018-08-09 12:29:55,665 WARN  [PEWorker-11] master.ServerManager: Expiration of 
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:55,665 INFO  [PEWorker-11] master.ServerManager: Processing 
expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on 
izbp1azj9xjvk1h9vioyvfz,16000,1533787159573

2018-08-09 12:29:55,815 DEBUG [PEWorker-11] assignment.AssignmentManager: 
Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted 
shutdown handler to be executed meta=false
{code}

2. The RS restarted on the same host, and its server name was removed from the 
dead servers list
{code}
2018-08-09 12:29:58,034 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=157,queue=13,port=16000] 
master.ServerManager: REPORT: Server 
izbp1azj9xjvk1h9vioyvfz,16020,1533787086010 came back up, removed it from 
the dead servers list
{code}

3. An UnassignProcedure then detects the dead RS as well. Since the RS is no 
longer in the dead servers list, it assumes no one is handling the crash, and 
an SCP is submitted again
{code}
2018-08-09 12:29:58,061 WARN  [PEWorker-15] 
assignment.RegionTransitionProcedure: Remote call failed 
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; 
UnassignProcedure table=randowmWrite15, 
region=e07c5ad01ce7b76b80a92e809fb98e26, 
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645; rit=CLOSING, 
location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; 
exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; 
UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, 
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645
        at 
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
        at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:263)
        at 
org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:207)
        at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
        at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
2018-08-09 12:29:58,061 WARN  [PEWorker-15] assignment.UnassignProcedure: 
Expiring izbp1azj9xjvk1h9vioyvfz,16020,1533725024975, pid=4034, ppid=4012, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; 
UnassignProcedure table=randowmWrite15, 
region=e07c5ad01ce7b76b80a92e809fb98e26, 
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645 rit=CLOSING, 
location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; 
exception=NoServerDispatchException
2018-08-09 12:29:58,061 WARN  [PEWorker-15] master.ServerManager: Expiration of 
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:58,061 INFO  [PEWorker-15] master.ServerManager: Processing 
expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on 
izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:58,540 DEBUG [PEWorker-15] assignment.AssignmentManager: 
Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted 
shutdown handler to be executed meta=false
{code}
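
The three steps above can be sketched with a toy model. This is only an illustration of the race, not HBase code and not the fix in the attached patch: the class, its fields, the short server names, and the "SCP in flight" guard are all hypothetical, and the assumption that a restarted RS clears dead-server entries for the same host and port is inferred from the log in step 2.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the duplicate-SCP race; none of these names are HBase APIs. */
class DuplicateScpSketch {

  static class ToyServerManager {
    // Server names are "host,port,startcode", as in the logs above.
    final Set<String> deadServers = new HashSet<>();
    // Hypothetical guard state: servers whose SCP has not finished yet.
    final Set<String> scpInFlight = new HashSet<>();
    final List<String> submittedScps = new ArrayList<>();
    final boolean guardEnabled;

    ToyServerManager(boolean guardEnabled) {
      this.guardEnabled = guardEnabled;
    }

    /** Returns true if a new SCP was submitted for serverName. */
    boolean expireServer(String serverName) {
      if (deadServers.contains(serverName)) {
        return false; // already being handled
      }
      if (guardEnabled && scpInFlight.contains(serverName)) {
        return false; // an unfinished SCP already covers this server
      }
      deadServers.add(serverName);
      scpInFlight.add(serverName);
      submittedScps.add(serverName);
      return true;
    }

    /** A restarted RS reports in: drop dead entries with the same host and port. */
    void serverCameBackUp(String newServerName) {
      String hostAndPort = hostAndPort(newServerName);
      deadServers.removeIf(dead -> hostAndPort(dead).equals(hostAndPort));
    }

    /** Would be called when the SCP for serverName completes. */
    void scpFinished(String serverName) {
      scpInFlight.remove(serverName);
    }

    static String hostAndPort(String serverName) {
      return serverName.substring(0, serverName.lastIndexOf(','));
    }
  }

  /** Replays steps 1 to 3 and returns how many SCPs were submitted. */
  static int scpCountForScenario(boolean guardEnabled) {
    ToyServerManager manager = new ToyServerManager(guardEnabled);
    String oldRs = "host1,16020,1000";
    String newRs = "host1,16020,2000";
    manager.expireServer(oldRs);     // step 1: RS expired, SCP submitted
    manager.serverCameBackUp(newRs); // step 2: old name leaves the dead list
    manager.expireServer(oldRs);     // step 3: UnassignProcedure expires it again
    return manager.submittedScps.size();
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + scpCountForScenario(false));
    System.out.println("with guard: " + scpCountForScenario(true));
  }
}
```

Without the guard, the second expiration finds the server neither online nor in the dead list and submits a second SCP. Tracking unfinished SCPs separately from the dead list is one possible way to close that gap in this model.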

> SCP can be scheduled multiple times for the same RS
> ---------------------------------------------------
>
>                 Key: HBASE-20976
>                 URL: https://issues.apache.org/jira/browse/HBASE-20976
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 2.0.2
>
>         Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed, and an SCP was submitted for it
> 2. Before this SCP finished, the Master crashed
> 3. The new Master scans the meta table and finds that some regions are still 
> open on a dead server
> 4. The new Master submits an SCP for the dead server again
> Without HBASE-20846, the two SCPs for the same RS can even execute 
> concurrently.
> The patch provides a test case that reproduces this issue, along with a fix.
> Another case in which an SCP might be scheduled multiple times for the same RS 
> (with HBASE-20708):
> 1. An RS crashed, and an SCP was submitted for it
> 2. A new RS started on the same host, and the old RS's ServerName was removed 
> from DeadServer.deadServers
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needed to 
> send a close region operation to the crashed RS
> 4. The UnassignProcedure's dispatch failed with 'NoServerDispatchException'
> 5. The UnassignProcedure began to expire the RS, but found it neither online 
> nor in the deadServer list, so an SCP was submitted for the same RS again
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
