[ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576095#comment-16576095
 ] 

Duo Zhang commented on HBASE-20976:
-----------------------------------

I think we'd better do this a bit more cleanly, without adding too many checks...

I think what we need here is to make sure that the deadServers check works and 
prevents scheduling redundant SCPs. The reason we can check the existing SCPs 
when restarting is that we have not started the PE (ProcedureExecutor) yet, so 
it is safe. Doing the same check during execution is not a good idea, as there 
is no fencing...
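
A minimal sketch of that idea, under stated assumptions: the class and method 
names below (DeadServerGuard, restoreOnRestart, expireServer) are hypothetical 
and are not HBase's actual ServerManager or DeadServer API. The dead-server set 
is the only gate for scheduling an SCP. It is seeded from unfinished SCPs once 
at restart, before any procedure runs, and at runtime the check and the mark 
happen as one step under a lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the master's dead-server bookkeeping; not HBase code. */
class DeadServerGuard {
    private final Set<String> deadServers = new HashSet<>();
    private final List<String> scheduledScps = new ArrayList<>();

    /**
     * Assumed to be called once during master startup, before the procedure
     * executor is started, with the servers that already have an unfinished SCP.
     * Nothing else is running yet, so no extra fencing is needed here.
     */
    synchronized void restoreOnRestart(Iterable<String> serversWithUnfinishedScp) {
        for (String server : serversWithUnfinishedScp) {
            deadServers.add(server);
        }
    }

    /**
     * Returns true if a new SCP was scheduled, false if the server was already
     * marked dead. Set.add returns false for an existing entry, so the check
     * and the mark are a single step while the lock is held.
     */
    synchronized boolean expireServer(String serverName) {
        if (!deadServers.add(serverName)) {
            return false; // an SCP already exists for this server
        }
        scheduledScps.add(serverName);
        return true;
    }

    /** Number of SCPs scheduled by this master instance. */
    synchronized int scheduledCount() {
        return scheduledScps.size();
    }
}
```

In this sketch an entry stays in the set until something removes it explicitly. 
Removing the old ServerName too early, as in the second case quoted below, 
would reopen the window for a duplicate SCP.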

> SCP can be scheduled multiple times for the same RS
> ---------------------------------------------------
>
>                 Key: HBASE-20976
>                 URL: https://issues.apache.org/jira/browse/HBASE-20976
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 2.0.2
>
>         Attachments: HBASE-20976.branch-2.0.001.patch, 
> HBASE-20976.branch-2.0.002.patch
>
>
> An SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed, and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new Master scans the meta table and finds a region that is still open 
> on a dead server.
> 4. The new Master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> The patch provides a test case that reproduces this issue, along with a fix.
> Another case where an SCP might be scheduled multiple times for the same RS 
> (with HBASE-20708):
> 1. An RS crashed, and an SCP was submitted for it.
> 2. A new RS started on the same host, and the old RS's ServerName was removed 
> from DeadServer.deadServers.
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needed to 
> send a close region operation to the crashed RS.
> 4. The UnassignProcedure's dispatch failed with 'NoServerDispatchException'.
> 5. Expiration of the RS began, but the RS was found to be neither online nor 
> in the deadServer list, so an SCP was submitted for the same RS again.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
