[ https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558487#comment-13558487 ]

Lars Hofhansl commented on HBASE-2611:
--------------------------------------

Specifically, check out ReplicationSourceManager.NodeFailoverWorker.run().
First, all surviving RSs race to obtain the lock:
{code}
      if (!zkHelper.lockOtherRS(rsZnode)) {
        return;
      }
{code}
Only the RS that obtains the lock continues and moves the failed RS's replication queues; the others return.
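A minimal, hypothetical sketch of that lock race, assuming a ConcurrentHashMap as an in-memory stand-in for the ZooKeeper znode tree: putIfAbsent models creating a lock znode that fails when it already exists. The class, method names and znode path are illustrative only, not HBase's actual ReplicationZookeeper code.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the lock step; NOT the HBase or ZooKeeper API.
// A ConcurrentMap stands in for the znode tree, and putIfAbsent models creating
// a lock znode that fails when another RS has already created it.
class LockFailoverSketch {
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    // Illustrative analogue of zkHelper.lockOtherRS(rsZnode): returns true only
    // for the first RS that creates the lock for the dead RS.
    boolean lockOtherRS(String deadRs, String myRs) {
        return znodes.putIfAbsent("/replication/rs/" + deadRs + "/lock", myRs) == null;
    }

    // Every survivor runs its failover worker concurrently; the ones that get
    // false simply return, like the early return in NodeFailoverWorker.run().
    static List<String> race(List<String> survivors, String deadRs) {
        LockFailoverSketch zk = new LockFailoverSketch();
        return survivors.parallelStream()
                .filter(rs -> zk.lockOtherRS(deadRs, rs))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> winners = race(List.of("rs1", "rs2", "rs3", "rs4"), "rs0");
        System.out.println("RS that will move the queues: " + winners);
    }
}
```

Exactly one survivor comes back from race(); the rest have already returned, so if that winner dies partway through the transfer, no other RS is still working on the failed RS's queues.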

I think we could do this instead:
If ZooKeeper multi is supported, we just have all surviving RSs attempt to move the queues
(skipping the lock step). If multi is as atomic as advertised, all of them will try,
but only one RS will succeed in moving the queues atomically.
It seems like that should work.
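A minimal sketch of the proposed lock-free variant, under the same kind of in-memory assumptions: a synchronized HashMap stands in for ZooKeeper, multi() is a hypothetical all-or-nothing batch modeling ZooKeeper's multi operation, each queue is simplified to a single znode whose data lists its HLogs, and the destination name queueId-deadRs is illustrative. Each survivor snapshots the dead RS's queues and submits the deletes of the old znodes plus the creates of the new ones in one multi; once one RS's multi commits, the others' deletes target missing znodes, so their batches fail without changing anything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the lock-free proposal; NOT the ZooKeeper
// client API. A HashMap guarded by "this" stands in for the znode tree, and
// multi() models ZooKeeper's all-or-nothing multi operation.
class MultiFailoverSketch {
    private final Map<String, String> znodes = new HashMap<>();

    synchronized void create(String path, String data) {
        znodes.put(path, data);
    }

    synchronized boolean exists(String path) {
        return znodes.containsKey(path);
    }

    // Number of queue znodes directly under /replication/rs/<rs>/.
    synchronized long countQueues(String rs) {
        String prefix = "/replication/rs/" + rs + "/";
        return znodes.keySet().stream().filter(p -> p.startsWith(prefix)).count();
    }

    // Applies every delete and create, or none of them: a delete of a missing
    // znode or a create of an existing one aborts the whole batch.
    synchronized boolean multi(List<String> deletes, Map<String, String> creates) {
        for (String p : deletes) {
            if (!znodes.containsKey(p)) {
                return false;
            }
        }
        for (String p : creates.keySet()) {
            if (znodes.containsKey(p)) {
                return false;
            }
        }
        deletes.forEach(znodes::remove);
        znodes.putAll(creates);
        return true;
    }

    // One survivor tries to move all queues of deadRs under its own znode in a
    // single multi, without taking a lock first.
    boolean claimQueues(String deadRs, String myRs) {
        String prefix = "/replication/rs/" + deadRs + "/";
        Map<String, String> snapshot;
        synchronized (this) {
            snapshot = znodes.entrySet().stream()
                    .filter(e -> e.getKey().startsWith(prefix))
                    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        }
        if (snapshot.isEmpty()) {
            return false; // another RS already moved the queues
        }
        Map<String, String> creates = new HashMap<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            // Illustrative naming: keep the queue id and tag it with the dead RS.
            String queueId = e.getKey().substring(prefix.length());
            creates.put("/replication/rs/" + myRs + "/" + queueId + "-" + deadRs, e.getValue());
        }
        // Fails, changing nothing, if another RS's multi deleted these queues first.
        return multi(new ArrayList<>(snapshot.keySet()), creates);
    }

    // All survivors attempt the move concurrently; returns those whose multi committed.
    List<String> race(List<String> survivors, String deadRs) {
        return survivors.parallelStream()
                .filter(rs -> claimQueues(deadRs, rs))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MultiFailoverSketch zk = new MultiFailoverSketch();
        // Each queue znode's data lists its HLogs (a simplification).
        zk.create("/replication/rs/rs0/peer1", "hlog.1,hlog.2");
        zk.create("/replication/rs/rs0/peer2", "hlog.3");
        List<String> winners = zk.race(List.of("rs1", "rs2", "rs3", "rs4"), "rs0");
        System.out.println("moved by " + winners + ", left under rs0: " + zk.countQueues("rs0"));
    }
}
```

Because the move either commits completely or not at all, an RS that crashes before its multi commits leaves the queues where they were rather than half-moved, and there is no lock left behind for the other survivors to respect.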
                
> Handle RS that fails while processing the failure of another one
> ----------------------------------------------------------------
>
>                 Key: HBASE-2611
>                 URL: https://issues.apache.org/jira/browse/HBASE-2611
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Jean-Daniel Cryans
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.5
>
>         Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch
>
>
> HBASE-2223 doesn't handle region servers that fail while transferring the
> HLog queues of other region servers that failed. Devise a reliable way
> to do it.
