[ 
https://issues.apache.org/jira/browse/HBASE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554789#comment-13554789
 ] 

Lars Hofhansl commented on HBASE-2611:
--------------------------------------

Yeah, I don't know.

But what can happen is that the region server that wins the race to take over 
the dead region server's queues could itself die before it even manages to call 
multi. In that case, since the ephemeral znode is only removed once, we will 
never retry moving that region server's queues. Right?
So another part of the puzzle is to have a way to retry the takeover later. 
Earlier comments on this issue make various suggestions about how to do that, 
mostly centering around having all surviving RSs try to move a dead RS's queues.
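
To make the retry idea concrete, here is a rough sketch of the "every survivor 
keeps trying" approach. It is only an in-memory model written for illustration, 
not HBase or ZooKeeper code: FakeCoordinator, atomic_move and sweep are 
hypothetical names, and atomic_move merely stands in for the all-or-nothing 
behaviour we would expect from multi. The point is that takeover is driven by 
"a dead server still owns queues" rather than by the one-time znode deletion 
event, so a claimer dying before the move does not strand the queues.

```python
class FakeCoordinator:
    """In-memory stand-in for the coordination service (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self.queues = {}   # server name -> list of replication queue ids it owns
        self.live = set()  # servers currently considered alive

    def register(self, server, queues=()):
        self.live.add(server)
        self.queues[server] = list(queues)

    def mark_dead(self, server):
        # Models the ephemeral znode going away; this happens exactly once per server.
        self.live.discard(server)

    def orphaned_servers(self):
        # Dead servers that still own queues, i.e. a takeover is still pending.
        return sorted(s for s, q in self.queues.items() if s not in self.live and q)

    def atomic_move(self, dead, claimer):
        """Move every queue of `dead` to `claimer` in one step, or do nothing."""
        if claimer not in self.live or dead in self.live:
            return False
        pending = self.queues.get(dead)
        if not pending:
            return False  # someone else already claimed them
        self.queues[claimer].extend(pending)
        self.queues[dead] = []
        return True


def sweep(coord, me):
    """Periodic retry run by each survivor: claim queues of any dead server.

    Returns the dead servers whose queues `me` claimed in this pass.
    """
    return [dead for dead in coord.orphaned_servers() if coord.atomic_move(dead, me)]
```

For example, if rs1 dies, rs2 wins the race but dies before moving anything, 
and rs3 later runs sweep(coord, "rs3"), then rs3 picks up the queues of both 
rs1 and rs2, and a second sweep finds nothing left to claim.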

                
> Handle RS that fails while processing the failure of another one
> ----------------------------------------------------------------
>
>                 Key: HBASE-2611
>                 URL: https://issues.apache.org/jira/browse/HBASE-2611
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Jean-Daniel Cryans
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.94.5
>
>         Attachments: HBase-2611-upstream-v1.patch, HBASE-2611-v2.patch
>
>
> HBASE-2223 doesn't manage region servers that fail while doing the transfer 
> of HLogs queues from other region servers that failed. Devise a reliable way 
> to do it.

