[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284505#comment-13284505
 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-237:
------------------------------------------------

For work assignment, how about having the bookies compete for the replication 
work? We already use this approach in HBase for distributed log splitting. The 
idea is as follows:

The current distributed chain of watchers can identify failed bookies and 
record them under a node in ZK. All bookies can watch that node. Whenever a 
new failed bookie is added, the bookies get a notification and start competing 
for the work. The winner takes the replication work and can also record the 
state of the replication under the lock node it acquired. If the cluster 
restarts, the bookies can compete again for the replication work of the 
failed bookies. Whenever replication completes, the winner deletes the lock 
entry and the failed bookie entry from ZK. In fact, in HBase we have master 
coordination, but here we would depend on distributed watching to identify 
failed bookies. 
@Rakesh/Flavio, what are your thoughts on this? 
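
To make the competition concrete, here is a rough sketch of the flow described 
above. It is only an illustration under stated assumptions: FakeZk is an 
in-memory stand-in rather than the ZooKeeper client API, and the paths 
/failed/<bookie> and /locks/<bookie>, the function names, and the replicate 
callback are all hypothetical. In a real deployment the lock would presumably 
be an ephemeral znode so that it disappears if the winner dies.

```python
import threading


class FakeZk:
    """In-memory stand-in for ZooKeeper (assumption, not the real client API).

    create() fails if the node already exists, which is the property the
    competition relies on.
    """

    def __init__(self):
        self._nodes = {}
        self._mutex = threading.Lock()

    def create(self, path, data):
        with self._mutex:
            if path in self._nodes:
                return False
            self._nodes[path] = data
            return True

    def delete(self, path):
        with self._mutex:
            self._nodes.pop(path, None)

    def exists(self, path):
        with self._mutex:
            return path in self._nodes


def on_failed_bookie_added(zk, worker_id, failed_bookie, replicate):
    """Called on each bookie when it is notified of a new failed bookie.

    Returns True only for the bookie that won and finished the replication.
    """
    failed_path = "/failed/" + failed_bookie  # hypothetical path layout
    lock_path = "/locks/" + failed_bookie     # hypothetical path layout

    if not zk.exists(failed_path):
        return False  # someone already finished this work
    if not zk.create(lock_path, worker_id):
        return False  # lost the race; another bookie holds the lock
    # Re-check after winning: the previous winner may have completed and
    # released the lock between our first check and our create().
    if not zk.exists(failed_path):
        zk.delete(lock_path)
        return False
    try:
        replicate(failed_bookie)
    except Exception:
        # Release the lock but keep the failed entry so others can retry.
        zk.delete(lock_path)
        raise
    # Delete the failed entry first, then the lock, so a late competitor
    # that acquires the lock sees the work is already done.
    zk.delete(failed_path)
    zk.delete(lock_path)
    return True


def compete(num_workers, failed_bookie):
    """Simulate num_workers bookies reacting to the same notification."""
    zk = FakeZk()
    zk.create("/failed/" + failed_bookie, "")
    replicated_by = []
    results = {}

    def run(worker_id):
        results[worker_id] = on_failed_bookie_added(
            zk, worker_id, failed_bookie,
            lambda bookie: replicated_by.append(worker_id))

    threads = [threading.Thread(target=run, args=("bookie-%d" % i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return zk, results, replicated_by
```

Calling compete(5, "bookie-x") runs five simulated bookies against one failed 
bookie entry; exactly one of them performs the replication and both entries 
are gone afterwards. Recording replication progress under the lock node and 
re-competing after a cluster restart are left out of this sketch.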
                
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
>                 Key: BOOKKEEPER-237
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: bookkeeper-client, bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: Auto Recovery Detection - distributed chain 
> approach.doc, Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers 
> dies, there is no automatic mechanism to identify and recover the 
> under-replicated ledgers and their corresponding entries. This could lead to 
> losing successfully written entries, which would be a critical problem in 
> sensitive systems. This document describes a few proposals to overcome 
> these limitations. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
