[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290374#comment-13290374
 ] 

Flavio Junqueira commented on BOOKKEEPER-237:
---------------------------------------------

{quote}
- builds a list of fragments it is participating in
- from this list, build a bookie -> fragment index
{quote}

I'm not sure what you mean here. Do you mean a list of the ledgers the bookie 
is participating in? Perhaps a concrete example would help. Say we have an 
e3-q2 ledger (ensemble of 3, quorum of 2). Will each bookie be watching all 
the others? If so, with a large number of ledgers we might end up with every 
bookie watching every other bookie, since every pair of bookies is likely to 
appear together in at least one ledger.
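
To make the concern above concrete, here is a minimal sketch that estimates 
how quickly bookie pairs end up sharing a ledger. It assumes each ledger picks 
one ensemble of 3 bookies uniformly at random and keeps it for its lifetime; 
the names co_located_pairs and pair_coverage are hypothetical and not part of 
BookKeeper, and real ensemble placement may behave differently.

```python
import random
from itertools import combinations


def co_located_pairs(ensembles):
    """Return the set of unordered bookie pairs that share at least one ledger."""
    pairs = set()
    for ensemble in ensembles:
        # Every two bookies in the same ensemble would watch each other.
        pairs.update(frozenset(p) for p in combinations(sorted(ensemble), 2))
    return pairs


def pair_coverage(num_bookies, num_ledgers, ensemble_size=3, seed=0):
    """Fraction of all bookie pairs that co-occur in at least one ledger.

    Assumption: ensembles are chosen uniformly at random (illustrative only).
    """
    rng = random.Random(seed)
    bookies = [f"bookie-{i}" for i in range(num_bookies)]
    ensembles = [rng.sample(bookies, ensemble_size) for _ in range(num_ledgers)]
    total_pairs = num_bookies * (num_bookies - 1) // 2
    return len(co_located_pairs(ensembles)) / total_pairs


if __name__ == "__main__":
    # Coverage of the 45 possible pairs among 10 bookies as ledgers grow.
    for ledgers in (10, 100, 1000):
        print(ledgers, round(pair_coverage(10, ledgers), 2))
```

Each e3 ledger contributes 3 pairs, so under this assumption the fraction of 
covered pairs can only grow as ledgers are added, which is the "everyone 
watching everyone" situation described above.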

I was also thinking that once we detect a crash, we need to decide where to 
rebuild each ledger fragment. An elected auditor might give us a simpler way 
to ensure that the rebuild load is evenly balanced across bookies. With a 
distributed approach it might not be simple, and I would have to think about 
an algorithm if no one else has one at hand. 
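
As an illustration of why a single auditor could make balancing simpler, here 
is a rough sketch of a greedy assignment: each under-replicated fragment goes 
to the least-loaded live bookie that does not already hold it. This is only 
an assumed strategy for discussion, not an algorithm proposed in this issue; 
assign_rebuild_targets and its inputs are hypothetical names.

```python
def assign_rebuild_targets(fragments, load):
    """Greedily pick a rebuild target for each under-replicated fragment.

    fragments: list of (fragment_id, surviving_bookies) tuples.
    load: dict mapping each live bookie to its current fragment count.
    Returns (plan, new_load); the input dict is left untouched.
    """
    load = dict(load)  # work on a copy
    plan = {}
    for fragment_id, survivors in fragments:
        # A bookie that already stores the fragment cannot be the new replica.
        candidates = [b for b in load if b not in survivors]
        if not candidates:
            raise ValueError(f"no eligible bookie for {fragment_id}")
        # Least-loaded first; the bookie name breaks ties deterministically.
        target = min(candidates, key=lambda b: (load[b], b))
        plan[fragment_id] = target
        load[target] += 1
    return plan, load


if __name__ == "__main__":
    # Hypothetical cluster: b5 crashed, b1..b4 are alive.
    live_load = {"b1": 0, "b2": 0, "b3": 0, "b4": 0}
    under_replicated = [
        ("L1-f0", {"b1", "b2"}),
        ("L2-f0", {"b1", "b3"}),
        ("L3-f0", {"b2", "b3"}),
    ]
    plan, new_load = assign_rebuild_targets(under_replicated, live_load)
    print(plan)
    print(new_load)
```

Because one process sees the whole load map, the choice is a simple minimum. 
In a fully distributed scheme each bookie would need some agreed view of that 
map before choosing, which is where the extra complexity would come from.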

                
> Automatic recovery of under-replicated ledgers and its entries
> --------------------------------------------------------------
>
>                 Key: BOOKKEEPER-237
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-237
>             Project: Bookkeeper
>          Issue Type: New Feature
>          Components: bookkeeper-client, bookkeeper-server
>    Affects Versions: 4.0.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: Auto Recovery Detection - distributed chain 
> approach.doc, Auto Recovery and Bookie sync-ups.pdf
>
>
> As per the current design of BookKeeper, if one of the BookKeeper servers 
> dies, there is no automatic mechanism to identify and recover the 
> under-replicated ledgers and their corresponding entries. This could lead to 
> losing successfully written entries, which would be a critical problem in 
> sensitive systems. This document describes a few proposals to overcome these 
> limitations. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
