[ https://issues.apache.org/jira/browse/BOOKKEEPER-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294542#comment-13294542 ]

Ivan Kelly commented on BOOKKEEPER-272:
---------------------------------------

{quote}
bq. I don't think we need the bookie

Here I can see one race condition. Say the Auditor is about to publish the 
failure of BK2 for L0001. Meanwhile, BK4 has finished re-replicating BK3's 
L0001 data and is about to delete the entry from /underreplicated. In this 
case, the Auditor will silently continue on seeing that L0001 exists, and the 
worker will delete the L0001 entry thinking there are no more failures.

The solution I'm considering is to check the data version before doing the ZK 
operation (similar to the logic we built in BKJM CurrentInProgress). I'm 
planning to keep the failed bookie information as the node's data.
{quote}
Yes, I think in this case, when we see BK3's failure and L0001 already exists, 
we should bump the version number. We shouldn't really be changing any vital 
data in ZooKeeper without checking the version number anyway. Hopefully this 
will be a very rare situation; in a 3e2q ledger, two machines dropping like 
this would probably mean data loss.
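A minimal sketch of the version-checked update discussed above, assuming an in-memory stand-in rather than the real ZooKeeper client. Store, Node, BadVersionException, publishFailure and completeIfUnchanged are hypothetical names; the store only mimics the expected-version checks that ZooKeeper applies in setData(path, data, version) and delete(path, version). In this sketch the auditor appends a newly failed bookie to an existing /underreplicated node with a versioned write instead of silently skipping it. The worker deletes the node only if the version it read is still current, so the BK2 failure from the scenario is not lost.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for ZooKeeper: it only mimics the
// "expected version" checks of setData/delete, nothing else.
public class UnderReplicatedSketch {

    /** Thrown when the caller's expected version is stale or the node is gone (hypothetical). */
    static class BadVersionException extends Exception {
        BadVersionException(String path) { super("bad version for " + path); }
    }

    /** Immutable snapshot: the node's payload plus a version bumped on every write. */
    static final class Node {
        final String data;
        final int version;
        Node(String data, int version) { this.data = data; this.version = version; }
    }

    static final class Store {
        private final Map<String, Node> nodes = new HashMap<>();

        /** Creates the node at version 0; returns false if it already exists. */
        synchronized boolean create(String path, String data) {
            return nodes.putIfAbsent(path, new Node(data, 0)) == null;
        }

        /** Returns the current (immutable) node, or null if absent. */
        synchronized Node get(String path) { return nodes.get(path); }

        synchronized boolean exists(String path) { return nodes.containsKey(path); }

        /** Conditional write: succeeds only if the version is unchanged. */
        synchronized void setData(String path, String data, int expectedVersion)
                throws BadVersionException {
            Node n = nodes.get(path);
            if (n == null || n.version != expectedVersion) throw new BadVersionException(path);
            nodes.put(path, new Node(data, n.version + 1));
        }

        /** Conditional delete: succeeds only if the version is unchanged. */
        synchronized void delete(String path, int expectedVersion) throws BadVersionException {
            Node n = nodes.get(path);
            if (n == null || n.version != expectedVersion) throw new BadVersionException(path);
            nodes.remove(path);
        }
    }

    /** Auditor side: record a failed bookie under the ledger's node. */
    static void publishFailure(Store store, String path, String bookie) {
        while (true) {
            if (store.create(path, bookie)) return;   // first failure reported for this ledger
            Node cur = store.get(path);
            if (cur == null) continue;                 // deleted meanwhile: try create again
            try {
                // Append rather than skip; the versioned write bumps the version.
                store.setData(path, cur.data + "," + bookie, cur.version);
                return;
            } catch (BadVersionException e) {
                // Node changed or vanished between get and setData: retry.
            }
        }
    }

    /** Worker side: delete the node only if nobody touched it since `seen` was read. */
    static boolean completeIfUnchanged(Store store, String path, Node seen) {
        try {
            store.delete(path, seen.version);
            return true;
        } catch (BadVersionException e) {
            return false;                              // a new failure was published: keep the node
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        String path = "/underreplicated/L0001";
        publishFailure(store, path, "BK3");   // auditor reports BK3 (version 0)
        Node seen = store.get(path);          // worker BK4 starts on L0001 at version 0
        publishFailure(store, path, "BK2");   // auditor reports BK2 meanwhile (version 1)
        boolean deleted = completeIfUnchanged(store, path, seen);
        System.out.println("deleted=" + deleted + " exists=" + store.exists(path)
                + " data=" + store.get(path).data);
    }
}
```

Running main prints deleted=false exists=true data=BK3,BK2. The worker's stale delete is rejected, so it would re-read the node and handle BK2 as well. Against a real ensemble the same pattern would use ZooKeeper.setData and ZooKeeper.delete with the version from the Stat that was read, catching KeeperException.BadVersionException.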

{quote}
Yeah, I understand. I have one suggestion: the auditor already knows about the 
failed bookies and their ledgers when publishing the underreplicated ledgers. 
Why don't we keep the failed bookie as data inside the underreplicated ledger 
node? Then the worker (segment checker) only needs to look at this bookie and 
get the corresponding index directly from the ZK ledger metadata.{quote}
I'm not sure what you mean here. Having the failed bookie stored in the data is 
useful for debugging purposes, but we should check the ledger beforehand 
anyway to determine what to recover. Are you trying to avoid another read of 
the ledger znode?
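A rough sketch of the ledger check suggested in the reply above. It assumes ledger metadata can be modeled as a map from each ensemble's first entry id to that ensemble's bookies. fragmentsNeedingRecovery and this map layout are hypothetical and are not BookKeeper's API. Comparing every ensemble against the set of currently available bookies catches failures that a single stored bookie hint would miss. For example, a node whose data says only BK3 would not reveal that BK2 is also gone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FragmentCheckSketch {

    /**
     * Hypothetical model: ensembles are keyed by the first entry id they serve.
     * Returns the start ids of fragments whose ensemble contains at least one
     * bookie that is not in `available`.
     */
    static List<Long> fragmentsNeedingRecovery(TreeMap<Long, List<String>> ensembles,
                                               Set<String> available) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : ensembles.entrySet()) {
            for (String bookie : e.getValue()) {
                if (!available.contains(bookie)) {
                    result.add(e.getKey());
                    break; // one missing bookie is enough to flag the fragment
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> ensembles = new TreeMap<>();
        ensembles.put(0L, List.of("BK1", "BK2", "BK3"));   // BK2 and BK3 have failed
        ensembles.put(100L, List.of("BK1", "BK4", "BK5")); // all still available
        Set<String> available = Set.of("BK1", "BK4", "BK5");
        System.out.println(fragmentsNeedingRecovery(ensembles, available));
    }
}
```

With these ensembles only the fragment starting at entry 0 is flagged, so main prints [0]. The trade-off is the one raised in the thread: this costs a read of the ledger metadata, but it does not depend on the stored failure data being complete.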

                
> Provide automatic mechanism to know bookie failures
> ---------------------------------------------------
>
>                 Key: BOOKKEEPER-272
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-272
>             Project: Bookkeeper
>          Issue Type: Sub-task
>          Components: bookkeeper-server
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: BOOKKEEPER-272.1.patch, BOOKKEEPER-272.2.patch, BOOKKEEPER-272.Auditor.patch
>
>
> The idea is to build an automatic mechanism to detect bookie failures and to
> set up bookie failure notifications that start the re-replication process.
> There are multiple approaches to finding out bookie failures. Please refer to
> the documents attached in BookKeeper-237.
