[ https://issues.apache.org/jira/browse/BOOKKEEPER-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533772#comment-13533772 ]
Rakesh R commented on BOOKKEEPER-355: ------------------------------------- Thanks guys for the good progress. Sorry for pitching late, I'm busy with our internal releases and other schedules. {quote} c1: ensemble N => (A, B, C) is changed to N => (A, B, D) and M => (A, B, E) (N < M) c2: ensemble N => (A, B, C) is changed to N => (A, B, E) {quote} I've gone through the latest code. As Ivan mentioned, concurrent closes have Ledger#resolveConflicts() mechanism and would help to handle most of the cases. I could also see one possible case of concurrency and what you guys feel about the following race condition: 1) Assume, c1 & c2 both initiates recovery and sees the failure of C. 2) Say c1 would try replacing C with D and at same time c2 would try replacing C with E. 3) Again assume, c1 & c2 sends the metadata update request simulataneously. Consider c1 would succeeded and c2 recevies the metadataversion exception. Now c2 will enters into LedgerHandle#resolveconflicts(), here its checking whether C (failed bookie) exists in the new zkmetadata. As c1 would previously modified the metadata (C with D), c2 will be thinking the failed bookie is replaced with E and go ahead with unnecessary writes to E. Below is existing code: {code} metadata.setVersion(newMeta.getVersion()); // Resolve the conflicts if zk metadata still contains failed bookie if (newMeta.currentEnsemble.get(ensembleInfo.bookieIndex).equals( ensembleInfo.addr)) { } else { // the failed bookie has been replaced blockAddCompletions.decrementAndGet(); unsetSuccessAndSendWriteRequest(ensembleInfo.bookieIndex); } {code} IMHO, we would add the following checks in the LedgerHandle#resolveConflicts() for "failed bookie has been replaced in ZK": {code} if (newMeta.currentEnsemble.get(ensembleInfo.bookieIndex).equals( ensembleInfo.addr)) { ..... }else if (metadata.currentEnsemble.get(ensembleInfo.bookieIndex) .equals(newMeta.currentEnsemble .get(ensembleInfo.bookieIndex))) { // the failed bookie has been replaced blockAddCompletions.decrementAndGet(); unsetSuccessAndSendWriteRequest(ensembleInfo.bookieIndex); }else { // some other client would have replaced the failed // bookie with new one. Hence failing! ...... } {code} -Rakesh > Ledger recovery will mark ledger as closed with -1, in case of slow bookie is > added to ensemble during recovery add > -------------------------------------------------------------------------------------------------------------------- > > Key: BOOKKEEPER-355 > URL: https://issues.apache.org/jira/browse/BOOKKEEPER-355 > Project: Bookkeeper > Issue Type: Bug > Components: bookkeeper-server > Affects Versions: 4.1.0, 4.2.0 > Reporter: Vinay > Assignee: Ivan Kelly > Fix For: 4.2.0 > > Attachments: > 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, > 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, > BOOKKEEPER-355.patch, BOOKKEEPER-355.patch > > > Scenario: > ------------ > 1. Ledger is created with ensemble and quorum size as 2, written with one > entry > 2. Now first bookie is in the ensemble is made down. > 3. Another client fence and trying to recover the same ledger > 4. During this time ensemble change will happen and new bookie will be added. > But this bookie is not able to connect. > 5. This recovery will fail. > 7. Now previously added bookie came up. > 8. Another client trying to recover the same ledger. > 9. Since new bookie is first in the ensemble, doRecoveryRead() is reading > from that bookie and getting NoSuchLedgerException and closing the ledger > with -1 > i.e. Marking the ledger as empty, even though first client had successfully > written one entry. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira