[ https://issues.apache.org/jira/browse/BOOKKEEPER-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541136#comment-13541136 ]
Rakesh R commented on BOOKKEEPER-355:
-------------------------------------

Hi Ivan, the patch looks nice.
- Could you please add '@Test(timeout = )' to the test cases?
- I suspect a similar situation can also occur during normal add entries. There, we first update the ledger metadata and then try to write the entries. I feel it would be safer to reverse the logic of LedgerHandle#handleBookieFailure() as follows:
  1) Replace the failed bookie with the new bookie in metadata.currentEnsemble, then send the write entry.
  2) Update the ZK metadata only after the entry has been written successfully; otherwise, pick another bookie.
How does that sound to you?

> Ledger recovery will mark ledger as closed with -1, in case of slow bookie is added to ensemble during recovery add
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-355
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-355
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.1.0, 4.2.0
>            Reporter: Vinay
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch,
>                      0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch,
>                      0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch,
>                      0003-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch,
>                      BOOKKEEPER-355.patch, BOOKKEEPER-355.patch
>
>
> Scenario:
> ------------
> 1. A ledger is created with ensemble and quorum size 2, and one entry is written to it.
> 2. The first bookie in the ensemble is then brought down.
> 3. Another client fences the ledger and tries to recover it.
> 4. During recovery an ensemble change happens and a new bookie is added, but this bookie cannot be connected to.
> 5. The recovery fails.
> 6. The previously added bookie comes back up.
> 7. Another client tries to recover the same ledger.
> 8. Since the new bookie is first in the ensemble, doRecoveryRead() reads from that bookie, gets a NoSuchLedgerException and closes the ledger with -1,
>    i.e. marks the ledger as empty, even though the first client had successfully written one entry.
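A minimal Java sketch of the ordering proposed above for LedgerHandle#handleBookieFailure(), under a simplified model: Bookie, MetadataStore, replaceBookie, bookie() and the bookie ids are hypothetical stand-ins, not BookKeeper APIs. It only illustrates the ordering: the candidate bookie is swapped into a local copy of the ensemble, the pending entry is written to it, and the new ensemble is persisted only once that write is acknowledged. A candidate whose write fails is discarded and the next one is tried.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "write first, persist metadata second"; not BookKeeper code. */
public class EnsembleChangeSketch {

    /** Hypothetical stand-in for a bookie that either acks a write or does not. */
    interface Bookie {
        String id();
        boolean write(long entryId, byte[] data); // true if the write was acked
    }

    /** Hypothetical stand-in for the ledger metadata store (ZooKeeper in BookKeeper). */
    static final class MetadataStore {
        final List<List<String>> persistedEnsembles = new ArrayList<>();

        void persistEnsemble(List<String> ensemble) {
            persistedEnsembles.add(new ArrayList<>(ensemble));
        }
    }

    /**
     * Replaces the failed bookie at position `index`. The new ensemble is built as a
     * local copy, the pending entry is written to the candidate, and the metadata is
     * persisted only after that write is acked. Returns the persisted ensemble, or
     * null if no candidate acked (in which case nothing is persisted).
     */
    static List<String> replaceBookie(List<String> currentEnsemble, int index,
                                      List<Bookie> candidates, long entryId,
                                      byte[] data, MetadataStore store) {
        for (Bookie candidate : candidates) {
            List<String> proposed = new ArrayList<>(currentEnsemble); // local change only
            proposed.set(index, candidate.id());
            if (candidate.write(entryId, data)) {
                store.persistEnsemble(proposed); // metadata updated after a successful write
                return proposed;
            }
            // Write failed: drop the proposed ensemble and try the next candidate.
        }
        return null;
    }

    /** Builds a simulated bookie that acks writes only if `reachable` is true. */
    static Bookie bookie(String id, boolean reachable) {
        return new Bookie() {
            public String id() { return id; }
            public boolean write(long entryId, byte[] data) { return reachable; }
        };
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore();
        List<String> ensemble = List.of("bookie1", "bookie2");
        // bookie3 is unreachable, bookie4 acks the write.
        List<Bookie> candidates = List.of(bookie("bookie3", false), bookie("bookie4", true));
        List<String> result = replaceBookie(ensemble, 0, candidates, 0L, new byte[] {1}, store);
        System.out.println(result);                           // [bookie4, bookie2]
        System.out.println(store.persistedEnsembles.size());  // 1
    }
}
```

With this ordering, an unreachable bookie like the one added during recovery in the scenario never reaches the persisted metadata, so a later recovery read would not start from it. The sketch does not cover the case where the write succeeds but the metadata update itself fails.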