[ https://issues.apache.org/jira/browse/BOOKKEEPER-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643042#comment-13643042 ]
Hudson commented on BOOKKEEPER-584:
-----------------------------------

Integrated in bookkeeper-trunk #188 (See [https://builds.apache.org/job/bookkeeper-trunk/188/])
    BOOKKEEPER-584: Data loss when ledger metadata is overwritten (sijie via ivank) (Revision 1476283)

     Result = SUCCESS
ivank :
Files :
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieWriteLedgerTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/LedgerCloseTest.java


> Data loss when ledger metadata is overwritten
> ---------------------------------------------
>
>                 Key: BOOKKEEPER-584
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-584
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.2.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>            Priority: Critical
>             Fix For: 4.3.0
>
>         Attachments: BOOKKEEPER-584.diff, BOOKKEEPER-584.diff, BOOKKEEPER-584.diff
>
>
> This issue was introduced by the fix for BOOKKEEPER-337. The original
> #resolveConflicts logic was replaced with a check of only the state and the
> current ensemble, which was intended to handle multiple bookies changing in
> the same ensemble.
> The issue can be reproduced by a test case with the following steps:
> 1. Ledger L writes several entries to the ensemble A, B, C.
> 2. C succeeds, B fails with slow responses, and A fails with an
> unrecoverable issue.
> 3. L fails all the pending add ops and closes the ledger with lastEntryId
> = -1 (since no add operation succeeded).
> 4. Ownership of the ledger is released and transferred to another machine
> (the normal use case for Hedwig).
> 5. The new owner opens ledger L and recovers the ensemble. Suppose A and B
> are back to normal by this point, so L is closed with a lastEntryId other
> than -1.
> 6. Although the old owner has closed the ledger, it does not block the
> responses for the already-failed pending add ops. The failures for B
> therefore trigger ensemble changes, and because the new owner has already
> changed the ledger metadata, the old owner resolves the conflict and updates
> the ledger metadata with lastEntryId = -1 again. The ledger thus has
> different lastEntryId values at different times, which causes inconsistency
> and data loss.
> For details of this sequence, a test case describes it more clearly.
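
The overwrite in steps 5 and 6 of the description above can be sketched as two writers racing on a versioned metadata record. This is only an illustrative model, not the BookKeeper client code and not the committed patch: the names OverwriteSketch, Meta, Store and staleWriterUpdate are hypothetical, the store is a plain in-memory compare-and-set, the recovered lastEntryId of 2 is an arbitrary value, and the "guard" (do not touch metadata that another owner has already closed) is an assumption about one plausible way to avoid the overwrite.

```java
// Illustrative model only; every type here is hypothetical, not a BookKeeper class.
final class OverwriteSketch {

    /** Immutable snapshot of simplified ledger metadata. */
    static final class Meta {
        final int version;
        final boolean closed;
        final long lastEntryId;

        Meta(int version, boolean closed, long lastEntryId) {
            this.version = version;
            this.closed = closed;
            this.lastEntryId = lastEntryId;
        }
    }

    /** In-memory stand-in for a metadata store with versioned (compare-and-set) writes. */
    static final class Store {
        private Meta current = new Meta(0, false, -1L);

        synchronized Meta read() {
            return current;
        }

        /** Writes only if the caller saw the latest version; returns false on a conflict. */
        synchronized boolean write(int expectedVersion, boolean closed, long lastEntryId) {
            if (current.version != expectedVersion) {
                return false;
            }
            current = new Meta(expectedVersion + 1, closed, lastEntryId);
            return true;
        }
    }

    /**
     * The old owner's late metadata update. On a version conflict it re-reads the
     * metadata. Without the guard it reapplies its local state blindly; with the
     * guard it leaves metadata that someone else has already closed untouched.
     */
    static void staleWriterUpdate(Store store, int staleVersion, long localLastEntryId, boolean guard) {
        if (store.write(staleVersion, true, localLastEntryId)) {
            return; // no conflict, nothing to resolve
        }
        Meta latest = store.read();
        if (guard && latest.closed) {
            return; // assumed guard: keep the other owner's close result
        }
        store.write(latest.version, true, localLastEntryId);
    }

    /** Runs the race and returns the lastEntryId that ends up in the store. */
    static long simulate(boolean guard) {
        Store store = new Store();
        int seenByOldOwner = store.read().version;

        // Step 5: the new owner recovers the ledger and closes it with lastEntryId = 2.
        store.write(store.read().version, true, 2L);

        // Step 6: a late failure makes the old owner update metadata with lastEntryId = -1.
        staleWriterUpdate(store, seenByOldOwner, -1L, guard);
        return store.read().lastEntryId;
    }

    public static void main(String[] args) {
        System.out.println("without guard: lastEntryId = " + simulate(false));
        System.out.println("with guard:    lastEntryId = " + simulate(true));
    }
}
```

In this model, the unguarded run ends with the recovered lastEntryId replaced by -1, which corresponds to the inconsistency described in step 6, while the guarded run keeps the value written by the new owner.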