[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643042#comment-13643042
 ] 

Hudson commented on BOOKKEEPER-584:
-----------------------------------

Integrated in bookkeeper-trunk #188 (See 
[https://builds.apache.org/job/bookkeeper-trunk/188/])
    BOOKKEEPER-584: Data loss when ledger metadata is overwritten (sijie via 
ivank) (Revision 1476283)

     Result = SUCCESS
ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerMetadata.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/BookieWriteLedgerTest.java
* /zookeeper/bookkeeper/trunk/bookkeeper-server/src/test/java/org/apache/bookkeeper/client/LedgerCloseTest.java

                
> Data loss when ledger metadata is overwritten
> ---------------------------------------------
>
>                 Key: BOOKKEEPER-584
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-584
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.2.0
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>            Priority: Critical
>             Fix For: 4.3.0
>
>         Attachments: BOOKKEEPER-584.diff, BOOKKEEPER-584.diff, 
> BOOKKEEPER-584.diff
>
>
> This issue was introduced by the fix for BOOKKEEPER-337. The original 
> #resolveConflicts logic was replaced with a check of only the state and the 
> current ensemble, a change intended to handle multiple bookies changing in 
> the same ensemble.
> The issue can be reproduced by a test case with the following steps:
> 1. Ledger L writes several entries to ensemble A, B, C.
> 2. C succeeds, B fails with slow responses, and A fails with an 
> unrecoverable issue.
> 3. L fails all the pending add ops and closes the ledger with lastEntryId 
> = -1 (since no add operation succeeded).
> 4. The ownership of this ledger is released and transferred to another 
> machine (this is the normal use case for Hedwig).
> 5. The new owner opens Ledger L and recovers the ensemble. Suppose A and B 
> are back to normal by this point, so L is closed with a lastEntryId other 
> than -1.
> 6. The old owner, although it has closed the ledger, does not block the 
> responses for the already failed pending add ops. The failures for B 
> therefore trigger ensemble changes. Since the ledger metadata has already 
> been changed by the new owner, the old owner has to resolve the conflict, 
> and it updates the ledger metadata with lastEntryId = -1 again. We 
> therefore get different lastEntryId values at different times, which causes 
> inconsistency and data loss.
> A test case describes the details of this sequence more clearly.
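
The sequence in the quoted description can be replayed as a small, single-threaded Java sketch. It is not the BookKeeper client code and not the committed patch: Meta, Store, and replay are hypothetical names, the versioned Store only stands in for the metadata store, and the sketch assumes that the old owner's close with lastEntryId = -1 exists only in its local copy when the new owner recovers the ledger. The "guarded" branch is one possible way to refuse the overwrite, shown for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in BookKeeper.
class MetadataOverwriteSketch {

    /** Immutable, simplified snapshot of ledger metadata. */
    static final class Meta {
        final boolean closed;
        final long lastEntryId;
        final List<String> ensemble;

        Meta(boolean closed, long lastEntryId, List<String> ensemble) {
            this.closed = closed;
            this.lastEntryId = lastEntryId;
            this.ensemble = ensemble;
        }
    }

    /** Versioned store: a write succeeds only against the expected version. */
    static final class Store {
        private Meta meta;
        private int version = 0;

        Store(Meta initial) { this.meta = initial; }

        Meta read() { return meta; }

        int version() { return version; }

        boolean compareAndSet(int expectedVersion, Meta update) {
            if (expectedVersion != version) {
                return false; // someone else wrote in between
            }
            meta = update;
            version++;
            return true;
        }
    }

    /** Replays steps 1-6 and returns the lastEntryId left in the store. */
    static long replay(boolean guarded) {
        List<String> abc = Arrays.asList("A", "B", "C");
        Store store = new Store(new Meta(false, -1L, abc));

        // Steps 1-3: the old owner's adds fail; it closes its local copy
        // with lastEntryId = -1 while still holding version 0.
        int oldOwnerVersion = store.version();
        Meta oldOwnerLocal = new Meta(true, -1L, abc);

        // Steps 4-5: the new owner recovers the ledger and closes it with
        // a real lastEntryId (2 is an arbitrary value for this sketch).
        store.compareAndSet(store.version(), new Meta(true, 2L, abc));

        // Step 6: a delayed failure for B makes the old owner attempt an
        // ensemble change (B replaced by a hypothetical bookie D).
        Meta update = new Meta(true, oldOwnerLocal.lastEntryId,
                Arrays.asList("A", "D", "C"));
        if (!store.compareAndSet(oldOwnerVersion, update)) {
            Meta stored = store.read();
            // Guarded variant: give up if the stored metadata is already
            // closed with a different lastEntryId.
            boolean conflict = guarded && stored.closed
                    && stored.lastEntryId != update.lastEntryId;
            if (!conflict) {
                // Unguarded variant: retry against the new version and
                // overwrite the new owner's close.
                store.compareAndSet(store.version(), update);
            }
        }
        return store.read().lastEntryId;
    }

    public static void main(String[] args) {
        System.out.println("unguarded lastEntryId = " + replay(false));
        System.out.println("guarded lastEntryId   = " + replay(true));
    }
}
```

In this sketch the unguarded replay leaves lastEntryId = -1 in the store, which discards the entries the new owner recovered, while the guarded replay keeps the new owner's value of 2.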

