[ https://issues.apache.org/jira/browse/BOOKKEEPER-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542213#comment-13542213 ]

Ivan Kelly commented on BOOKKEEPER-355:
---------------------------------------

The new patch adds a timeout.

{quote}
I suspect a similar situation can also arise during normal add entries. Here,
we first update the ledger metadata and then try writing the entries.
I feel it would be safer to reverse the logic of
LedgerHandle#handleBookieFailure() like this:
1) Replace the failed bookie with the new bookie in metadata.currentEnsemble,
then send a write entry.
2) Update the ZK metadata only after the entry has been written successfully.
Otherwise, find another bookie.
{quote}
We could change it to do this, but I don't see any compelling reason to, so we
should leave it as is, since it works now. A similar situation cannot occur with
normal adds: a failed bookie will not contain the entry in any case, so
replacing it is safe.
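The ordering question above can be illustrated with a toy Java model. Everything in it (the class `EnsembleChangeOrder`, its methods, and the in-memory `STORED` and `UNREACHABLE` collections) is hypothetical and is not the real `LedgerHandle#handleBookieFailure()` code. It only models whether the new ensemble is persisted before or after the entry reaches the replacement bookie, assuming the replacement bookie cannot be reached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and structure are hypothetical, not BookKeeper's classes.
public class EnsembleChangeOrder {

    // Entry ids stored per bookie address; bookies in UNREACHABLE reject writes.
    static final Map<String, List<Long>> STORED = new HashMap<>();
    static final List<String> UNREACHABLE = new ArrayList<>();

    static boolean write(String bookie, long entryId) {
        if (UNREACHABLE.contains(bookie)) {
            return false; // simulates a connect failure
        }
        STORED.computeIfAbsent(bookie, b -> new ArrayList<>()).add(entryId);
        return true;
    }

    // Current order: persist the new ensemble first, then write the entry.
    static List<String> updateThenWrite(List<String> ensemble, int failedIdx,
                                        String replacement, long entryId) {
        List<String> updated = new ArrayList<>(ensemble);
        updated.set(failedIdx, replacement); // metadata now names the new bookie
        write(replacement, entryId);         // may fail; metadata is not rolled back
        return updated;
    }

    // Proposed order: write the entry first, persist the new ensemble only on success.
    static List<String> writeThenUpdate(List<String> ensemble, int failedIdx,
                                        String replacement, long entryId) {
        if (!write(replacement, entryId)) {
            return ensemble; // metadata unchanged; caller would try another bookie
        }
        List<String> updated = new ArrayList<>(ensemble);
        updated.set(failedIdx, replacement);
        return updated;
    }

    public static void main(String[] args) {
        List<String> ensemble = List.of("bookie1", "bookie2");
        UNREACHABLE.add("bookie3");
        System.out.println(updateThenWrite(ensemble, 0, "bookie3", 0L)); // [bookie3, bookie2]
        System.out.println(writeThenUpdate(ensemble, 0, "bookie3", 0L)); // [bookie1, bookie2]
    }
}
```

With update-then-write, the persisted ensemble names bookie3 even though bookie3 stores nothing, which is the state the scenario below ends up in. The argument in the reply is that during normal adds this is harmless, because the failed bookie being replaced would not hold the entry anyway.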
                
> Ledger recovery will mark ledger as closed with -1, in case of slow bookie is 
> added to ensemble during recovery add
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-355
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-355
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.1.0, 4.2.0
>            Reporter: Vinay
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: 
> 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, 
> 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, 
> 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, 
> 0001-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, 
> 0003-BOOKKEEPER-355-Ledger-recovery-will-mark-ledger-as-c.patch, 
> BOOKKEEPER-355.patch, BOOKKEEPER-355.patch
>
>
> Scenario:
> ------------
> 1. A ledger is created with ensemble and quorum size 2, and one entry is
> written to it.
> 2. The first bookie in the ensemble is then brought down.
> 3. Another client fences the ledger and tries to recover it.
> 4. During recovery, an ensemble change happens and a new bookie is added,
> but the client is not able to connect to this bookie.
> 5. This recovery fails.
> 6. The newly added bookie then comes up.
> 7. Another client tries to recover the same ledger.
> 8. Since the new bookie is now first in the ensemble, doRecoveryRead() reads
> from that bookie, gets NoSuchLedgerException, and closes the ledger with -1,
> i.e. it marks the ledger as empty, even though the first client had
> successfully written one entry.
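The final recovery in the scenario above can be reproduced with a small toy model. `RecoveryReadModel` and its methods are hypothetical names, not BookKeeper's `doRecoveryRead()`. The model assumes, as the scenario describes, that the failing path asks only the first bookie of the ensemble and treats NoSuchLedgerException (here, a missing map key) as an empty ledger. The second method is an illustrative alternative that consults every bookie; it is not necessarily what the attached patches do.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Toy model of the failing recovery read; not BookKeeper's actual recovery code.
public class RecoveryReadModel {

    // Highest entry id each bookie holds for the ledger; absent key = NoSuchLedger.
    static OptionalLong readLast(Map<String, Long> store, String bookie) {
        Long last = store.get(bookie);
        return last == null ? OptionalLong.empty() : OptionalLong.of(last);
    }

    // Behaviour from the scenario: ask only the first bookie in the ensemble
    // and treat NoSuchLedger as "ledger is empty".
    static long recoverFromFirstBookie(List<String> ensemble, Map<String, Long> store) {
        return readLast(store, ensemble.get(0)).orElse(-1L);
    }

    // Illustrative alternative (an assumption, not the committed fix): ask every
    // bookie and only conclude -1 if none of them holds an entry.
    static long recoverFromAllBookies(List<String> ensemble, Map<String, Long> store) {
        long last = -1L;
        for (String bookie : ensemble) {
            OptionalLong l = readLast(store, bookie);
            if (l.isPresent()) {
                last = Math.max(last, l.getAsLong());
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // bookie3 replaced bookie1 in the metadata but never received entry 0.
        List<String> ensemble = List.of("bookie3", "bookie2");
        Map<String, Long> store = Map.of("bookie2", 0L);
        System.out.println(recoverFromFirstBookie(ensemble, store)); // -1
        System.out.println(recoverFromAllBookies(ensemble, store));  // 0
    }
}
```

Running `main` prints -1 and then 0: the first-bookie read loses the entry that bookie2 still holds, while the all-bookies read recovers it. One cost of the toy alternative is that it has to hear from every bookie, so a slow or unreachable bookie would need some bound such as a timeout.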

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
