[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234262#comment-13234262
 ] 

Ivan Kelly commented on BOOKKEEPER-112:
---------------------------------------


Flavio and I discussed this briefly yesterday evening, and after thinking about 
it some more afterwards, the problem seems clearer to me now.

So, what is under discussion here is what we do with the final fragment of 
an open ledger. This actually boils down to the same problem we have for 
fencing. By recovering the bookie, we are introducing a second writer, 
violating our 1-writer assumption. Since we now have more than one writer, it 
is necessary for there to be a consensus among all writers on where the ledger 
fragment ends. 

There are 3 situations in which this open final fragment can occur:
 # the original writer crashed, then the bookie crashed before ledger recovery
 # the original writer has the ledger open, but has not written anything since 
the bookie crashed
 # the bookie being recovered isn't actually down

One solution proposed by Flavio yesterday was that we should wait until no open 
final fragments exist before updating the ZK metadata. This works for 2. 
However, for 1 & 3, the recovery will wait forever.

One way I'm leaning towards now is to replicate all entries in the fragment, 
and then ensure that no more entries are added to this specific fragment. This 
would require a change to how fencing works. Instead of fencing by ledger id, 
we would have to fence by fragment id. When the original writer tries to write, 
the write will fail, and the writer will then try to replace the bookies to 
which the write failed (all bookies in this case). This deals with 1, because 
all entries written before the writer crash will be replicated. It works for 2, 
because the next write by the writer will see that its current ledger fragment 
is fenced *and* that the crashed bookie is down, so it will build a new 
ensemble and start writing a new fragment. It deals with 3, as the current 
fragment will be rereplicated and any further write attempts by the writer will 
force it to rebuild its ensemble.
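
To make the fence-by-fragment idea concrete, here is a rough in-memory sketch. 
It is only a toy model, not a patch: ToyBookie, ToyWriter, recoverFragment and 
FragmentFencedException are all made-up names, and identifying a fragment by 
(ledger id, first entry id) is an assumption for illustration. Recovery copies 
the open fragment to a replacement bookie and then fences that fragment; the 
writer's next add hits the fence, so it switches to a new ensemble and starts a 
new fragment at the current entry id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Toy model only: none of these types exist in BookKeeper.
final class FragmentFencingSketch {

    static final class FragmentFencedException extends Exception {
        FragmentFencedException(String message) { super(message); }
    }

    // In-memory stand-in for a bookie, fenced per (ledgerId, fragmentStart).
    static final class ToyBookie {
        final String name;
        private final Set<String> fenced = new HashSet<>();
        private final Map<String, List<Long>> entries = new HashMap<>();

        ToyBookie(String name) { this.name = name; }

        private static String key(long ledgerId, long fragmentStart) {
            return ledgerId + ":" + fragmentStart;
        }

        void fence(long ledgerId, long fragmentStart) {
            fenced.add(key(ledgerId, fragmentStart));
        }

        void addEntry(long ledgerId, long fragmentStart, long entryId)
                throws FragmentFencedException {
            String k = key(ledgerId, fragmentStart);
            if (fenced.contains(k)) {
                throw new FragmentFencedException(name + " has fenced fragment " + k);
            }
            entries.computeIfAbsent(k, unused -> new ArrayList<>()).add(entryId);
        }

        List<Long> read(long ledgerId, long fragmentStart) {
            List<Long> stored = entries.get(key(ledgerId, fragmentStart));
            return stored == null ? new ArrayList<>() : new ArrayList<>(stored);
        }
    }

    // Recovery: replicate the fragment to the replacement, then fence it everywhere.
    static void recoverFragment(long ledgerId, long fragmentStart, ToyBookie source,
                                ToyBookie replacement, List<ToyBookie> toFence)
            throws FragmentFencedException {
        for (long entryId : source.read(ledgerId, fragmentStart)) {
            replacement.addEntry(ledgerId, fragmentStart, entryId);
        }
        for (ToyBookie bookie : toFence) {
            bookie.fence(ledgerId, fragmentStart);
        }
    }

    // Writer that reacts to a fenced fragment by starting a new one.
    static final class ToyWriter {
        final long ledgerId;
        long fragmentStart = 0;
        long nextEntryId = 0;
        List<ToyBookie> ensemble;
        private final Supplier<List<ToyBookie>> newEnsemble;

        ToyWriter(long ledgerId, List<ToyBookie> ensemble,
                  Supplier<List<ToyBookie>> newEnsemble) {
            this.ledgerId = ledgerId;
            this.ensemble = ensemble;
            this.newEnsemble = newEnsemble;
        }

        long add() throws FragmentFencedException {
            try {
                writeToEnsemble();
            } catch (FragmentFencedException e) {
                // Current fragment is fenced: new ensemble, new fragment from this entry.
                fragmentStart = nextEntryId;
                ensemble = newEnsemble.get();
                writeToEnsemble();
            }
            return nextEntryId++;
        }

        private void writeToEnsemble() throws FragmentFencedException {
            for (ToyBookie bookie : ensemble) {
                bookie.addEntry(ledgerId, fragmentStart, nextEntryId);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ToyBookie b1 = new ToyBookie("b1"), b2 = new ToyBookie("b2");
        ToyBookie b3 = new ToyBookie("b3"), b4 = new ToyBookie("b4");
        ToyBookie b5 = new ToyBookie("b5");
        ToyWriter writer =
                new ToyWriter(1L, Arrays.asList(b1, b2), () -> Arrays.asList(b4, b5));
        writer.add();
        writer.add();
        // b2 is being recovered: copy fragment 0 from b1 to b3, then fence it.
        recoverFragment(1L, 0L, b1, b3, Arrays.asList(b1, b2, b3));
        long id = writer.add();
        System.out.println("entry " + id + " went to fragment " + writer.fragmentStart);
        System.out.println("b3 holds " + b3.read(1L, 0L));
    }
}
```

The sketch fences b2 as well, so it behaves the same whether b2 is really down 
or not (situation 3). It ignores the ZK metadata update and any race between 
the copy and the fence, both of which a real change would have to handle.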
                
> Bookie Recovery on an open ledger will cause LedgerHandle#close on that 
> ledger to fail
> --------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-112
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-112
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Flavio Junqueira
>            Assignee: Sijie Guo
>             Fix For: 4.1.0
>
>         Attachments: BK-112.patch, BOOKKEEPER-112.patch, 
> BOOKKEEPER-112.patch_v2, BOOKKEEPER-112.patch_v3, BOOKKEEPER-112.patch_v4, 
> BOOKKEEPER-112.patch_v5
>
>
> Bookie recovery updates the ledger metadata in zookeeper. LedgerHandle will 
> not get notified of this update, so it will try to write out its own ledger 
> metadata, only to fail with KeeperException.BadVersion. This effectively 
> fences all write operations on the LedgerHandle (close and addEntry). close 
> will fail for obvious reasons. addEntry will fail once it gets to the failed 
> bookie in the schedule, tries to write, fails, selects a new bookie and tries 
> to update ledger metadata.
> Update Line 605, testSyncBookieRecoveryToRandomBookiesCheckForDupes(), when 
> done
> Also, uncomment addEntry in 
> TestFencing#testFencingInteractionWithBookieRecovery()

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
