[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509293#comment-13509293
 ] 

Flavio Junqueira commented on BOOKKEEPER-447:
---------------------------------------------

I have a few comments and questions here:

* I'm not sure what this is doing:

{code}
+                        cleanPages.tryAcquire(100, TimeUnit.MILLISECONDS);
{code}

If the 100 ms elapse and there has been no release, then it means that there 
is still no clean page, no?

* Should we rename BookieTest to something less general, like 
BookieLedgerIndexTest?
* Check the javadoc for testIndexPageEviction(); it has typos.
* There is this comment in this same test:

{noformat}
// don't start the bookie, this way sync thread wont run
{noformat}

but the code does start the bookie right after. Is this correct? Or is it 
referring to the second occurrence of "new Bookie"? 
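
To illustrate the first point above, here is a minimal sketch of how the timed tryAcquire() behaves. The class and helper names are hypothetical and not taken from the patch; only java.util.concurrent.Semaphore is real. It shows that the call returns false when the timeout elapses without a release, so a caller that ignores the result cannot tell whether a clean page is actually available.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not code from the BOOKKEEPER-447 patch.
class CleanPageWait {
    // Waits up to timeoutMs for a permit and reports whether one was obtained.
    static boolean waitForCleanPage(Semaphore cleanPages, long timeoutMs) {
        try {
            // true: a permit (clean page) was acquired; false: the timeout elapsed.
            return cleanPages.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report that nothing was acquired.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Semaphore cleanPages = new Semaphore(0);               // no clean pages yet
        System.out.println(waitForCleanPage(cleanPages, 100)); // prints "false"
        cleanPages.release();                                  // a page becomes clean
        System.out.println(waitForCleanPage(cleanPages, 100)); // prints "true"
    }
}
```

If the patch intends the wait only as a back-off before retrying, that may be fine, but then a comment saying so would help.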

                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>              Labels: patch
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 
> BOOKKEEPER-447.diff, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index 
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry then fail to flush because the Bookkeeper server crashes, ledger 
> recovery cannot use the bookie afterward, because 
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the 
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per-ledger entry, or per-topic message) 
> and not flush index pages that track entries whose ledger (log) has not been 
> flushed.
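
The fix suggested in the issue description above could be sketched roughly as follows. This is only an illustration under assumed names: WalGatedIndexFlush, IndexPage, and the offset-based tracking are all hypothetical and do not correspond to actual BookKeeper classes or to the attached patch. The idea is that an index page is eligible for flushing only once the entry log is durable up to the highest position the page refers to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in BookKeeper.
class WalGatedIndexFlush {
    // An index page remembers the highest entry-log offset it points at.
    static final class IndexPage {
        final long maxLogOffset;
        IndexPage(long maxLogOffset) { this.maxLogOffset = maxLogOffset; }
    }

    // The entry log is known to be durable up to this offset.
    private long flushedLogOffset = 0;

    // Called after the entry log has been flushed up to the given offset.
    void onLogFlushed(long offset) {
        flushedLogOffset = Math.max(flushedLogOffset, offset);
    }

    // WAL rule: a page may be written only if every entry it indexes is durable.
    boolean canFlush(IndexPage page) {
        return page.maxLogOffset <= flushedLogOffset;
    }

    // Selects the dirty pages that are currently safe to flush or steal.
    List<IndexPage> flushable(List<IndexPage> dirty) {
        List<IndexPage> out = new ArrayList<>();
        for (IndexPage p : dirty) {
            if (canFlush(p)) {
                out.add(p);
            }
        }
        return out;
    }
}
```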

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
