[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534666#comment-13534666 ]

Sijie Guo commented on BOOKKEEPER-447:
--------------------------------------

{quote}
The second one is where you try to read the entry and the number of bytes read 
is shorter than the number requested.
{quote}

What if the length field is corrupted with a larger number? Maybe it is a corner 
case here. So if you are OK with allowing such a case, I am fine with changing 
IOException to NoSuchEntryException, since it is indeed a minor change.
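
To make the corner case concrete, here is a minimal sketch of a length-prefixed 
entry read. It is not the actual EntryLogger code: the class ShortReadSketch, 
the helpers readEntry, encode and classify, and the 4-byte length layout are 
all assumptions for illustration, and NoSuchEntryException here is a local 
stand-in. It shows that a length field corrupted to a larger number produces 
the same short read as a truncated entry, so both would be reported as 
NoSuchEntryException after the change.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

class ShortReadSketch {
    /** Hypothetical stand-in for the bookie's "entry not found" error. */
    static class NoSuchEntryException extends IOException {
        NoSuchEntryException(String msg) { super(msg); }
    }

    /** Reads one entry laid out as a 4-byte length followed by the payload (assumed layout). */
    static byte[] readEntry(byte[] log, int offset) throws IOException {
        if (offset < 0 || log.length - offset < 4) {
            throw new NoSuchEntryException("no length field at offset " + offset);
        }
        ByteBuffer buf = ByteBuffer.wrap(log, offset, log.length - offset);
        int length = buf.getInt();
        if (length < 0) {
            // A negative length can never be valid, so keep it a plain IOException.
            throw new IOException("negative length " + length + " at offset " + offset);
        }
        if (buf.remaining() < length) {
            // Short read: fewer bytes are available than the length field requests.
            // A truncated entry and a length corrupted to a larger number both end up here.
            throw new NoSuchEntryException("short read at offset " + offset
                    + ": wanted " + length + " bytes, found " + buf.remaining());
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }

    /** Builds a log image whose length field may disagree with the payload size. */
    static byte[] encode(int declaredLength, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(declaredLength)
                .put(payload)
                .array();
    }

    /** Maps the outcome of readEntry to a label so the cases are easy to compare. */
    static String classify(byte[] log, int offset) {
        try {
            readEntry(log, offset);
            return "OK";
        } catch (NoSuchEntryException e) {
            return "NO_SUCH_ENTRY";
        } catch (IOException e) {
            return "IO_ERROR";
        }
    }

    public static void main(String[] args) {
        byte[] intact = encode(3, new byte[] {1, 2, 3});
        byte[] truncated = encode(3, new byte[] {1, 2});
        byte[] corruptedLength = encode(1000, new byte[] {1, 2, 3});
        System.out.println("intact:           " + classify(intact, 0));
        System.out.println("truncated:        " + classify(truncated, 0));
        System.out.println("corrupted length: " + classify(corruptedLength, 0));
    }
}
```

In this sketch the truncated and corrupted-length cases are indistinguishable 
to the reader, which is the trade-off being accepted above.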
                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 
> 0001-BOOKKEEPER-447-EntryLog-throws-NoSuchEntry-on-short-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 
> 0001-BOOKKEEPER-447-Throw-NoSuchEntry-if-entry-is-not-fou.patch, 
> BOOKKEEPER-447_bitset.diff, BOOKKEEPER-447.diff, 
> BOOKKEEPER-447_force_flush_entry_logger.patch, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index file 
> to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry then fail to flush because the Bookkeeper server crashes, ledger 
> recovery cannot use the bookie afterward, because 
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the 
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per-ledger entry, or per-topic message) 
> and not flush index pages that track entries whose ledger (log) has not been 
> flushed.
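
The fix proposed in the description can be sketched as a flush gate. This is 
only an illustration of the write-ahead rule, not the patch attached to this 
issue: the names WalFlushGateSketch, IndexPage, recordEntry, logFlushedUpTo 
and canFlush are hypothetical, and tracking progress as a single entry log 
offset is an assumption. An index page may be written out only once the entry 
log has been flushed at least as far as the newest entry the page references.

```java
class WalFlushGateSketch {
    /** Hypothetical index page that remembers the newest entry log offset it points at. */
    static final class IndexPage {
        private long maxLogOffset = -1;

        /** Records that this page now references an entry written at logOffset. */
        void recordEntry(long logOffset) {
            maxLogOffset = Math.max(maxLogOffset, logOffset);
        }

        long maxLogOffset() {
            return maxLogOffset;
        }
    }

    // Highest entry log offset known to be durable on disk (-1 means nothing flushed yet).
    private long flushedLogOffset = -1;

    /** Called after the entry log has been flushed through the given offset. */
    void logFlushedUpTo(long offset) {
        flushedLogOffset = Math.max(flushedLogOffset, offset);
    }

    /** WAL rule: the page may be flushed only if every entry it references is already durable. */
    boolean canFlush(IndexPage page) {
        return page.maxLogOffset() <= flushedLogOffset;
    }

    public static void main(String[] args) {
        WalFlushGateSketch gate = new WalFlushGateSketch();
        IndexPage page = new IndexPage();

        page.recordEntry(100L);
        System.out.println("before log flush: " + gate.canFlush(page));

        gate.logFlushedUpTo(100L);
        System.out.println("after log flush:  " + gate.canFlush(page));
    }
}
```

With a gate like this, a page steal would have to skip (or wait on) pages 
whose entries are not yet durable, so the index never gets ahead of the log.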

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
