[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531540#comment-13531540 ]
Ivan Kelly commented on BOOKKEEPER-447:
---------------------------------------

Actually, even requesting a flush of just the entrylog and then flushing the ledger page doesn't guarantee anything, as we don't use double buffering. It's possible that we flush the entrylog, an entry is then added to the ledger page, and only then do we flush the ledger page, so the index again references an entry that has not been flushed. The only way to prevent that is to put a big lock around everything, which is ugly as hell and couples this stuff in a way I really don't like.

How about another approach (not sure if it's been suggested before): if we find an entry in the index and then can't find it in the entrylog, we throw an exception the same way as if we had never found it in the index?

> Bookie can fail to recover if index pages flushed before ledger flush acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, BOOKKEEPER-447.diff, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index file to reflect unacknowledged entries (due to flushLedger). If the ledger and entry then fail to flush because of a Bookkeeper server crash, ledger recovery will not be able to use the bookie afterward, because InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to track ledger flush progress (either per-ledger entry, or per-topic message).
> Do not flush index pages that track entries whose ledger (log) has not been flushed.
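
The approach proposed in the comment above (an entry that is present in the index but missing from the entrylog is reported the same way as an entry that was never indexed) could look roughly like the sketch below. This is only a toy model to illustrate the idea: ToyLedgerStorage, NoSuchEntryException, putIndex and putEntryLog are hypothetical names made up for this example, not BookKeeper classes or methods, and the in-memory maps stand in for the real index files and entry logs.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class IndexFallbackSketch {

    /** Hypothetical stand-in for the bookie's "entry not found" error. */
    static class NoSuchEntryException extends IOException {
        NoSuchEntryException(long ledgerId, long entryId) {
            super("Entry " + entryId + " not found in ledger " + ledgerId);
        }
    }

    /** Toy storage: the index maps (ledger, entry) to an offset, the entry log maps offset to bytes. */
    static class ToyLedgerStorage {
        private final Map<String, Long> index = new HashMap<>();
        private final Map<Long, byte[]> entryLog = new HashMap<>();

        private static String key(long ledgerId, long entryId) {
            return ledgerId + ":" + entryId;
        }

        /** Simulates an index page that reached disk. */
        void putIndex(long ledgerId, long entryId, long offset) {
            index.put(key(ledgerId, entryId), offset);
        }

        /** Simulates entry data that reached disk in the entry log. */
        void putEntryLog(long offset, byte[] data) {
            entryLog.put(offset, data);
        }

        byte[] getEntry(long ledgerId, long entryId) throws NoSuchEntryException {
            Long offset = index.get(key(ledgerId, entryId));
            if (offset == null) {
                // Normal "never indexed" case.
                throw new NoSuchEntryException(ledgerId, entryId);
            }
            byte[] data = entryLog.get(offset);
            if (data == null) {
                // The index was flushed ahead of the entry log (e.g. crash before
                // the entry log flush). Report it exactly like an unindexed entry
                // instead of failing with a generic I/O error.
                throw new NoSuchEntryException(ledgerId, entryId);
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        ToyLedgerStorage storage = new ToyLedgerStorage();
        storage.putEntryLog(0L, new byte[] {42});
        storage.putIndex(1L, 0L, 0L);   // entry 0: index and entry log both flushed
        storage.putIndex(1L, 1L, 64L);  // entry 1: index flushed, entry log data lost

        System.out.println("entry 0 length: " + storage.getEntry(1L, 0L).length);
        try {
            storage.getEntry(1L, 1L);
        } catch (NoSuchEntryException e) {
            System.out.println("entry 1: " + e.getMessage());
        }
    }
}
```

With this behaviour a reader (for example ledger recovery) would see the dangling index reference as a missing entry and could carry on with other bookies, rather than treating the bookie as unusable. Whether that is safe in every case depends on details of the real recovery protocol that this sketch does not model.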