[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533776#comment-13533776 ]
Ivan Kelly commented on BOOKKEEPER-447:
---------------------------------------

{quote}I still can't be convinced by NoEntryException solution. As NoEntryException is treated as the termination condition for ledger recovery. you might end up mixing IOException of a valid entry with NoSuchEntry. In honestly, it is difficult to say what caused short read.{quote}

In the latest patch, there are two conditions in which an IOException is now replaced by NoEntryException. The first is where you try to read the entry length and don't get a full integer. The second is where you try to read the entry and the number of bytes read is shorter than the number requested. Both are consistent with the case where the entry simply hasn't been added to the entrylog. In the case of a bad segment, or FS corruption, the FileChannel#read itself would fail with an IOException, so we're not masking those errors. If the data on disk is corrupt, the check for the ledger id & entry id will fail with IOException, so we're not masking that either. Finally, if the data is corrupt, the digest check will pick this up on the client side.

{quote}If you worry about ledger storage and journal are too coupled in this way, why not use my first solution converting the order, committing journal first, adding entry to ledger storage later{quote}

I actually like this solution most of all, because it hits the core issue. We should benchmark to ensure it doesn't hurt performance. Also, I'd make a few changes to the patch you provided. You only complete the client callback after adding to the ledger storage. I can see the reason for this (the client will throttle if there are many outstanding ops), but this doesn't do anything for the case where many clients are writing a lot. It would be better to start queueing requests if the bookie is overwhelmed. This may cause some timeout errors, but this is good.
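To illustrate the two short-read conditions described above, here is a minimal sketch of an entry-log read that converts a partial read of the length prefix or the entry body into a NoEntryException, while letting genuine I/O failures from FileChannel#read propagate as plain IOException. The NoEntryException class and readEntry helper here are illustrative stand-ins, not BookKeeper's actual EntryLog code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical exception mirroring the one discussed; name is illustrative.
class NoEntryException extends IOException {
    NoEntryException(String msg) { super(msg); }
}

public class ShortReadDemo {
    // Read the entry at the given offset: 4-byte length prefix, then the payload.
    // A short read of either part is consistent with the entry never having been
    // fully written to the entry log, so we signal NoEntryException; real I/O
    // failures inside FileChannel#read still surface as plain IOException.
    static ByteBuffer readEntry(FileChannel ch, long offset) throws IOException {
        ByteBuffer len = ByteBuffer.allocate(4);
        if (ch.read(len, offset) < 4) {
            throw new NoEntryException("short read of length prefix at " + offset);
        }
        len.flip();
        int size = len.getInt();
        ByteBuffer entry = ByteBuffer.allocate(size);
        if (ch.read(entry, offset + 4) < size) {
            throw new NoEntryException("short read of entry body at " + offset);
        }
        entry.flip();
        return entry;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("entrylog", ".log");
        try (FileChannel ch = FileChannel.open(f,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Length prefix claims 100 bytes, but only 10 bytes of payload follow:
            // simulates a crash before the entry was fully flushed.
            ByteBuffer b = ByteBuffer.allocate(14);
            b.putInt(100).put(new byte[10]);
            b.flip();
            ch.write(b, 0);
            try {
                readEntry(ch, 0);
                System.out.println("read ok");
            } catch (NoEntryException e) {
                System.out.println("NoEntryException: " + e.getMessage());
            }
        } finally {
            Files.delete(f);
        }
    }
}
```

Note that corruption of data that was fully written is still caught downstream, by the ledger/entry id check and the client-side digest, as described above.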
Timeouts mean the clients will start moving off the overwhelmed machine to, hopefully, less loaded machines. At the moment, the ledger storage effectively acts as the throttle: if adds are coming in too fast, it will start flushing ledgers and waiting for new pages to become available. Changing the order removes that, so perhaps we should actually put a limit on the size of Journal#queue. Also, I would make the write thread a plain thread and use a queue to push requests to it; it would avoid the construction of a lot of Runnable objects.

I'm fine with either the exception-based solution or the order-swap solution. The first is a smaller change, while the second could have larger side effects, so I'm leaning towards the first, but I'm willing to have my mind changed.

> Bookie can fail to recover if index pages flushed before ledger flush acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 0001-BOOKKEEPER-447-EntryLog-throws-NoSuchEntry-on-short-.patch, 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 0001-BOOKKEEPER-447-Throw-NoSuchEntry-if-entry-is-not-fou.patch, BOOKKEEPER-447_bitset.diff, BOOKKEEPER-447.diff, BOOKKEEPER-447_force_flush_entry_logger.patch, perf.png
>
>
> Bookie index page stealing (LedgerCacheImpl::grabCleanPage) can cause the index file to reflect unacknowledged entries (due to flushLedger).
> If the ledger and entry then fail to flush due to a BookKeeper server crash, ledger recovery will not be able to use the bookie afterward, because InterleavedLedgerStorage::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to track ledger flush progress (either per ledger entry, or per topic message), and not flush index pages which track entries whose ledger (log) has not been flushed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira