[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533776#comment-13533776 ]

Ivan Kelly commented on BOOKKEEPER-447:
---------------------------------------

{quote}I still can't be convinced by the NoEntryException solution. Since 
NoEntryException is treated as the termination condition for ledger recovery, 
you might end up mixing an IOException on a valid entry with NoSuchEntry. 
Honestly, it is difficult to say what caused a short read.{quote}
In the latest patch, there are two conditions under which we now throw 
NoEntryException instead of IOException. The first is when you try to read the 
entry length and don't get a full integer. The second is when you try to read 
the entry and the number of bytes read is fewer than the number requested. Both 
are consistent with the case where the entry simply hasn't been added to the 
entry log. In the case of a bad segment or FS corruption, FileChannel#read 
itself would fail with an IOException, so we're not masking those errors. If 
the data on disk is corrupt, the check on the ledger id and entry id will fail 
with an IOException, so we're not masking that either. Finally, if the data is 
corrupt, the digest check will catch it on the client side.
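
To make the distinction concrete, here is a minimal, self-contained sketch (not 
the actual EntryLog code; the nested NoEntryException class and the readEntry 
helper are illustrative stand-ins) of how a short read of the length header or 
of the entry body maps to NoEntryException, while other read failures still 
surface as plain IOException:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ShortReadSketch {
    // Illustrative stand-in for the NoEntryException used in the patch.
    static class NoEntryException extends IOException {
        NoEntryException(String msg) { super(msg); }
    }

    /**
     * Read an entry at the given offset. A short read of either the 4-byte
     * length header or the entry body is treated as "entry not written yet"
     * and surfaces as NoEntryException; any other failure from
     * FileChannel#read still propagates as an IOException.
     * (For brevity, a single read call is assumed to fill each buffer;
     * real code would loop on partial reads.)
     */
    static ByteBuffer readEntry(FileChannel log, long offset) throws IOException {
        ByteBuffer lenBuf = ByteBuffer.allocate(4);
        if (log.read(lenBuf, offset) < 4) {
            throw new NoEntryException("short read of entry length at " + offset);
        }
        lenBuf.flip();
        int entryLen = lenBuf.getInt();

        ByteBuffer entry = ByteBuffer.allocate(entryLen);
        if (log.read(entry, offset + 4) < entryLen) {
            throw new NoEntryException("short read of entry body at " + offset);
        }
        entry.flip();
        return entry;
    }
}
{code}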

{quote}
If you worry that the ledger storage and journal are too coupled in this way, 
why not use my first solution of reversing the order: commit to the journal 
first, and add the entry to the ledger storage later{quote}
I actually like this solution most of all, because it addresses the core issue. 
We should benchmark to make sure it doesn't hurt performance. Also, I'd make a 
few changes to the patch you provided. You only complete the client callback 
after adding to the ledger storage. I can see the reason for this (the client 
will throttle if there are many outstanding ops), but it doesn't do anything 
for the case where many clients are each writing a lot. It would be better to 
start queueing requests if the bookie is overwhelmed. This may cause some 
timeout errors, but that's good: it means the clients will start moving off the 
overwhelmed machine, hopefully onto less loaded machines.
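
For illustration, a rough sketch of the reordered add path under that proposal. 
The Journal, LedgerStorage and WriteCallback interfaces here are simplified 
placeholders, not the real bookie signatures; as in the patch being discussed, 
the callback only completes after the entry has also reached the ledger storage:

{code}
import java.nio.ByteBuffer;

public class JournalFirstAddSketch {
    // Hypothetical minimal interfaces standing in for the bookie's journal,
    // ledger storage and client callback; not the real BookKeeper signatures.
    interface Journal { void logAddEntry(ByteBuffer entry, Runnable onPersisted); }
    interface LedgerStorage { void addEntry(ByteBuffer entry); }
    interface WriteCallback { void writeComplete(int rc); }

    private final Journal journal;
    private final LedgerStorage ledgerStorage;

    JournalFirstAddSketch(Journal journal, LedgerStorage ledgerStorage) {
        this.journal = journal;
        this.ledgerStorage = ledgerStorage;
    }

    /**
     * Reordered add path: the entry goes to the journal first; only once the
     * journal has persisted it is it added to the ledger storage and the
     * client callback completed. Nothing can reach the index pages (and hence
     * be flushed) before it has been journalled, preserving WAL semantics.
     */
    void addEntry(ByteBuffer entry, WriteCallback cb) {
        journal.logAddEntry(entry.duplicate(), () -> {
            ledgerStorage.addEntry(entry);
            cb.writeComplete(0); // 0 == OK
        });
    }
}
{code}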

At the moment, the ledger storage effectively acts as the throttler. If adds 
are going too fast, it will start flushing ledgers and waiting for new pages to 
become available. Changing the order removes that, so perhaps we should 
actually put a limit on the size of Journal#queue. Also, I would make the write 
thread a plain thread and use a queue to push requests to it. That would avoid 
constructing a lot of Runnable objects.
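
As a sketch of that last suggestion (the WriteRequest type and the queue bound 
are made up for illustration, and the real Journal queues its own entry type): 
a bounded BlockingQueue drained by one plain thread gives the back-pressure of 
a limited Journal#queue without allocating a Runnable per request.

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuedWriterSketch {
    // Hypothetical request record carried on the queue.
    static final class WriteRequest {
        final long ledgerId;
        final long entryId;
        final byte[] data;
        WriteRequest(long ledgerId, long entryId, byte[] data) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.data = data;
        }
    }

    // Bounded queue: put() blocks once the bookie is overwhelmed, giving
    // back-pressure instead of letting the queue grow without limit.
    private final BlockingQueue<WriteRequest> queue = new ArrayBlockingQueue<>(10_000);

    // A single plain thread drains the queue, so no per-request Runnable is
    // created and submitted to an executor.
    private final Thread writer = new Thread(() -> {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                write(queue.take());
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }, "journal-writer");

    public void start() { writer.start(); }

    public void submit(WriteRequest req) throws InterruptedException {
        queue.put(req); // blocks when full, throttling fast writers
    }

    private void write(WriteRequest req) {
        // placeholder for the actual journal write, fsync and callback
    }
}
{code}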

I'm fine with either the exception-based solution or the order-swap solution. 
The first is a smaller change, while the second could have larger side effects, 
so I'm leaning towards the first, but I'm willing to have my mind changed.
                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 
> 0001-BOOKKEEPER-447-EntryLog-throws-NoSuchEntry-on-short-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 
> 0001-BOOKKEEPER-447-Throw-NoSuchEntry-if-entry-is-not-fou.patch, 
> BOOKKEEPER-447_bitset.diff, BOOKKEEPER-447.diff, 
> BOOKKEEPER-447_force_flush_entry_logger.patch, perf.png
>
>
> Bookie index page stealing (LedgerCacheImpl::grabCleanPage) can cause the index 
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry then fail to flush because the BookKeeper server crashes, ledger recovery 
> will be unable to use that bookie afterwards, because 
> InterleavedLedgerStorage::getEntry throws an IOException.
> If all the bookies in the ackSet experience this problem (DC environment), the 
> ledger will not be recoverable.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per ledger entry, or per topic message), 
> and not flush index pages that track entries whose ledger (log) has not been 
> flushed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
