[ https://issues.apache.org/jira/browse/BOOKKEEPER-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632590#comment-13632590 ]
Sijie Guo commented on BOOKKEEPER-572:
--------------------------------------

{quote}
Assuming we write to the journal/WAL first, there is the problem you mention that the journal entry is corrupt. To get around that problem, you're suggesting a copy-on-write scheme.
{quote}

No, I am suggesting COW instead of writing to the WAL first. The purpose of writing to the WAL first is to avoid an index page being flushed before its entry log entries (without the journal entries being persisted) when an index page is stolen from the ledger cache.

{quote}
Consequently, I think you're proposing that when we recover (apply or re-apply journal records), we don't actually apply changes in place, and instead we write them separately and swap the new data with the old data.
{quote}

Yes, as you described. Let me recap what I said in this jira about how COW prevents the issue in BOOKKEEPER-447.

1) At time T, the ledger index page is P.
2) Ledger index page P is updated to P', which contains a new entry E1.
3) E1 contains the position of the entry in the entry logger.
4) The ledger cache is full, so an update to a different ledger causes index page P' to be evicted and flushed back to the filesystem.

In the update-in-place solution (what the bookie server uses now), P' overwrites P. If the bookie then crashes, the corresponding entries in the entry logger and the journal file are lost, so ledger index page P' contains a stale pointer.

In the COW solution, step 4) does not overwrite P, so both P and P' exist. P' is made permanently available only on SyncThread#flush (which persists both the ledger index entries and the entry logger entries). So if the bookie crashes before persisting the journal entries and ledger entries, P' is discarded during recovery and the state goes back to P, which contains no stale index entries.

So, in a word: for your question 1), I don't think we really need to write to the WAL first (since it still has the problem I raised in my previous comment). For question 2), COW is an alternative way to address BOOKKEEPER-447; it does not address the buggy record issue that comes with writing to the WAL first.
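
To make steps 1) to 4) concrete, here is a rough sketch of the two eviction strategies on a single index page. This is only an illustration under simplifying assumptions: the class and method names (CowIndexPageSketch, evictInPlace, evictCopyOnWrite, syncFlush, crashAndRecover) are hypothetical and are not BookKeeper APIs, and "disk" is modelled as plain in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one ledger index page; none of these names exist in BookKeeper.
class CowIndexPageSketch {
    // "On disk": survives a crash.
    private Map<Long, Long> committed = new HashMap<>(); // page P: entryId -> entry log position
    private Map<Long, Long> shadow = null;               // page P', written on eviction, not yet published

    // "In memory": lost on a crash.
    private Map<Long, Long> cached = new HashMap<>();

    // Steps 2-3: the cached page becomes P', holding a pointer into the entry log.
    void addEntry(long entryId, long entryLogPos) {
        cached.put(entryId, entryLogPos);
    }

    // Step 4, update-in-place: eviction overwrites P with P'.
    void evictInPlace() {
        committed = new HashMap<>(cached);
    }

    // Step 4, copy-on-write: eviction writes P' beside P and leaves P untouched.
    void evictCopyOnWrite() {
        shadow = new HashMap<>(cached);
    }

    // Models the sync flush: assumes entry log entries are persisted at this point,
    // so P' can be published as the new P.
    void syncFlush() {
        committed = new HashMap<>(cached);
        shadow = null;
    }

    // Crash: memory is lost and any unpublished P' is discarded; state returns to P.
    void crashAndRecover() {
        shadow = null;
        cached = new HashMap<>(committed);
    }

    // Returns the entry log position for entryId, or null if the index has no such entry.
    Long lookup(long entryId) {
        return cached.get(entryId);
    }
}
```

In this model, adding E1, evicting with evictInPlace() and then crashing leaves lookup() returning a position for an entry that never reached the entry log (the stale pointer). With evictCopyOnWrite(), the same sequence leaves lookup() returning null after recovery, and E1 only becomes visible after a crash if syncFlush() ran first.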

> Make the journal a write ahead log
> ----------------------------------
>
>                 Key: BOOKKEEPER-572
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-572
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 4.3.0
>
>         Attachments: 0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      0001-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      0003-BOOKKEEPER-572-Write-to-the-journal-before-writing-t.patch,
>                      BookieServer-2013-02-22.snapshot
>
>
> A bookie adds to the LedgerStorage before writing to the journal. This is the
> fundamental problem behind BOOKKEEPER-447 and blocks a nice solution to
> BOOKKEEPER-530. By writing to the memory state before the journal, we exposed
> ourselves to bugs if the bookie crashed before we wrote to the journal. The
> entry may exist in index, but not in the entrylog, a situation which cannot
> be distinguished from an I/O error. The comments on BOOKKEEPER-447 go into
> more detail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira