[ https://issues.apache.org/jira/browse/BOOKKEEPER-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547100#comment-13547100 ]

Rakesh R commented on BOOKKEEPER-530:
-------------------------------------

bq.This is what we do. The code adds to the journal, and when the journal 
callback triggers it adds to the entrylogger and index. If the journal succeeds 
and the entrylogger or index fails, this is fine due to the order, as the entry 
remains in the same place until we write the index, which is the last thing.
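
(A minimal sketch of that ordering, for illustration only; the Journal,
EntryLogger and LedgerIndex types below are hypothetical stand-ins, not the
actual Bookie classes.)
{code}
// Purely illustrative: only the ordering matters here.
interface Journal     { void addEntry(byte[] entry, Runnable onJournalled); }
interface EntryLogger { long addEntry(long ledgerId, byte[] entry); }
interface LedgerIndex { void putOffset(long ledgerId, long entryId, long offset); }

class OrderedAddFlow {
    private final Journal journal;
    private final EntryLogger entryLogger;
    private final LedgerIndex index;

    OrderedAddFlow(Journal journal, EntryLogger entryLogger, LedgerIndex index) {
        this.journal = journal;
        this.entryLogger = entryLogger;
        this.index = index;
    }

    void addEntry(final long ledgerId, final long entryId, final byte[] entry) {
        // Journal first; only the journal callback writes the entry log and,
        // as the very last step, the index. If the entry log or index write
        // fails, the entry is still served from its old location because the
        // index was never repointed.
        journal.addEntry(entry, new Runnable() {
            @Override
            public void run() {
                long offset = entryLogger.addEntry(ledgerId, entry);
                index.putOffset(ledgerId, entryId, offset);
            }
        });
    }
}
{code}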

Oh! It seems I hadn't explained my idea clearly. Could you please look at the 
snippet of the CompactionScanner#process() API below? It just tries to keep the 
await() logic simple and fail fast, rather than waiting for all the entries to 
finish when there is a failure.
{code}
        @Override
        public void process(final long ledgerId, long offset, ByteBuffer entry)
            throws IOException {
            // one latch per entry: block until this entry's re-add completes
            final CountDownLatch addEntryNotificationLatch = new CountDownLatch(1);
            safeEntryAdder.safeAddEntry(ledgerId, entry, new GenericCallback<Void>() {
                    @Override
                    public void operationComplete(int rc, Void result) {
                        if (rc != BookieException.Code.OK) {
                            LOG.error("Error {} re-adding entry for ledger {}",
                                    rc, ledgerId);
                            allSuccessful.set(false);
                        }
                        addEntryNotificationLatch.countDown();
                    }
                });
            awaitComplete(addEntryNotificationLatch);
        }

        private void awaitComplete(CountDownLatch addEntryNotificationLatch)
                throws IOException {
            try {
                addEntryNotificationLatch.await();
                // fail fast: abort the scan as soon as any re-add has failed
                if (!allSuccessful.get()) {
                    throw new IOException("Couldn't re-add all entries");
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                LOG.error("Interrupted while compacting", ie);
                throw new IOException("Couldn't re-add all entries", ie);
            }
        }
{code}

-Rakesh
                
> data might be lost during compaction.
> -------------------------------------
>
>                 Key: BOOKKEEPER-530
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-530
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.1.0
>            Reporter: Sijie Guo
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0
>
>         Attachments: 
> 0001-BOOKKEEPER-530-data-might-be-lost-during-compaction.patch, 
> 0001-BOOKKEEPER-530-data-might-be-lost-during-compaction.patch, 
> 0001-BOOKKEEPER-530-data-might-be-lost-during-compaction.patch
>
>
> {code}
>         try {
>             entryLogger.scanEntryLog(entryLogId, new CompactionScanner(entryLogMeta));
>             // after moving entries to new entry log, remove this old one
>             removeEntryLog(entryLogId);
>         } catch (IOException e) {
>             LOG.info("Premature exception when compacting " + entryLogId, e); 
>         } finally {
>             // clear compacting flag
>             compacting.set(false);
>         }
> {code}
> Currently the compaction code has a problem: as the code above shows, the old 
> entry log is removed after new entries are added to the new entry log, but the 
> new entry log might not have been flushed yet. If a failure happens after the 
> removal but before the flush, data would be lost.
> When I implemented the compaction feature in BOOKKEEPER-160, I remember that I 
> took care to let each entry go back through the normal addEntry flow so that 
> the journal and index would reflect it. But it seems that this addEntry doesn't 
> go through the journal; it just moves entries between entry log files without 
> any flush guarantee.
> There are two ideas for a solution: the simple one is to let compaction go 
> through the normal addEntry flow (adding the entry to ledger storage and 
> putting it in the journal). The other one is for the GC thread to either wait 
> for the ledger storage to flush in the sync thread within one flush interval, 
> or force a ledger storage flush before removing the entry log files (see the 
> sketch below).
> BTW, it is hard to design a test case that simulates the bookie shutting down 
> abnormally after the entry log files have been removed.

