athanatos opened a new pull request #1768: ISSUE-1757: prevent race between 
flush and delete from recreating index
URL: https://github.com/apache/bookkeeper/pull/1768
 
 
   IndexPersistencManager.flushLedgerHandle can race with delete by
   obtaining a FileInfo just prior to delete and then proceeding to rewrite
   the file info resurrecting it.  FileInfo provides a convenient point of
   synchronization for avoiding this race.  FileInfo.moveLedgerIndexFile,
   FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so
   this patch simply adds a deleted flag to the FileInfo object to indicate
   that the FileInfo has become invalid.  checkOpen is called in every
   method and will now throw FileInfoDeleted if delete has been called.
   IndexPersistenceManager can catch it and deal with it appropriately in
   flush (which generally means moving onto the next ledger).
   
   This patch also eliminates ledgersToFlush and ledgersFlushing.  Their
   purpose appears to be to allow delete to avoid flushing a ledger which
   has been selected for flushing but not flushed yet avoiding the above
   race.  It's not sufficient, however, because IndexInMemPageMgr calls
   IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo
   for the ledger prior to the deletion and then call
   relocateIndexFileAndFlushHeader afterwards.
   
   Also, if the purpose was to avoid concurrent calls into
   flushSpecificLedger on the same ledger, it's wrong because of the
   following sequence:
   
   t0: thread 0 calls flushOneOrMoreLedgers
   t1: thread 0 place ledger 10 into ledgersFlushing and completes
   flushSpecificLedger
   t2: thread 2 performs a write to ledger 10
   t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10
   t4: thread 0 releases ledger 10 from ledgersFlushing
   t5: thread 1 completes flushOneOrMoreLedgers
   
   Although thread 1 begins to flush after the write to ledger 10, it won't
   capture the write rendering the flush incorrect.  I don't think it's
   actually worth avoiding overlapping flushes here because both FileInfo
   and LedgerEntryPage track dirty state.  As such, it seems simpler to
   just get rid of them.
   
   This patch also Adds a more aggressive version of testFlushDeleteRace to
   test the new behavior.  Testing with multiple flushers turned up a bug
   with LedgerEntryPage.getPageToWrite where didn't return a buffer with
   independent read pointers, so this patch addresses that as well.
   
   (@bug W-5549455@)
   (@rev cguttapalem@)
   Signed-off-by: Samuel Just <sj...@salesforce.com>
   (cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to