Csaba Varga created OAK-7209:
--------------------------------

             Summary: Race condition can resurrect blobs during blob GC
                 Key: OAK-7209
                 URL: https://issues.apache.org/jira/browse/OAK-7209
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: blob-plugins
    Affects Versions: 1.6.5
            Reporter: Csaba Varga


A race condition between the scheduled blob ID publishing process and the blob 
GC process can resurrect the blobs that the GC is deleting. This is how it can 
happen:
 # MarkSweepGarbageCollector.collectGarbage() starts running.
 # As part of the preparation for sweeping, BlobIdTracker.globalMerge() is 
called, which merges all blob ID records from the blob store into the local 
tracker.
 # Sweeping begins deleting files.
 # BlobIdTracker.snapshot() gets called by the scheduler. It pushes all blob ID 
records that were collected and merged in step 2 back into the blob store, then 
deletes the local copies.
 # Sweeping completes and tries to remove the successfully deleted blobs from 
the tracker. Step 4 already deleted those records from the local files, so 
nothing gets removed, while the copies pushed to the blob store in step 4 remain.
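The sequence above can be reproduced deterministically with a toy model. This is a minimal sketch, not Oak code: `ToyTracker` and `RaceDemo` are hypothetical classes, two in-memory sets stand in for the blob ID records in the blob store and for the tracker's local files, and the model assumes that globalMerge() moves the records out of the blob store rather than copying them.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, NOT Oak's BlobIdTracker. "remote" stands in for the blob ID
// records in the blob store and "local" for the tracker's local files.
// Assumption: globalMerge() moves records out of the blob store.
class ToyTracker {
    final Set<String> remote = new HashSet<>();
    final Set<String> local = new HashSet<>();

    // Step 2: pull all records from the blob store into the local tracker.
    void globalMerge() {
        local.addAll(remote);
        remote.clear();
    }

    // Step 4: push local records back to the blob store, then delete the local copies.
    void snapshot() {
        remote.addAll(local);
        local.clear();
    }

    // Step 5: removal only looks at the local records.
    void remove(Set<String> deleted) {
        local.removeAll(deleted);
    }

    // Every ID the tracker still reports as alive, sorted for stable output.
    Set<String> tracked() {
        Set<String> all = new TreeSet<>(remote);
        all.addAll(local);
        return all;
    }
}

public class RaceDemo {
    // One GC pass over blob-1..blob-3 in which the sweep deletes blob-1 and blob-2.
    static Set<String> runGc(boolean snapshotDuringSweep) {
        ToyTracker t = new ToyTracker();
        t.remote.addAll(Set.of("blob-1", "blob-2", "blob-3"));
        t.globalMerge();                                   // step 2
        Set<String> deleted = Set.of("blob-1", "blob-2");  // step 3: sweep deletes these
        if (snapshotDuringSweep) {
            t.snapshot();                                  // step 4: scheduler fires mid-sweep
        }
        t.remove(deleted);                                 // step 5
        return t.tracked();
    }

    public static void main(String[] args) {
        System.out.println("no snapshot during sweep: " + runGc(false)); // [blob-3]
        System.out.println("snapshot during sweep:    " + runGc(true));  // [blob-1, blob-2, blob-3]
    }
}
```

In the undisturbed run only blob-3 is still tracked; when snapshot() fires mid-sweep, the IDs of both deleted blobs survive in the blob store copy and are reported as alive again.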

The end result is that the tracker still considers all blobs removed during the 
GC run to be alive, which causes warnings when later GC runs try to remove them 
again. The risk grows the longer the sweep runs, but a short, badly timed GC run 
can hit it as well. (We found it during a GC run that took more than 11 hours 
to complete.)

I can see two ways to approach this:
 # Suspend the execution of BlobIdTracker.snapshot() while Blob GC is in 
progress. This requires adding new methods to the BlobTracker interface to 
allow suspending and resuming snapshotting of the tracker.
 # Have the two overloads of BlobIdTracker.remove() do a globalMerge() before 
trying to remove anything. This ensures that even if a snapshot() call happened 
during the GC run, all IDs are "pulled back" into the local tracker and can be 
removed successfully.
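Both approaches can be sketched against the same kind of toy model. Again this is not Oak code: `FixedTracker`, `FixDemo`, `suspendSnapshots()`, `resumeSnapshots()`, `removeLocalOnly()` and `removeWithMerge()` are hypothetical names, the model assumes globalMerge() moves records out of the blob store, and the single-permit `Semaphore` is just one possible way to implement suspend/resume. Approach 1 turns the scheduled snapshot() into a no-op while GC holds the permit; approach 2 leaves snapshot() alone but merges the blob store records back before removing, inside one synchronized method so no snapshot() can run between the merge and the removal.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Semaphore;

// Toy model only, NOT Oak's BlobIdTracker. "remote" stands in for the blob ID
// records in the blob store and "local" for the tracker's local files.
// Assumption: globalMerge() moves records out of the blob store.
class FixedTracker {
    private final Set<String> remote = new HashSet<>();
    private final Set<String> local = new HashSet<>();
    // Approach 1: a single permit; while GC holds it, snapshot() is skipped.
    private final Semaphore snapshotPermit = new Semaphore(1);

    synchronized void addRemote(Set<String> ids) { remote.addAll(ids); }

    synchronized void globalMerge() {
        local.addAll(remote);
        remote.clear();
    }

    // Scheduled job: returns false (does nothing) while snapshots are suspended.
    boolean snapshot() {
        if (!snapshotPermit.tryAcquire()) {
            return false;
        }
        try {
            synchronized (this) {
                remote.addAll(local);
                local.clear();
            }
            return true;
        } finally {
            snapshotPermit.release();
        }
    }

    // Approach 1 (hypothetical names): blocks until an in-flight snapshot finishes.
    void suspendSnapshots() { snapshotPermit.acquireUninterruptibly(); }

    void resumeSnapshots() { snapshotPermit.release(); }

    // Current behaviour: only touches the local records.
    synchronized void removeLocalOnly(Set<String> deleted) { local.removeAll(deleted); }

    // Approach 2: pull records back first. Java monitors are reentrant, so the
    // nested synchronized globalMerge() call is safe here.
    synchronized void removeWithMerge(Set<String> deleted) {
        globalMerge();
        local.removeAll(deleted);
    }

    synchronized Set<String> tracked() {
        Set<String> all = new TreeSet<>(remote);
        all.addAll(local);
        return all;
    }
}

public class FixDemo {
    static final Set<String> DELETED = Set.of("blob-1", "blob-2");

    static FixedTracker freshTracker() {
        FixedTracker t = new FixedTracker();
        t.addRemote(Set.of("blob-1", "blob-2", "blob-3"));
        return t;
    }

    // Approach 1: the mid-sweep snapshot() is skipped.
    static Set<String> gcWithSuspend() {
        FixedTracker t = freshTracker();
        t.suspendSnapshots();
        try {
            t.globalMerge();
            t.snapshot();                // scheduler fires during the sweep: no-op
            t.removeLocalOnly(DELETED);
        } finally {
            t.resumeSnapshots();
        }
        return t.tracked();
    }

    // Approach 2: snapshot() runs mid-sweep, but remove pulls the IDs back.
    static Set<String> gcWithMergeOnRemove() {
        FixedTracker t = freshTracker();
        t.globalMerge();
        t.snapshot();                    // scheduler fires during the sweep
        t.removeWithMerge(DELETED);
        return t.tracked();
    }

    public static void main(String[] args) {
        System.out.println("suspend snapshots:   " + gcWithSuspend());       // [blob-3]
        System.out.println("merge before remove: " + gcWithMergeOnRemove()); // [blob-3]
    }
}
```

Both runs leave only blob-3 tracked. In this model the trade-off is that approach 1 holds back publishing of new blob IDs for the whole GC run, while approach 2 pays for an extra globalMerge() on every remove() call.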



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
