Csaba Varga created OAK-6596:
--------------------------------

             Summary: Blob store consistency check can show bogus errors about missing blobs
                 Key: OAK-6596
                 URL: https://issues.apache.org/jira/browse/OAK-6596
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: blob-plugins
    Affects Versions: 1.7.6
            Reporter: Csaba Varga
            Priority: Minor


While running a blob store consistency check on an AEM 6.3 instance, I
noticed a large number of blobs reported as missing. When I checked the list of
missing blobs, I couldn't find any that were actually missing. After
investigating the Oak sources, I narrowed the issue down to the code that
sorts the marked blob references. It's supposed to remove duplicates while
sorting, but in some circumstances it doesn't remove all of them. The leftover
duplicates then show up as bogus missing blobs, since they aren't matched by an
equal number of duplicate lines in the "available blobs" list.
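To illustrate why leftover duplicates surface as missing blobs: a check that
compares two sorted lists can walk them in lockstep, letting each available
blob satisfy at most one marked reference. The Java sketch below assumes that
comparison strategy; the class and method names are hypothetical and are not
taken from Oak.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Oak's actual consistency-check code.
class MissingBlobDiff {
    // Walks two sorted lists in lockstep and returns the entries of `marked`
    // that have no matching entry left in `available`. Each entry in
    // `available` can satisfy at most one entry in `marked`.
    static List<String> missing(List<String> marked, List<String> available) {
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < marked.size()) {
            if (j >= available.size()) {
                result.add(marked.get(i++)); // nothing left to match against
                continue;
            }
            int cmp = marked.get(i).compareTo(available.get(j));
            if (cmp == 0) {
                i++; // matched: consume one entry from each list
                j++;
            } else if (cmp < 0) {
                result.add(marked.get(i++)); // marked but not available
            } else {
                j++; // available but not marked: irrelevant here
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> available = List.of("blob-a", "blob-b", "blob-c");
        // "blob-b" appears twice because deduplication missed it.
        List<String> marked = List.of("blob-a", "blob-b", "blob-b", "blob-c");
        System.out.println(missing(marked, available)); // prints [blob-b]
    }
}
```

Although "blob-b" exists in the store, its second marked copy has no partner
left in the available list, so it is reported as missing.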

As far as I can tell, the bug only manifests on installations that use
DocumentNodeStore (which causes the marked blob IDs to also contain the
referencing node) and only if the number of blobs in the blob store reaches a
certain threshold (which causes the sort code to sort in chunks and then merge
them, instead of sorting everything in memory at once). This makes it hard to
reproduce in a development environment that only has dummy content.
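The general shape of a correct chunked sort-and-deduplicate is sketched below
in Java. It is not Oak's code and does not reproduce the exact defect; it only
shows where cross-chunk duplicates must be handled. Assumptions, all
hypothetical: marked lines look like "blobId,nodeId", only the blob ID takes
part in ordering and deduplication, and chunks are held in memory instead of
temporary files. Deduplicating inside each chunk is not enough, because the
same blob ID can land in several chunks; the merge must also compare each
candidate with the last line it emitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch of an external sort with deduplication, not Oak's implementation.
class ChunkedDedupSort {
    // Assumed line format "<blobId>,<referencingNodeId>"; returns the blob ID part.
    static String key(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    static final Comparator<String> BY_KEY = Comparator.comparing(ChunkedDedupSort::key);

    // Phase 1: sort each fixed-size chunk, dropping duplicate keys within the
    // chunk (the TreeSet keeps the first line seen for each key).
    static List<List<String>> sortChunks(List<String> lines, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            TreeSet<String> chunk = new TreeSet<>(BY_KEY);
            chunk.addAll(lines.subList(start, Math.min(start + chunkSize, lines.size())));
            chunks.add(new ArrayList<>(chunk));
        }
        return chunks;
    }

    // Phase 2: k-way merge. A key may occur in several chunks, so each
    // candidate is compared with the key of the last emitted line.
    static List<String> merge(List<List<String>> chunks) {
        // Queue entries are {chunkIndex, positionInChunk}, ordered by the key they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> BY_KEY.compare(chunks.get(a[0]).get(a[1]), chunks.get(b[0]).get(b[1])));
        for (int c = 0; c < chunks.size(); c++) {
            if (!chunks.get(c).isEmpty()) {
                heads.add(new int[] {c, 0});
            }
        }
        List<String> out = new ArrayList<>();
        String lastKey = null;
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            String line = chunks.get(head[0]).get(head[1]);
            if (lastKey == null || !lastKey.equals(key(line))) {
                out.add(line); // first occurrence of this key across all chunks
                lastKey = key(line);
            }
            if (head[1] + 1 < chunks.get(head[0]).size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> marked = List.of(
                "blob-b,node-1", "blob-a,node-2", "blob-c,node-3",
                "blob-b,node-4", "blob-a,node-5", "blob-d,node-6");
        // With a chunk size of 3, "blob-a" and "blob-b" each appear in both chunks.
        List<String> merged = merge(sortChunks(marked, 3));
        System.out.println(merged.stream().map(ChunkedDedupSort::key).collect(Collectors.toList()));
        // prints [blob-a, blob-b, blob-c, blob-d]
    }
}
```

Which referencing node survives for a repeated blob ID depends on tie-breaking
in the priority queue; the sketch assumes only the blob ID matters when the
result is compared against the available-blobs list.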

I'll attach a proposed patch to this ticket, containing a fix and a test case
that verifies the corrected merge logic. Please let me know if you also need
reproduction steps to work on this; I'd rather not provide them, because the
only place I can reproduce this has a blob store over 1 TB in size.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
