Csaba Varga created OAK-6596:
--------------------------------
Summary: Blob store consistency check can show bogus errors about
missing blobs
Key: OAK-6596
URL: https://issues.apache.org/jira/browse/OAK-6596
Project: Jackrabbit Oak
Issue Type: Bug
Components: blob-plugins
Affects Versions: 1.7.6
Reporter: Csaba Varga
Priority: Minor
While running a blob store consistency check on an AEM 6.3 instance, I
noticed a large number of blobs reported as missing. When I checked the list of
missing blobs, I couldn't find any that were actually missing. After
investigating the Oak sources, I could narrow the issue down to the code that's
sorting the marked blob references. It's supposed to remove duplicates while
sorting, but it doesn't remove all of them in some circumstances. The
duplicates then show up as bogus missing blobs since they aren't matched by an
equal number of duplicate lines in the "available blobs" list.
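To illustrate how a leftover duplicate turns into a bogus "missing" entry, here is a minimal sketch of a line-by-line comparison of two sorted lists. It is not the actual Oak code: the class name, the method name, and the blob IDs are hypothetical, and it assumes the check walks both lists in step and consumes one line from each side per match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedDiffSketch {

    // Returns the lines of `marked` that have no matching line left in `available`.
    // Both lists must be sorted; every match consumes one line from each side.
    static List<String> missing(List<String> marked, List<String> available) {
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < marked.size()) {
            if (j >= available.size()) {
                // Nothing left to match against: the rest is reported as missing.
                result.add(marked.get(i++));
                continue;
            }
            int cmp = marked.get(i).compareTo(available.get(j));
            if (cmp == 0) {
                i++;
                j++;
            } else if (cmp < 0) {
                // The marked line sorts before the next available line: no match.
                result.add(marked.get(i++));
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> available = Arrays.asList("blob-a", "blob-b", "blob-c");
        List<String> deduplicated = Arrays.asList("blob-a", "blob-b");
        List<String> withDuplicate = Arrays.asList("blob-a", "blob-b", "blob-b");

        System.out.println(missing(deduplicated, available));  // prints []
        System.out.println(missing(withDuplicate, available)); // prints [blob-b]
    }
}
```

In the second call the first "blob-b" is matched and consumed, so the second one finds no partner and is reported, even though the blob exists.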
As far as I can tell, the bug only manifests on installations that use
DocumentNodeStore (causing the marked blob IDs to also contain the referencing
node) and only if the number of blobs in the blob store reaches a certain
threshold (causing the sort code to sort in chunks and merge, instead of
sorting everything in memory at once). This means it's not easy to reproduce in
a development environment where you only have dummy content.
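The chunked path can be sketched as follows. This is an illustration under stated assumptions rather than the Oak implementation or the attached patch: the class and method names are hypothetical, chunks are kept in memory instead of in temporary files, and whole lines are compared. The point is that duplicates must be dropped during the merge as well, because the same line can appear in more than one chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkedSortSketch {

    // Read position inside one sorted chunk, ordered by the line it points at.
    private static final class Cursor implements Comparable<Cursor> {
        final List<String> chunk;
        int pos;

        Cursor(List<String> chunk) {
            this.chunk = chunk;
        }

        String head() {
            return chunk.get(pos);
        }

        @Override
        public int compareTo(Cursor other) {
            return head().compareTo(other.head());
        }
    }

    // Sorts `lines` in chunks of `chunkSize`, then merges the chunks and
    // drops duplicates, including duplicates that span different chunks.
    static List<String> sortDistinct(List<String> lines, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Phase 1: sort each chunk on its own.
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, lines.size());
            List<String> chunk = new ArrayList<>(lines.subList(start, end));
            Collections.sort(chunk);
            queue.add(new Cursor(chunk));
        }
        // Phase 2: k-way merge. The merged stream is globally sorted, so
        // comparing against the last emitted line removes every duplicate.
        List<String> out = new ArrayList<>();
        String last = null;
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();
            String line = cursor.head();
            if (!line.equals(last)) {
                out.add(line);
                last = line;
            }
            cursor.pos++;
            if (cursor.pos < cursor.chunk.size()) {
                queue.add(cursor);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("b", "a", "b", "c", "a", "b");
        System.out.println(sortDistinct(lines, 2)); // prints [a, b, c]
    }
}
```

With a chunk size of 2 the input splits into three chunks that each contain "b", so removing duplicates only within a chunk would still leave repeated lines in the merged output.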
I'll attach to this ticket a proposed patch that contains a fix and a test case
that verifies the correct merge logic. Please let me know if you also need
reproduction steps to work on this, but I'd rather not provide them because the
only place I can reproduce this has a blob store over 1 TB in size.