[ https://issues.apache.org/jira/browse/OAK-8170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amit Jain resolved OAK-8170.
----------------------------
    Resolution: Not A Bug

[~wim.symons] That's how it's supposed to work. Revision garbage collection or segment store compaction (online or offline) is a prerequisite to running DataStore maintenance operations such as DSGC or the consistency check. This is also mentioned in the Blob Garbage Collection section of the docs [1]:

{quote}Mark NodeStore - GC logic would make a record of all the blob references which are referred by any node present in NodeStore. Note that any blob references from old revisions of node would also be considered as a valid references. {quote}

What's causing the confusion is the --verbose option. This option is generally meant for identifying the nodes that reference blobs, specifically when blobs that are still needed have been deleted for some reason and must be restored. Because it iterates only over nodes reachable from the head state, it does not report references held by older revisions that have not yet been collected.

I definitely think there's room for improvement in the documentation to clear up this confusion. Could you please create a doc issue where any gaps can be addressed?

[1] [https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html]

> oak-run datastorecheck and online consistency check falsely report missing
> blobs
> --------------------------------------------------------------------------------
>
>                 Key: OAK-8170
>                 URL: https://issues.apache.org/jira/browse/OAK-8170
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: 1.8.9
>            Reporter: Wim Symons
>            Priority: Major
>         Attachments: output.txt
>
>
> Hi,
> We found that oak-run datastorecheck falsely reports missing blobs when
> running datastorecheck without the --verbose option.
> Even the online datastore consistency check falsely reports the same missing
> blobs.
> This is due to the fact that the standard blob reference collector in
> oak-run datastorecheck looks at *all* compaction generations in the segment
> store instead of only the last one.
> After running an offline compaction, and thus keeping only 1 generation, the
> correct number of blob references and missing blobs is reported by oak-run
> datastorecheck.
> The bug on the 1.8 branch comes from
> org.apache.jackrabbit.oak.plugins.blob.BlobReferenceRetriever#collectReferences
> (line 429), and by following that you arrive at
> org.apache.jackrabbit.oak.segment.file.FileStore#tarFiles (line 1013), stating:
>     tarFiles.collectBlobReferences(collector,
>         newOldReclaimer(lastCompactionType, getGcGeneration(),
>         gcOptions.getRetainedGenerations()));
> I'm not familiar enough with this source code, so I won't attempt a patch.
> I did double-check trunk and saw the same line of code there:
> org.apache.jackrabbit.oak.segment.file.GarbageCollector#collectBlobReferences
> (line 324).
> I attached a text file with the outputs of the commands I ran.
> We currently use Oak 1.8.9 with AEM 6.4.3.0 and oak-blob-cloud 1.8.9 from
> the 1.8.3 AEM S3 connector.
> Regards
> Wim

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
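Both the resolution comment and the issue description hinge on which revisions the blob reference collector visits. The Java sketch below is a hypothetical, simplified model. It does not use Oak's classes, and `BlobCheckSketch` and its methods are invented for illustration. Each retained segment-store generation is modeled as a set of referenced blob IDs (the last one being the head), and the consistency check reports as missing any referenced ID that is absent from the DataStore. It shows how collecting from all retained generations can report blobs that a head-only walk, like the one --verbose performs, does not.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Oak's API: each element of `generations` is the
// set of blob IDs referenced by one retained generation; the last one is the head.
final class BlobCheckSketch {

    // Collect references from every retained generation
    // (the behavior the issue attributes to the default, non-verbose check).
    static Set<String> referencesFromAllGenerations(List<Set<String>> generations) {
        Set<String> refs = new HashSet<>();
        for (Set<String> gen : generations) {
            refs.addAll(gen);
        }
        return refs;
    }

    // Collect only the references reachable from the head state
    // (the behavior the resolution comment attributes to --verbose).
    static Set<String> referencesFromHead(List<Set<String>> generations) {
        return new HashSet<>(generations.get(generations.size() - 1));
    }

    // A blob is reported missing if it is referenced but not present in the DataStore.
    static Set<String> missing(Set<String> references, Set<String> dataStore) {
        Set<String> result = new HashSet<>(references);
        result.removeAll(dataStore);
        return result;
    }

    public static void main(String[] args) {
        // The old generation still references "b1", which the DataStore no longer holds.
        List<Set<String>> generations = List.of(Set.of("b1", "b2"), Set.of("b2", "b3"));
        Set<String> dataStore = Set.of("b2", "b3");

        System.out.println(missing(referencesFromAllGenerations(generations), dataStore)); // [b1]
        System.out.println(missing(referencesFromHead(generations), dataStore));           // []

        // After compaction only the head generation is retained, so both views agree.
        List<Set<String>> compacted = List.of(generations.get(generations.size() - 1));
        System.out.println(missing(referencesFromAllGenerations(compacted), dataStore));   // []
    }
}
```

In this toy model, `b1` is referenced only by the older, not-yet-collected generation, so the all-generations check reports it as missing while the head-only walk does not. Once compaction leaves only the head generation, both report the same result, which mirrors the behavior described after running offline compaction.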