[ 
https://issues.apache.org/jira/browse/OAK-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663446#comment-16663446
 ] 

Amit Jain commented on OAK-7859:
--------------------------------

The logs don't contain anything interesting; specifically, they are missing 
the logs covered by [1], which I was interested in. Have these also been 
removed? If not, I am not sure why they are not present, since there are INFO 
level logs there as well. Also, I don't see the default INFO level logs from 
{{org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector}}.

[1] https://github.com/apache/jackrabbit-oak/blob/1.6/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/blob/datastore/BlobIdTracker.java#L233-L248

> S3 Bucket iterator stops too early
> ----------------------------------
>
>                 Key: OAK-7859
>                 URL: https://issues.apache.org/jira/browse/OAK-7859
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud
>    Affects Versions: 1.6.6
>            Reporter: Wim Symons
>            Assignee: Amit Jain
>            Priority: Critical
>              Labels: newbie, pull-request-available
>         Attachments: META-ids.txt, dsgc.log, meta-info.txt
>
>
> Fixed a major bug in the S3 bucket iterator.
> When the returned queue of records is empty because a full page of records 
> consists only of keys starting with META/, the iterator stops while there 
> is still data available in the bucket.
> This causes problems with datastore GC and datastore consistency checks 
> (both online and offline), and possibly more.
> A short explanation, based on a batch size of 2 instead of 1000.
> Suppose your list of S3 keys looks as follows:
>  * 1
>  * 2
>  * 3
>  * 4
>  * META/1
>  * META/2
>  * 5
>  * 6
> loadBatch would first load [1, 2], filter out no META/ keys and pass [1, 2] 
> to the caller.
> Next time, loadBatch would load [3, 4], filter out no META/ keys and pass [3, 
> 4] to the caller.
> Then, loadBatch would load [META/1, META/2], filter out the META/ keys and 
> pass [] to the caller.
> When that happens, traversing the bucket stops because the returned list 
> is empty, even if there are many more batches to load (here, [5, 6] is 
> never reached).
> With the fix, if the returned list is empty and more batches are 
> available, further batches are loaded until a batch contains data or no 
> batches remain.
> We are currently running Oak 1.6.6 on AEM 6.3.1.2, but as the bug is still in 
> trunk, all previous versions of Oak are affected as well.
> I provided 2 pull requests: one for trunk 
> (https://github.com/apache/jackrabbit-oak/pull/103) and one for the 1.6 
> branch (https://github.com/apache/jackrabbit-oak/pull/104).
> CI failed on https://github.com/apache/jackrabbit-oak/pull/103, but I don't 
> think it's related to my changes.
> For the record, the patch works: I was able to test it successfully on 
> our production repository using oak-run --id. With version 1.6.6 it 
> reported 800k items; with my patched version it reported 1.8m items (as our 
> META/ nodes are listed somewhere halfway through).
>  
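
For illustration, the batch-loading behaviour described in the quoted issue can be sketched as below. This is only a minimal sketch and not the code from either pull request: the class name MetaSkippingIterator, the in-memory key list standing in for the S3 bucket listing, and the drain helper are all assumptions made for this example. The point is the loop in hasNext, which keeps loading batches while the filtered queue is empty and more batches remain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

// Hypothetical stand-in for a paged bucket iterator; not Oak's actual S3 backend code.
class MetaSkippingIterator implements Iterator<String> {
    private final List<String> keys;   // simulated bucket listing, in listing order
    private final int batchSize;       // page size (1000 in the issue, 2 in its example)
    private int nextOffset = 0;        // plays the role of the listing marker
    private final Queue<String> queue = new ArrayDeque<>();

    MetaSkippingIterator(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
    }

    // Loads one page and drops META/ keys.
    // Returns false only when the listing itself is exhausted.
    private boolean loadBatch() {
        if (nextOffset >= keys.size()) {
            return false;
        }
        int end = Math.min(nextOffset + batchSize, keys.size());
        for (String key : keys.subList(nextOffset, end)) {
            if (!key.startsWith("META/")) {
                queue.add(key);
            }
        }
        nextOffset = end;
        return true;
    }

    @Override
    public boolean hasNext() {
        // An empty filtered page must not end the iteration:
        // keep loading until there is data or no page is left.
        while (queue.isEmpty() && loadBatch()) {
            // intentionally empty, loadBatch does the work
        }
        return !queue.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return queue.remove();
    }

    // Convenience helper for the example: collects everything the iterator yields.
    static List<String> drain(List<String> keys, int batchSize) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = new MetaSkippingIterator(keys, batchSize);
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("1", "2", "3", "4", "META/1", "META/2", "5", "6");
        System.out.println(drain(keys, 2));
    }
}
```

With the key list from the issue and a batch size of 2, the third page filters down to nothing, yet keys 5 and 6 are still returned. A version that treats a single empty filtered page as the end of the listing would stop after key 4.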



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
