[ 
https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902691#comment-15902691
 ] 

Marcel Reutegger commented on OAK-3070:
---------------------------------------

Thanks for testing. There's one more thing I'm a bit worried about with the 
current state of the patch and a potential backport to earlier branches. 
Without this lower bound, the VersionGC implementation may currently scan 
through a lot of documents that have _deletedOnce set and represent 
resurrected nodes. With the patch, that scan is basically pushed down to the 
database. Given the indexes we currently have in e.g. MongoDB (_deletedOnce, 
_modified+_id), neither will serve the new query efficiently, and the initial 
result batch may take a long time. Chances are the query will time out and 
fail. I think this is another reason why OAK-5704 is useful. But then we 
should probably replace the existing index on _deletedOnce with a partial 
index on _deletedOnce+_modified anyway. I'd say this is a requirement for 
continuous VersionGC (OAK-3710) and maybe also the incremental VersionGC 
(OAK-4780).
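
To make the index discussion concrete, here is a rough sketch of the query 
shape and of a partial index that could serve it. The dicts follow MongoDB's 
filter and partialFilterExpression syntax. The function names, the 
inclusive/exclusive bounds and the pymongo calls in the comments are 
assumptions for illustration, not what the patch or Oak actually does.

```python
def gc_candidate_filter(lower_bound, upper_bound):
    """Filter for docs flagged _deletedOnce whose _modified lies in
    [lower_bound, upper_bound). The bound semantics are an assumption."""
    return {
        "_deletedOnce": True,
        "_modified": {"$gte": lower_bound, "$lt": upper_bound},
    }


def partial_index_spec():
    """Keys and options for a compound index that only covers documents
    with _deletedOnce set to true."""
    keys = [("_deletedOnce", 1), ("_modified", 1)]
    options = {"partialFilterExpression": {"_deletedOnce": True}}
    return keys, options


# With pymongo (assumed here, not executed), these would be used roughly as:
#   nodes.create_index(keys, **options)
#   nodes.find(gc_candidate_filter(lower, upper))
keys, options = partial_index_spec()
query = gc_candidate_filter(1000, 2000)
```

Because the filter repeats the _deletedOnce condition from the partial filter 
expression, the database can answer it from the smaller index instead of 
walking every flagged document.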

For the scope of this issue, I would simply lift the query timeout. Do we have 
any other options?

> Use a lower bound in VersionGC query to avoid checking unmodified once 
> deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070-2.patch, OAK-3070.patch, 
> OAK-3070-updated.patch, OAK-3070-updated.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run. 
> {quote}
> The lowest last validated _modified could be saved in the settings 
> collection and reused for the next run
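
A minimal in-memory sketch of the idea described in the issue: remember where 
the last successful run stopped and use that as the lower bound next time. 
The settings key lastGcModified, the run_gc function and the sample documents 
are hypothetical; the real implementation would query the document store and 
derive the upper bound from the configured maximum revision age.

```python
def run_gc(docs, settings, upper_bound):
    """Return ids of _deletedOnce docs modified since the last successful run."""
    lower_bound = settings.get("lastGcModified", 0)  # hypothetical settings key
    candidates = [
        d["_id"]
        for d in docs
        if d.get("_deletedOnce") and lower_bound <= d["_modified"] < upper_bound
    ]
    # Record the new lower bound only once the run has completed.
    settings["lastGcModified"] = upper_bound
    return candidates


docs = [
    {"_id": "1:/a", "_deletedOnce": True, "_modified": 100},
    {"_id": "1:/b", "_deletedOnce": True, "_modified": 250},
    {"_id": "1:/c", "_modified": 260},  # never deleted, never a candidate
]
settings = {}
first = run_gc(docs, settings, upper_bound=200)   # checks 1:/a
second = run_gc(docs, settings, upper_bound=300)  # checks 1:/b, skips 1:/a
```

In the second run 1:/a is skipped because it has not been modified since it 
was last checked, which is exactly the saving the lower bound is meant to 
give.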



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
