[ 
https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885773#comment-15885773
 ] 

Marcel Reutegger commented on OAK-3070:
---------------------------------------

bq. why is there so many false positives for _deletedOnce

I don't know what those nodes are. Indexes may explain some of those documents. 
All I know right now is that there are repositories out there with an 
increasing number of candidate documents that cannot be removed because they 
represent resurrected nodes.

bq. Since _deletedOnce was always immutable, I wonder if we have some hidden 
dragons

This is a good point and we need to be careful with changes in this area. 
However, I think there are benefits that are worth the change. The _deletedOnce 
index will always be within certain bounds. E.g. if Revision GC runs every day, 
it will at most have to index documents that were modified in the last 48 hours. 
Older documents either get removed or their flag is reset. It may also allow us 
to implement 'Continuous Revision GC' (OAK-3710) by simply 'tailing' a 
_modified+_deletedOnce compound index. OAK-3070 wouldn't be needed in this case.
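
To illustrate why resetting the flag keeps the candidate set bounded, here is a 
minimal in-memory sketch. It is not Oak code: the Doc class, the exists field 
(standing in for "the node is visible again, i.e. resurrected") and the 
revision_gc function are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    modified: int       # stand-in for _modified
    deleted_once: bool  # stand-in for _deletedOnce
    exists: bool        # assumption: True means the node was resurrected


def revision_gc(docs, max_modified):
    """One GC pass over documents with the flag set and modified < max_modified.

    Deleted documents are removed; resurrected ones get their flag reset,
    so neither kind shows up as a candidate in later runs.
    """
    removed, reset = [], []
    for doc_id in list(docs):
        doc = docs[doc_id]
        if not doc.deleted_once or doc.modified >= max_modified:
            continue
        if doc.exists:
            doc.deleted_once = False
            reset.append(doc_id)
        else:
            del docs[doc_id]
            removed.append(doc_id)
    return removed, reset


def candidates(docs):
    """Ids of documents that a flag-based index would still contain."""
    return sorted(d.id for d in docs.values() if d.deleted_once)
```

After a pass, only documents modified at or after max_modified can still carry 
the flag, which is what bounds the index in this model.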

Regarding backports, I also think we should be careful and not backport 
OAK-5704 immediately. There's an open question around the number of calls 
that it may trigger. The DocumentStore API does not have a bulk update with a 
condition, which means the VersionGarbageCollection currently performs 
individual document updates when it resets the _deletedOnce flag.
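
The cost concern can be made concrete with a toy model that counts store calls. 
This is only a sketch: CountingStore and conditional_update are hypothetical and 
do not mirror the real DocumentStore API, and the condition used here (the 
_modified value is unchanged since GC read the document) is an assumption about 
what such a condition might look like.

```python
class CountingStore:
    """Toy store that only offers per-document conditional updates."""

    def __init__(self, docs):
        self.docs = docs  # id -> {"_modified": int, "_deletedOnce": bool}
        self.update_calls = 0

    def conditional_update(self, doc_id, expected_modified, changes):
        """Apply changes only if _modified still has the expected value."""
        self.update_calls += 1
        doc = self.docs.get(doc_id)
        if doc is None or doc["_modified"] != expected_modified:
            return False
        doc.update(changes)
        return True


def reset_flags(store, resurrected):
    """Reset _deletedOnce for (doc_id, seen_modified) pairs, one call each.

    Returns the ids whose flag was actually reset.
    """
    done = []
    for doc_id, seen_modified in resurrected:
        if store.conditional_update(doc_id, seen_modified, {"_deletedOnce": False}):
            done.append(doc_id)
    return done
```

In this model the number of calls grows linearly with the number of resurrected 
documents, and a document modified concurrently simply keeps its flag until a 
later run.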



> Use a lower bound in VersionGC query to avoid checking unmodified once 
> deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070.patch, OAK-3070-updated.patch, 
> OAK-3070-updated.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run. 
> {quote}
> The lowest last validated _modified could be saved in the settings collection 
> and reused for the next run
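
The lower-bound idea from the issue description can be sketched as follows. 
This is an illustration only, not the attached patch: the settings dict stands 
in for the settings collection, the key name lastValidatedModified is made up, 
and the sketch ignores the resolution of _modified and concurrent changes.

```python
def gc_candidates(docs, lower, upper):
    """Flagged documents whose _modified lies in the half-open range [lower, upper)."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if doc["_deletedOnce"] and lower <= doc["_modified"] < upper
    )


def run_gc(docs, settings, upper):
    """Check only documents modified since the last validated bound."""
    lower = settings.get("lastValidatedModified", 0)
    checked = gc_candidates(docs, lower, upper)
    # The existence check and removal of garbage would happen here.
    # Assumption: the bound is only advanced after a successful run.
    settings["lastValidatedModified"] = upper
    return checked
```

A first run with an empty settings dict scans everything below upper; later runs 
skip documents that were already validated and have not been modified since.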



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
