[ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875870#comment-15875870 ]
Julian Reschke commented on OAK-4780:
-------------------------------------

Here's an approach that might be simpler but in the end achieves the same goal:

- set a limit for the collection phase, both for elapsed time and number of documents
- when the limit is reached, sort the collected IDs by modified date, and compute a new upper limit so that half of the documents fall out of range; discard these entries as well
- continue the collection with the smaller time window (this just needs an internal API that allows specifying the _id to start with)
- compute a new limit for elapsed time (half of the original?)

Eventually, we should have a set of documents that we *can* garbage collect.

Finally, if the maintenance window is still open, just rerun the GC.

> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has been
> successfully finished, starts actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is
> interrupted during the path collection phase, maybe due to other maintenance
> tasks. On the next run, the number of paths to be collected will be even
> bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in
> chunks; maybe by partitioning the path space by top level directory.
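To make the halving idea from the comment concrete, here is a minimal, self-contained sketch. It is not Oak code: `IncrementalCollectSketch`, `Doc`, `Result` and `collect` are hypothetical names invented for illustration, the document store is modelled as an in-memory list already ordered by `_id`, and only the document-count limit is modelled (the elapsed-time limit is left out to keep the example deterministic).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the proposed collection loop; none of these names exist in Oak. */
final class IncrementalCollectSketch {

    /** Stand-in for a candidate document: its _id and its last-modified timestamp. */
    static final class Doc {
        final String id;
        final long modified;

        Doc(String id, long modified) {
            this.id = id;
            this.modified = modified;
        }
    }

    /** Collected candidates plus the (possibly lowered) exclusive upper bound on modified. */
    static final class Result {
        final List<Doc> candidates;
        final long upperBound;

        Result(List<Doc> candidates, long upperBound) {
            this.candidates = candidates;
            this.upperBound = upperBound;
        }
    }

    /**
     * Walks the documents in _id order and keeps those modified before the bound.
     * Each time maxDocs candidates have piled up, the bound is lowered to the
     * median modified time and everything at or above it is discarded.
     */
    static Result collect(List<Doc> docsSortedById, long upperBound, int maxDocs) {
        if (maxDocs < 1) {
            throw new IllegalArgumentException("maxDocs must be at least 1");
        }
        List<Doc> collected = new ArrayList<>();
        long bound = upperBound;
        // Iterating the list models "continue from the current _id"; a real
        // implementation would need a store query that can resume at a given _id.
        for (Doc doc : docsSortedById) {
            if (doc.modified >= bound) {
                continue; // outside the current time window
            }
            collected.add(doc);
            if (collected.size() >= maxDocs) {
                collected.sort(Comparator.comparingLong((Doc d) -> d.modified));
                final long newBound = collected.get(collected.size() / 2).modified;
                // Drops roughly the newer half; ties on modified can drop more.
                collected.removeIf(d -> d.modified >= newBound);
                bound = newBound;
            }
        }
        return new Result(collected, bound);
    }
}
```

Because the element at the median index is always removed, the candidate list shrinks below `maxDocs` after every halving, so memory use stays bounded while the scan continues. The returned `upperBound` is the window that was actually covered; rerunning with the original bound, as the comment suggests, would pick up the documents that were dropped.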