[jira] [Commented] (OAK-4780) VersionGarbageCollector should be able to run incrementally

Stefan Eissing (JIRA) Thu, 02 Feb 2017 03:41:11 -0800

    [ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849819#comment-15849819 ]


Stefan Eissing commented on OAK-4780:
-------------------------------------

+1 for OAK-5571

With regard to resilience, the VGC needs improvements when it
# encounters, and has to cope with, a huge number of delete candidates
# needs to run "during the day", when other vital tasks also need capacity

One such improvement has already been discussed, namely tagging the repository
with the timestamp of the last successful run and only collecting delete
candidates from the interval since that run.
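
A minimal Java sketch of this checkpoint idea, assuming a hypothetical `GcCheckpoint` class that stands in for a settings document in the repository. The property name `lastSuccessfulRun` and the `maxRevisionAgeMillis` parameter (the minimum age before garbage becomes collectable) are also assumptions, not Oak API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a settings document in the repository; a real
// implementation would read and write it through the document store.
final class GcCheckpoint {
    private static final String KEY = "lastSuccessfulRun"; // hypothetical property name
    private final Map<String, Long> settings = new HashMap<>();

    /** Timestamp (ms) up to which the last successful run collected, or 0 if none. */
    long lastSuccessfulRun() {
        return settings.getOrDefault(KEY, 0L);
    }

    /**
     * Returns {from, to}: everything modified since the last successful run,
     * up to the point where garbage is old enough to collect.
     */
    long[] nextInterval(long now, long maxRevisionAgeMillis) {
        long from = lastSuccessfulRun();
        return new long[] {from, Math.max(from, now - maxRevisionAgeMillis)};
    }

    /** Call only after a run completes; stores the interval's upper bound, not 'now'. */
    void markSuccessful(long[] interval) {
        settings.put(KEY, interval[1]);
    }
}
```

Recording the interval's upper bound rather than the wall-clock time of the run matters: nodes modified between that bound and the end of the run were still too young to collect and have to be picked up next time.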

Another idea is to adjust the time interval being collected so that the number
of collected nodes stays in check. This can be done either:
# statically, by setting a maximum time interval to clean. Example: the maximum
is 1.5 days and the last cleanup was 5 days ago, so collect only nodes modified
between 5 and 3.5 days ago.
# dynamically, by storing the proposed time interval for collection in the
repository. Collect nodes in that interval, starting at the last cleanup. On
reaching a threshold, e.g. 100,000 nodes, abort collecting, shrink the time
interval and try again. Once a run completes, process the next interval.
Compare the number of nodes collected each time and grow the time interval if
there is room.
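
Both variants could look roughly like the following Java sketch. All names (`GcWindows`, `CandidateCollector`, `Adaptive`) are hypothetical, times are plain milliseconds, and the rule "grow when a run collected less than half the threshold" is one assumed reading of "if there is room". In a real implementation `lastCleanup` and `interval` would be persisted in the repository rather than held in fields.

```java
// Hypothetical sketch of static and adaptive collection windows; not Oak API.
final class GcWindows {

    /**
     * Static variant: start at the last cleanup and cover at most maxInterval,
     * never going past 'now'. Returns {from, to} in milliseconds.
     */
    static long[] staticWindow(long lastCleanup, long now, long maxInterval) {
        long to = Math.min(lastCleanup + maxInterval, now);
        return new long[] {lastCleanup, to};
    }

    /** Collects candidates in [from, to); returns -1 if 'limit' was reached and it aborted. */
    interface CandidateCollector {
        long collect(long from, long to, long limit);
    }

    /** Dynamic variant; both mutable fields would be persisted in the repository. */
    static final class Adaptive {
        long lastCleanup; // end of the last fully processed interval
        long interval;    // currently proposed interval length
        final long limit, minInterval, maxInterval;

        Adaptive(long lastCleanup, long interval, long limit, long minInterval, long maxInterval) {
            this.lastCleanup = lastCleanup;
            this.interval = interval;
            this.limit = limit;
            this.minInterval = minInterval;
            this.maxInterval = maxInterval;
        }

        /** Processes intervals up to 'until'; returns false if even minInterval is too large. */
        boolean run(long until, CandidateCollector collector) {
            while (lastCleanup < until) {
                long to = Math.min(lastCleanup + interval, until);
                long collected = collector.collect(lastCleanup, to, limit);
                if (collected < 0) {
                    if (interval <= minInterval) {
                        return false; // cannot shrink further; give up for this run
                    }
                    interval = Math.max(minInterval, interval / 2); // shrink and retry
                    continue;
                }
                // ... the collected candidates would be deleted here ...
                lastCleanup = to; // checkpoint: this interval is done
                if (collected < limit / 2) {
                    interval = Math.min(maxInterval, interval * 2); // room to grow
                }
            }
            return true;
        }
    }
}
```

For the static example above, `staticWindow(now - 5 days, now, 1.5 days)` yields the interval from 5 to 3.5 days ago. The `minInterval` guard keeps the adaptive loop from shrinking forever when even the smallest interval already exceeds the threshold.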

Another idea would be to sleep() between delete batches, in order to avoid
flooding the database with requests.
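
A sketch of throttled deletion in Java, assuming a hypothetical `deleteBatch` callback that stands in for whatever actually removes the documents; the batch size and pause length are tuning knobs, not values from this issue:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not Oak API.
final class ThrottledDeleter {
    /**
     * Passes ids to deleteBatch in chunks of at most batchSize, sleeping
     * pauseMillis between chunks so other work gets database capacity.
     * Returns the number of batches issued.
     */
    static int deleteInBatches(List<String> ids, int batchSize, long pauseMillis,
                               Consumer<List<String>> deleteBatch) {
        int batches = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            if (batches > 0) {
                try {
                    Thread.sleep(pauseMillis); // back off before the next batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    return batches; // stop early; the rest is left for the next run
                }
            }
            int end = Math.min(start + batchSize, ids.size());
            deleteBatch.accept(ids.subList(start, end));
            batches++;
        }
        return batches;
    }
}
```

An interrupt ends the loop early and leaves the remaining candidates for the next run, which fits the incremental approach.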

> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has
> finished successfully does it start actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is 
> interrupted during the path collection phase, maybe due to other maintenance 
> tasks. On the next run, the number of paths to be collected will be even 
> bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in 
> chunks, maybe by partitioning the path space by top-level directory.
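
The partitioning idea from the issue description could be sketched in Java as below. `ChunkedGc`, the `gcSubtree` callback (collect and delete garbage under one top-level path, returning false if interrupted) and the persisted `completed` set are all hypothetical:

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch, not Oak API.
final class ChunkedGc {
    /**
     * Runs gcSubtree once per top-level path, skipping paths already in
     * 'completed' (which a real implementation would persist in the repository).
     * Returns true once every path has been processed, false if a chunk was
     * interrupted; progress made so far stays recorded in 'completed'.
     */
    static boolean run(List<String> topLevelPaths, Set<String> completed,
                       Predicate<String> gcSubtree) {
        for (String path : topLevelPaths) {
            if (completed.contains(path)) {
                continue; // finished in an earlier, interrupted run
            }
            if (!gcSubtree.test(path)) {
                return false; // interrupted: stop, the next run resumes here
            }
            completed.add(path);
        }
        return true;
    }
}
```

Because progress is recorded per top-level path, an interrupted run no longer forces the next run to start over with an even larger candidate set; once a full pass completes, the caller would clear `completed` for the next cycle.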



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
