[ 
https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875897#comment-15875897
 ] 

Stefan Eissing commented on OAK-4780:
-------------------------------------

[~reschke] any estimate of how long that "eventually" will take on a 140TB 
cluster that has never been cleaned up? And what would the GC do when this 
strategy does not succeed within a maintenance interval?

Measurements from large clusters showed that the collect phase of a GC can take 
as long as 4 hours, processing only the changes from the last day. How would 
this work? Let's say there are 10 million candidate nodes daily. How would you 
configure the limit to operate in this environment? How would the same setting 
work in a cluster that has never been cleaned up (worst case, I know)?
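
To make the question about the limit concrete, here is a rough sketch of the 
arithmetic. It is not Oak code: the class GcBacklogEstimate, the method 
runsToDrain and the parameter limitPerRun are hypothetical names, and the model 
(one GC run per day, each run deleting at most limitPerRun candidates, new 
candidates arriving between runs) is an assumption made only for illustration.

```java
// Hypothetical back-of-the-envelope model; not part of Oak.
final class GcBacklogEstimate {

    /**
     * Estimates how many daily GC runs are needed to drain a backlog.
     *
     * Assumed model: the first run sees `backlog` candidates; every run
     * deletes at most `limitPerRun` of them; `dailyNew` candidates are added
     * between two consecutive runs.
     *
     * @return the number of runs, or -1 if the backlog never drains because
     *         the limit does not exceed the daily inflow
     */
    static long runsToDrain(long backlog, long dailyNew, long limitPerRun) {
        if (backlog < 0 || dailyNew < 0 || limitPerRun <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        if (backlog == 0) {
            return 0;               // nothing to do
        }
        if (backlog <= limitPerRun) {
            return 1;               // a single run is enough
        }
        if (limitPerRun <= dailyNew) {
            return -1;              // backlog grows or stays constant
        }
        // After n runs: backlog + (n - 1) * dailyNew - n * limitPerRun <= 0
        // <=> n >= (backlog - dailyNew) / (limitPerRun - dailyNew)
        long excess = backlog - dailyNew;
        long netPerRun = limitPerRun - dailyNew;
        return (excess + netPerRun - 1) / netPerRun;   // integer ceiling
    }
}
```

With 10 million new candidates daily, a made-up backlog of 3,650,000,000 
candidates (roughly a year's worth at that rate) and a limit of 15 million per 
run, runsToDrain returns 728, i.e. about two years of daily runs. With a limit 
of 10 million or less it returns -1: under this model the backlog never 
shrinks.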

> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has been 
> successfully finished, starts actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is 
> interrupted during the path collection phase, maybe due to other maintenance 
> tasks. On the next run, the number of paths to be collected will be even 
> bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in 
> chunks; maybe by partitioning the path space by top level directory.
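
The chunking idea in the description could look roughly like the sketch below. 
It is only an illustration under stated assumptions, not the 
VersionGarbageCollector implementation and not the attached patches: the class 
ChunkedGcSketch and the method runOnce are hypothetical, the store is modelled 
as a plain sorted set of candidate paths, and the maxChunks budget stands in 
for a maintenance window that closes before the run is complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch; does not use any Oak API.
final class ChunkedGcSketch {

    /**
     * Runs collect and delete per top-level directory instead of collecting
     * all paths first. Work done for finished chunks survives an interruption.
     *
     * @param candidates   sorted set of garbage paths, modified in place
     * @param topLevelDirs names of the top-level directories, e.g. "content"
     * @param maxChunks    how many non-empty chunks this run may process
     * @return number of paths deleted in this run
     */
    static int runOnce(NavigableSet<String> candidates,
                       List<String> topLevelDirs, int maxChunks) {
        int deleted = 0;
        int processed = 0;
        for (String dir : topLevelDirs) {
            if (processed == maxChunks) {
                break;  // models the maintenance window closing
            }
            String prefix = "/" + dir + "/";
            // Collect phase, restricted to the descendants of one directory.
            List<String> chunk = new ArrayList<>(candidates.subSet(
                    prefix, true, prefix + Character.MAX_VALUE, false));
            if (chunk.isEmpty()) {
                continue;  // already clean; does not count against the budget
            }
            // Delete phase for this chunk only, before the next collect.
            candidates.removeAll(chunk);
            deleted += chunk.size();
            processed++;
        }
        return deleted;
    }
}
```

In this model, a run that is cut short after two chunks still removes the 
candidates of those two chunks, and the next run continues with whatever is 
left instead of starting over with a larger set of paths.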



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
