[ https://issues.apache.org/jira/browse/OAK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996622#comment-14996622 ]
Alex Parvulescu commented on OAK-3603:
--------------------------------------

This depends on a stable set of referenced ids, so we'd need to fix OAK-3602 first.

> Evaluate skipping cleanup of a subset of tar files
> --------------------------------------------------
>
>                 Key: OAK-3603
>                 URL: https://issues.apache.org/jira/browse/OAK-3603
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segmentmk
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>
> Given that tar readers are immutable (we only create new generations
> of them once they reach a certain threshold of garbage), we can consider
> a heuristic for skipping cleanup entirely on subsequent cleanup calls
> that are based on the same referenced id set (provided we can make this
> set more stable, aka OAK-2849).
> Example: for a specific input set, a cleanup call on a tar reader might decide
> that there is not enough garbage (some IO is involved in reading through all
> existing entries). If the following cleanup cycle has the exact same input, it
> doesn't make sense to recheck the tar file: we already know cleanup can be
> skipped. Moreover, we can skip the older tar files too, as their input would
> also not change. The gains increase with the number of tar files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
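The heuristic described in the issue could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Oak code: the class CleanupSkipSketch, the TarFile interface and its cleanup method are hypothetical names. It also assumes a cycle may be skipped as a whole only when the previous cycle rewrote nothing and both the list of tar files and the referenced id set are unchanged.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch; none of these names come from the Oak code base.
class CleanupSkipSketch {

    /** Stand-in for an immutable tar reader. */
    interface TarFile {
        /** Returns true if the file held enough garbage to be rewritten. */
        boolean cleanup(Set<UUID> referencedIds);
    }

    private List<TarFile> lastFiles;     // files seen by the previous cycle
    private Set<UUID> lastReferencedIds; // input of the previous cycle
    private boolean lastCycleWasNoOp;    // true if that cycle rewrote no file

    /** Runs one cleanup cycle and returns how many files were checked. */
    int cleanup(List<TarFile> files, Set<UUID> referencedIds) {
        // Same immutable files and same input: the outcome cannot differ,
        // so the IO of re-reading every entry can be avoided.
        if (lastCycleWasNoOp
                && files.equals(lastFiles)
                && referencedIds.equals(lastReferencedIds)) {
            return 0;
        }
        boolean rewritten = false;
        int checked = 0;
        for (TarFile file : files) {
            checked++;
            rewritten |= file.cleanup(referencedIds);
        }
        // Remember defensive copies of this cycle's input for the next call.
        lastFiles = new ArrayList<>(files);
        lastReferencedIds = new HashSet<>(referencedIds);
        lastCycleWasNoOp = !rewritten;
        return checked;
    }
}
```

The sketch skips all files or none. The issue goes further and suggests skipping only a subset (a file and everything older than it), which would need per-file bookkeeping of the input each file last saw.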