[ 
https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636722#comment-14636722
 ] 

Vikas Saurabh edited comment on OAK-3070 at 7/22/15 11:05 AM:
--------------------------------------------------------------

BTW, even without this optimization, one of our test setups reports the following stats:
{noformat}
# grep " Version garbage collected in" error.log*
error.log:22.07.2015 02:51:10.195 *INFO* [pool-5-thread-16] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 51.16 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=621383, splitDocGCCount=2448, 
intermediateSplitDocGCCount=265, timeToCollectDeletedDocs=3.496 min, 
timeTakenToDeleteDocs=47.54 min}
error.log.2015-07-15:15.07.2015 02:34:46.895 *INFO* [pool-5-thread-9] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 34.77 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=557486, splitDocGCCount=2364, 
intermediateSplitDocGCCount=257, timeToCollectDeletedDocs=1.735 min, 
timeTakenToDeleteDocs=32.74 min}
error.log.2015-07-16:16.07.2015 02:31:47.559 *INFO* [pool-5-thread-11] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 31.79 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=559512, splitDocGCCount=2302, 
intermediateSplitDocGCCount=249, timeToCollectDeletedDocs=1.124 min, 
timeTakenToDeleteDocs=30.27 min}
error.log.2015-07-17:17.07.2015 02:35:16.096 *INFO* [pool-5-thread-10] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 35.26 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=560226, splitDocGCCount=2351, 
intermediateSplitDocGCCount=257, timeToCollectDeletedDocs=1.537 min, 
timeTakenToDeleteDocs=33.54 min}
error.log.2015-07-18:18.07.2015 02:52:02.542 *INFO* [pool-5-thread-11] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 42.19 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=556254, splitDocGCCount=2304, 
intermediateSplitDocGCCount=245, timeToCollectDeletedDocs=2.341 min, 
timeTakenToDeleteDocs=39.02 min}
error.log.2015-07-19:19.07.2015 02:41:59.809 *INFO* [pool-5-thread-14] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 41.98 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=556036, splitDocGCCount=2285, 
intermediateSplitDocGCCount=252, timeToCollectDeletedDocs=2.637 min, 
timeTakenToDeleteDocs=39.05 min}
error.log.2015-07-20:20.07.2015 02:46:04.002 *INFO* [pool-5-thread-5] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 46.06 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=555721, splitDocGCCount=2320, 
intermediateSplitDocGCCount=249, timeToCollectDeletedDocs=2.836 min, 
timeTakenToDeleteDocs=42.57 min}
error.log.2015-07-21:21.07.2015 02:45:08.122 *INFO* [pool-5-thread-15] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 45.12 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=552140, splitDocGCCount=2267, 
intermediateSplitDocGCCount=245, timeToCollectDeletedDocs=3.238 min, 
timeTakenToDeleteDocs=41.72 min}
{noformat}
The number of docs collected every day is around half a million (there are no 
stats to show how many candidates were fetched). Fetching candidate docs took 
from ~1 min to ~3.5 min.
This setup is a 2-node cluster with a Mongo backend using Oak 1.3.2. It doesn't 
have {{oak.mongo.disableVersionGCIndexHint}} set, so the index hint to use the 
{{_deletedOnce}} index is being sent.
(cc [~chetanm], [~mreutegg])
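
For anyone who wants to pull the same numbers out of their own logs, here is a minimal sketch in Python. It assumes the log entries look exactly like the grep output above; the helper names ({{parse_gc_stats}}, {{summarize}}) are made up for illustration and are not part of Oak. The sample text is two of the entries quoted above, wrapped the same way.

```python
import re

# Matches one "Version garbage collected in ..." entry. \s+ and DOTALL let the
# pattern tolerate the line wrapping seen in the mail. Helper names below are
# hypothetical, not part of Oak.
STATS_RE = re.compile(
    r"Version\s+garbage\s+collected\s+in\s+(?P<total>[\d.]+)\s+min\."
    r".*?deletedDocGCCount=(?P<deleted>\d+)"
    r".*?timeToCollectDeletedDocs=(?P<collect>[\d.]+)\s+min"
    r".*?timeTakenToDeleteDocs=(?P<delete>[\d.]+)\s+min",
    re.DOTALL,
)


def parse_gc_stats(text):
    """Return one dict per GC run found in the given log text."""
    return [
        {
            "total_min": float(m.group("total")),
            "deleted_docs": int(m.group("deleted")),
            "collect_min": float(m.group("collect")),
            "delete_min": float(m.group("delete")),
        }
        for m in STATS_RE.finditer(text)
    ]


def summarize(runs):
    """Range of candidate-fetch time and mean number of deleted docs."""
    collect = [r["collect_min"] for r in runs]
    deleted = [r["deleted_docs"] for r in runs]
    return {
        "min_collect_min": min(collect),
        "max_collect_min": max(collect),
        "mean_deleted_docs": sum(deleted) / len(deleted),
    }


# Two entries copied from the log output above.
SAMPLE = """error.log:22.07.2015 02:51:10.195 *INFO* [pool-5-thread-16]
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version
garbage collected in 51.16 min. VersionGCStats{ignoredGCDueToCheckPoint=false,
deletedDocGCCount=621383, splitDocGCCount=2448,
intermediateSplitDocGCCount=265, timeToCollectDeletedDocs=3.496 min,
timeTakenToDeleteDocs=47.54 min}
error.log.2015-07-16:16.07.2015 02:31:47.559 *INFO* [pool-5-thread-11]
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version
garbage collected in 31.79 min. VersionGCStats{ignoredGCDueToCheckPoint=false,
deletedDocGCCount=559512, splitDocGCCount=2302,
intermediateSplitDocGCCount=249, timeToCollectDeletedDocs=1.124 min,
timeTakenToDeleteDocs=30.27 min}
"""

if __name__ == "__main__":
    print(summarize(parse_gc_stats(SAMPLE)))
```

Feeding it the full grep output instead of {{SAMPLE}} gives the per-run figures summarized above.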


was (Author: catholicon):
BTW, even without this optimization, one of our test setups reports the following stats:
{noformat}
# grep " Version garbage collected in" error.log*
error.log:22.07.2015 02:51:10.195 *INFO* [pool-5-thread-16] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 51.16 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=621383, splitDocGCCount=2448, 
intermediateSplitDocGCCount=265, timeToCollectDeletedDocs=3.496 min, 
timeTakenToDeleteDocs=47.54 min}
error.log.2015-07-15:15.07.2015 02:34:46.895 *INFO* [pool-5-thread-9] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 34.77 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=557486, splitDocGCCount=2364, 
intermediateSplitDocGCCount=257, timeToCollectDeletedDocs=1.735 min, 
timeTakenToDeleteDocs=32.74 min}
error.log.2015-07-16:16.07.2015 02:31:47.559 *INFO* [pool-5-thread-11] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 31.79 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=559512, splitDocGCCount=2302, 
intermediateSplitDocGCCount=249, timeToCollectDeletedDocs=1.124 min, 
timeTakenToDeleteDocs=30.27 min}
error.log.2015-07-17:17.07.2015 02:35:16.096 *INFO* [pool-5-thread-10] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 35.26 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=560226, splitDocGCCount=2351, 
intermediateSplitDocGCCount=257, timeToCollectDeletedDocs=1.537 min, 
timeTakenToDeleteDocs=33.54 min}
error.log.2015-07-18:18.07.2015 02:52:02.542 *INFO* [pool-5-thread-11] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 42.19 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=556254, splitDocGCCount=2304, 
intermediateSplitDocGCCount=245, timeToCollectDeletedDocs=2.341 min, 
timeTakenToDeleteDocs=39.02 min}
error.log.2015-07-19:19.07.2015 02:41:59.809 *INFO* [pool-5-thread-14] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 41.98 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=556036, splitDocGCCount=2285, 
intermediateSplitDocGCCount=252, timeToCollectDeletedDocs=2.637 min, 
timeTakenToDeleteDocs=39.05 min}
error.log.2015-07-20:20.07.2015 02:46:04.002 *INFO* [pool-5-thread-5] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 46.06 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=555721, splitDocGCCount=2320, 
intermediateSplitDocGCCount=249, timeToCollectDeletedDocs=2.836 min, 
timeTakenToDeleteDocs=42.57 min}
error.log.2015-07-21:21.07.2015 02:45:08.122 *INFO* [pool-5-thread-15] 
org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector Version 
garbage collected in 45.12 min. VersionGCStats{ignoredGCDueToCheckPoint=false, 
deletedDocGCCount=552140, splitDocGCCount=2267, 
intermediateSplitDocGCCount=245, timeToCollectDeletedDocs=3.238 min, 
timeTakenToDeleteDocs=41.72 min}
{noformat}
The number of docs collected every day is around half a million (there are no 
stats to show how many candidates were fetched). Fetching candidate docs took 
from ~1 min to ~3.5 min.
This setup is a 2-node cluster with a Mongo backend. It doesn't have 
{{oak.mongo.disableVersionGCIndexHint}} set, so the index hint to use the 
{{_deletedOnce}} index is being sent.
(cc [~chetanm], [~mreutegg])

> Use a lower bound in VersionGC query to avoid checking unmodified once 
> deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>             Fix For: 1.3.5
>
>         Attachments: OAK-3070.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run. 
> {quote}
> The lowest last validated _modified could possibly be saved in the settings 
> collection and reused for the next run.
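
The idea in the description can be sketched roughly as follows. This is only an in-memory illustration in Python, not the attached patch and not Oak code: documents are plain dicts, the settings collection is a dict, and the key name {{versionGcLastModified}} as well as the helpers {{gc_candidates}} and {{run_gc}} are hypothetical. The upper bound on _modified and the inclusive lower bound are assumptions made for the sketch.

```python
# Hypothetical settings key; the real name (if any) is not given in the issue.
LAST_GC_KEY = "versionGcLastModified"


def gc_candidates(docs, lower, upper):
    """Docs flagged _deletedOnce whose _modified lies in [lower, upper)."""
    return [
        d for d in docs
        if d.get("_deletedOnce") and lower <= d["_modified"] < upper
    ]


def run_gc(docs, settings, upper):
    """Select candidates for one GC run and remember the bound for the next.

    The lower bound is the upper bound of the last successful run, so docs
    that were not modified since then are not looked at again.
    """
    lower = settings.get(LAST_GC_KEY, 0)  # first run: no lower bound yet
    candidates = gc_candidates(docs, lower, upper)
    # A real GC would now check each candidate and delete the ones that are
    # really gone; that part is omitted here.
    settings[LAST_GC_KEY] = upper  # record only after the run succeeded
    return candidates


if __name__ == "__main__":
    docs = [
        {"_id": "1:/a", "_deletedOnce": True, "_modified": 50},
        {"_id": "1:/b", "_deletedOnce": True, "_modified": 150},
        {"_id": "1:/c", "_modified": 160},
    ]
    settings = {}
    print([d["_id"] for d in run_gc(docs, settings, upper=100)])
    print([d["_id"] for d in run_gc(docs, settings, upper=200)])
```

In the second run the document with _modified=50 is skipped because it has not changed since the first run's bound, which is the saving the lower bound is meant to provide.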



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
