Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe merged PR #1543: URL: https://github.com/apache/jackrabbit-oak/pull/1543 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2285775281

> +1, I don't have full overview of how many metrics this will introduce (i.e. whether that number might be excessive), but I think that's something that we'll see and can handle downstream, in case.

I have added a proper description to the PR. Thanks for double checking, will merge this at the end of the day.
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1713768321

## oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java: ##
@@ -1316,6 +1316,7 @@ private void collectDeletedProperties(final NodeDocument doc, final GCPhases pha
 .sum();
 deletedPropsCountMap.put(doc.getId(), deletedPropsCount);
+ fullGCStats.collectedPropertiesDeleted(GCPhase.FULL_GC_COLLECT_PROPS, deletedPropsCount);

Review Comment:
I have changed the names of the statistics to clarify that they count the candidates to delete. Those numbers can then mismatch the actually deleted statistics, since candidates could be skipped in later stages.
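To illustrate the distinction made in the comment above, here is a minimal sketch of keeping candidate counts separate from deleted counts per GC phase. Every class, enum, and method name in it is hypothetical and chosen for illustration only; it is not the Oak `FullGCStatsCollector` API, and the two phases are placeholders.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch, not Oak code: tracks candidates and deletions separately.
class GcStatsSketch {

    // Placeholder phases; the real GCPhase enum in Oak may look different.
    enum Phase { COLLECT_PROPS, COLLECT_DOCS }

    private final Map<Phase, Long> candidates = new EnumMap<>(Phase.class);
    private final Map<Phase, Long> deleted = new EnumMap<>(Phase.class);

    // Called when garbage is detected; it may still be skipped later.
    void candidateProperties(Phase phase, long count) {
        candidates.merge(phase, count, Long::sum);
    }

    // Called only once the garbage has actually been removed.
    void deletedProperties(Phase phase, long count) {
        deleted.merge(phase, count, Long::sum);
    }

    long candidates(Phase phase) {
        return candidates.getOrDefault(phase, 0L);
    }

    long deleted(Phase phase) {
        return deleted.getOrDefault(phase, 0L);
    }

    // Candidates that were detected but not deleted in a later stage.
    long skipped(Phase phase) {
        return candidates(phase) - deleted(phase);
    }

    public static void main(String[] args) {
        GcStatsSketch stats = new GcStatsSketch();
        stats.candidateProperties(Phase.COLLECT_PROPS, 5);
        stats.candidateProperties(Phase.COLLECT_PROPS, 3);
        stats.deletedProperties(Phase.COLLECT_PROPS, 6);
        System.out.println("candidates=" + stats.candidates(Phase.COLLECT_PROPS)
                + " deleted=" + stats.deleted(Phase.COLLECT_PROPS)
                + " skipped=" + stats.skipped(Phase.COLLECT_PROPS));
    }
}
```

With two separate counters, a gap between candidates and deletions is visible directly instead of being hidden inside a single number.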
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
stefan-egli commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1654745087

## oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGCStatsCollector.java: ##
@@ -31,6 +32,34 @@ public interface FullGCStatsCollector {
 */
 void documentRead();
+/**
+ * Total No. of properties detected as garbage during a given GC phase
+ * @param mode GC phase
+ * @param numProps no. of garbage properties found in current cycle
+ */
+void candidateProperties(GCPhase mode, long numProps);
+
+/**
+ * Total No. of documents detected as garbage during a given GC phase
+ * @param mode GC phase
+ * @param numCommits no. of garbage documents found in current cycle
+ */
+void candidateDocuments(GCPhase mode, long numCommits);

Review Comment:
What is the use case of this counter? I see it is currently only used in `collectUnmergedBranchCommits` hence seems a bit asymmetric. There already is a counter for how many documents are read (`documentRead`), is there an advantage of having `candidateDocuments` too?
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186350396

I will squash this PR into a single commit once approved.
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650867475

## oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java: ##
@@ -240,7 +240,21 @@ static FullGCMode getFullGcMode() {
 AUDIT_LOG.info(" VersionGarbageCollector created with fullGcMode = {}", fullGcMode);
 }
-public void setStatisticsProvider(StatisticsProvider provider) {
+/**
+ * Please note that at the moment the includes do not
+ * take long paths into account. That is, if a long path was
+ * supposed to be included via an include, it is not.
+ * Reason for this is that long paths would require
+ * the mongo query to include a '_path' condition - which disallows
+ * mongo from using the '_modified_id' index. IOW long paths
+ * would result in full scans - which results in bad performance.
+ */
+void setFullGCPaths(@NotNull Set includes, @NotNull Set excludes) {
+this.fullGCIncludePaths = requireNonNull(includes);
+this.fullGCExcludePaths = requireNonNull(excludes);
+}
+
+void setStatisticsProvider(StatisticsProvider provider) {

Review Comment:
Oh, the `public` modifier was lost here.
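For context on why the lost modifier breaks the build: a Java method declared without an access modifier is package-private, so it cannot be called from another package such as the one containing `RevisionsCommand`. The sketch below shows one way to catch this kind of accidental narrowing with reflection, for example in a unit test. The `Collector` class and the `isPublicMethod` helper are hypothetical stand-ins, not Oak code.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch: detect a method that is not declared public.
class VisibilityCheckSketch {

    // Stand-in for a class whose setter lost its 'public' modifier.
    static class Collector {
        void setStatisticsProvider(Object provider) { }   // package-private

        public String getReport() {                       // still public
            return "";
        }
    }

    // Returns true only if the method is declared on 'type' and is public.
    static boolean isPublicMethod(Class<?> type, String name, Class<?>... params) {
        try {
            Method method = type.getDeclaredMethod(name, params);
            return Modifier.isPublic(method.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("setStatisticsProvider public? "
                + isPublicMethod(Collector.class, "setStatisticsProvider", Object.class));
        System.out.println("getReport public? "
                + isPublicMethod(Collector.class, "getReport"));
    }
}
```

A check like this only guards methods it is told about; compiling every module that calls the method remains the more direct safeguard.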
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186329449

> actually, there are compilation errors:
>
> ```
> [INFO] -
> Error: COMPILATION ERROR :
> [INFO] -
> Error: /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[367,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
> Error: /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[538,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
> [INFO] 2 errors
> ```

I will double-check this, sounds weird, I built it and ran all the integration tests locally...
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
Joscorbe commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650854809

## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ##
@@ -535,8 +544,8 @@ private void collectDocument(RevisionsOptions options, Closer closer, String pat
 }
 gc.collectGarbageOnDocument(documentNodeStore, workingDocument, options.isVerbose());
-//TODO: Probably we should output some details of fullGCStats. Could be done after OAK-10378
-//gc.getFullGCStats();
+System.out.println("Full GC Stats:");
+System.out.println(gc.getFullGCStatsReport());

Review Comment:
Not on purpose, I will add them here too.
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
stefan-egli commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186287106

actually, there are compilation errors:

```
[INFO] -
Error: COMPILATION ERROR :
[INFO] -
Error: /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[367,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
Error: /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[538,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
[INFO] 2 errors
```
Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]
stefan-egli commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650820289

## oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java: ##
@@ -535,8 +544,8 @@ private void collectDocument(RevisionsOptions options, Closer closer, String pat
 }
 gc.collectGarbageOnDocument(documentNodeStore, workingDocument, options.isVerbose());
-//TODO: Probably we should output some details of fullGCStats. Could be done after OAK-10378
-//gc.getFullGCStats();
+System.out.println("Full GC Stats:");
+System.out.println(gc.getFullGCStatsReport());

Review Comment:
just curious: above the report is indented with 4 spaces but not here - is this on purpose?
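One way to keep the indentation consistent across both call sites would be a small shared helper that prefixes every line of a multi-line report. The sketch below assumes the report is a plain `String` with newline-separated lines; `ReportIndentSketch` and `indentReport` are hypothetical names, not part of Oak or of this PR.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch: indent each line of a report by four spaces.
class ReportIndentSketch {

    private static final String INDENT = "    ";

    // Splits on any line break and rejoins with "\n", prefixing every line.
    static String indentReport(String report) {
        return Arrays.stream(report.split("\\R"))
                .map(line -> INDENT + line)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Placeholder text standing in for a real stats report.
        String report = "first line\nsecond line";
        System.out.println("Full GC Stats:");
        System.out.println(indentReport(report));
    }
}
```

Routing both print sites through one helper means the two outputs cannot drift apart again.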