Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-07-02 Thread via GitHub


stefan-egli commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1654745087


##
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/FullGCStatsCollector.java:
##
@@ -31,6 +32,34 @@ public interface FullGCStatsCollector {
  */
 void documentRead();
 
+/**
+ * Total No. of properties detected as garbage during a given GC phase
+ * @param mode GC phase
+ * @param numProps no. of garbage properties found in current cycle
+ */
+void candidateProperties(GCPhase mode, long numProps);
+
+/**
+ * Total No. of documents detected as garbage during a given GC phase
+ * @param mode GC phase
+ * @param numCommits no. of garbage documents found in current cycle
+ */
+void candidateDocuments(GCPhase mode, long numCommits);

Review Comment:
   What is the use case of this counter? I see it is currently only used in `collectUnmergedBranchCommits`, hence it seems a bit asymmetric. There already is a counter for how many documents are read (`documentRead`); is there an advantage to having `candidateDocuments` too?
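One way to frame the symmetry question: if every phase reports through the same per-phase counters, a phase that never calls `candidateDocuments` shows up with 0 instead of being absent. Below is a minimal sketch under that assumption. The `GCPhase` values, the `PhaseCountersSketch` class and its accessor methods are hypothetical and not taken from Oak. Only the `candidateProperties`/`candidateDocuments` shapes mirror the diff, with the second parameter named `numDocs` here.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Oak's FullGCStatsCollector implementation.
public class PhaseCountersSketch {

    // Hypothetical phase names; the real GCPhase enum in Oak may differ.
    enum GCPhase { UNMERGED_BRANCH_COMMITS, DELETED_PROPERTIES, ORPHANED_NODES }

    // Both maps are fully populated in the constructor and never structurally
    // modified afterwards, so concurrent get() calls are safe; the counters
    // themselves are AtomicLong.
    private final Map<GCPhase, AtomicLong> candidateProps = new EnumMap<>(GCPhase.class);
    private final Map<GCPhase, AtomicLong> candidateDocs = new EnumMap<>(GCPhase.class);

    public PhaseCountersSketch() {
        for (GCPhase phase : GCPhase.values()) {
            candidateProps.put(phase, new AtomicLong());
            candidateDocs.put(phase, new AtomicLong());
        }
    }

    public void candidateProperties(GCPhase mode, long numProps) {
        candidateProps.get(mode).addAndGet(numProps);
    }

    public void candidateDocuments(GCPhase mode, long numDocs) {
        candidateDocs.get(mode).addAndGet(numDocs);
    }

    public long properties(GCPhase mode) {
        return candidateProps.get(mode).get();
    }

    public long documents(GCPhase mode) {
        return candidateDocs.get(mode).get();
    }

    public static void main(String[] args) {
        PhaseCountersSketch stats = new PhaseCountersSketch();
        stats.candidateDocuments(GCPhase.UNMERGED_BRANCH_COMMITS, 3);
        stats.candidateDocuments(GCPhase.UNMERGED_BRANCH_COMMITS, 2);
        stats.candidateProperties(GCPhase.DELETED_PROPERTIES, 7);
        // A phase that never reported anything is still listed, with 0.
        for (GCPhase phase : GCPhase.values()) {
            System.out.println(phase + ": docs=" + stats.documents(phase)
                    + ", props=" + stats.properties(phase));
        }
    }
}
```

Whether such a counter adds information beyond `documentRead` is exactly the open question in the review; the sketch only shows what a symmetric per-phase layout would look like.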



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


Joscorbe commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186350396

   I will squash this PR into a single commit once approved.





Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


Joscorbe commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650867475


##
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/VersionGarbageCollector.java:
##
@@ -240,7 +240,21 @@ static FullGCMode getFullGcMode() {
 AUDIT_LOG.info(" VersionGarbageCollector created with fullGcMode = {}", fullGcMode);
 }
 
-public void setStatisticsProvider(StatisticsProvider provider) {
+/**
+ * Please note that at the moment the includes do not
+ * take long paths into account. That is, if a long path was
+ * supposed to be included via an include, it is not.
+ * Reason for this is that long paths would require
+ * the mongo query to include a '_path' condition - which disallows
+ * mongo from using the '_modified_id' index. IOW long paths
+ * would result in full scans - which results in bad performance.
+ */
+void setFullGCPaths(@NotNull Set includes, @NotNull Set excludes) {
+this.fullGCIncludePaths = requireNonNull(includes);
+this.fullGCExcludePaths = requireNonNull(excludes);
+}
+
+void setStatisticsProvider(StatisticsProvider provider) {

Review Comment:
   Oh, the `public` modifier was lost here.
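The javadoc on `setFullGCPaths` in the hunk above says that include paths do not pick up long paths, because matching them would require a `_path` condition in the Mongo query. The sketch below illustrates only that scoping rule in plain Java, outside of any query: excludes win, a path must lie under some include, and paths classified as "long" are never matched via an include. The class name, the `inScope` method and the 165-character threshold are hypothetical and chosen for illustration; Oak's own long-path criterion may differ.

```java
import java.util.Set;

// Hypothetical sketch of include/exclude scoping; not Oak's implementation.
public class FullGcPathFilter {

    // Illustrative threshold only; Oak's own long-path criterion may differ.
    static final int LONG_PATH_THRESHOLD = 165;

    private final Set<String> includes;
    private final Set<String> excludes;

    // Assumes includes is non-empty; pass Set.of("/") to include everything.
    public FullGcPathFilter(Set<String> includes, Set<String> excludes) {
        this.includes = Set.copyOf(includes);
        this.excludes = Set.copyOf(excludes);
    }

    // True if path is root itself or lies somewhere below root.
    static boolean isAncestorOrSelf(String root, String path) {
        if (root.equals("/")) {
            return path.startsWith("/");
        }
        return path.equals(root) || path.startsWith(root + "/");
    }

    public boolean inScope(String path) {
        for (String exclude : excludes) {
            if (isAncestorOrSelf(exclude, path)) {
                return false;
            }
        }
        // Mirrors the javadoc caveat: a long path is not reached via an include.
        if (path.length() > LONG_PATH_THRESHOLD) {
            return false;
        }
        for (String include : includes) {
            if (isAncestorOrSelf(include, path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FullGcPathFilter filter = new FullGcPathFilter(Set.of("/content"), Set.of("/content/tmp"));
        System.out.println(filter.inScope("/content/a"));                   // true
        System.out.println(filter.inScope("/content/tmp/x"));               // false (excluded)
        System.out.println(filter.inScope("/libs/x"));                      // false (not included)
        System.out.println(filter.inScope("/content/" + "x".repeat(200))); // false (long path)
    }
}
```

In the real collector the trade-off lives in the query, not in Java: keeping the condition on the indexed id lets Mongo use the `_modified_id` index, at the cost of missing long paths.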






Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


Joscorbe commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186329449

   > actually, there are compilation errors:
   > 
   > ```
   > [INFO] -
   > Error:  COMPILATION ERROR : 
   > [INFO] -
   > Error:  /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[367,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
   > Error:  /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[538,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
   > [INFO] 2 errors 
   > ```
   
   I will double-check this; it sounds weird, since I built it and ran all the integration tests locally...





Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


Joscorbe commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650854809


##
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##
@@ -535,8 +544,8 @@ private void collectDocument(RevisionsOptions options, Closer closer, String pat
 }
 gc.collectGarbageOnDocument(documentNodeStore, workingDocument, options.isVerbose());
 
-//TODO: Probably we should output some details of fullGCStats. Could be done after OAK-10378
-//gc.getFullGCStats();
+System.out.println("Full GC Stats:");
+System.out.println(gc.getFullGCStatsReport());

Review Comment:
   Not on purpose; I will add them here too.






Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


stefan-egli commented on PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#issuecomment-2186287106

   actually, there are compilation errors:
   ```
   [INFO] -
   Error:  COMPILATION ERROR : 
   [INFO] -
   Error:  /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[367,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
   Error:  /home/runner/work/jackrabbit-oak/jackrabbit-oak/oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:[538,11] setStatisticsProvider(org.apache.jackrabbit.oak.stats.StatisticsProvider) is not public in org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector; cannot be accessed from outside package
   [INFO] 2 errors 
   ```





Re: [PR] OAK-10748: Improve statistics to collect which type of garbage is sent/deleted [jackrabbit-oak]

2024-06-24 Thread via GitHub


stefan-egli commented on code in PR #1543:
URL: https://github.com/apache/jackrabbit-oak/pull/1543#discussion_r1650820289


##
oak-run/src/main/java/org/apache/jackrabbit/oak/run/RevisionsCommand.java:
##
@@ -535,8 +544,8 @@ private void collectDocument(RevisionsOptions options, Closer closer, String pat
 }
 gc.collectGarbageOnDocument(documentNodeStore, workingDocument, options.isVerbose());
 
-//TODO: Probably we should output some details of fullGCStats. Could be done after OAK-10378
-//gc.getFullGCStats();
+System.out.println("Full GC Stats:");
+System.out.println(gc.getFullGCStatsReport());

Review Comment:
   Just curious: above, the report is indented with 4 spaces, but not here. Is this on purpose?
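If the indentation is added at this call site as well, a single helper keeps both outputs consistent. Below is a minimal sketch, assuming the report is a multi-line `String`. The class, the `formatReport` method and the sample report text are hypothetical and not part of oak-run. It uses `String.lines()`, so it needs Java 11 or later.

```java
import java.util.stream.Collectors;

// Hypothetical helper; not part of oak-run's RevisionsCommand.
public class ReportIndentSketch {

    // Returns the title line followed by every report line prefixed with four spaces.
    static String formatReport(String title, String report) {
        String indented = report.lines()
                .map(line -> "    " + line)
                .collect(Collectors.joining("\n"));
        return title + "\n" + indented;
    }

    public static void main(String[] args) {
        // Made-up report text standing in for gc.getFullGCStatsReport().
        String report = "deletedDocs: 3\ndeletedProps: 7";
        System.out.println(formatReport("Full GC Stats:", report));
    }
}
```

Routing both call sites through one helper means a later change to the layout only has to be made once.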


