stefan-egli commented on code in PR #1314:
URL: https://github.com/apache/jackrabbit-oak/pull/1314#discussion_r1497822015


##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +277,61 @@ private static String diagsForEntry(Map.Entry<String, 
PropertyStats> member) {
         }
     }
 
+    /**
+     * Produce an {@link UpdateOp} suitable for shrinking branch revision 
entries for given property in {@link Document}, {@code null} otherwise.
+     * 
+     * @param doc document to inspect for repeated branch commits
+     * @param propertName property to check for
+     * @param revisionChecker filter for revisions (for instance, to check for 
cluster id)
+     * @return {@link UpdateOp} suitable for shrinking document, {@code null} 
otherwise
+     */
+    public static @Nullable UpdateOp getShrinkOp(Document doc, String 
propertyName, Predicate<Revision> revisionChecker) {

Review Comment:
   If `doc` could be a `NodeDocument` instead of the generic `Document` some of 
those `instanceof` could be avoided...



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +278,73 @@ private static String diagsForEntry(Map.Entry<String, 
PropertyStats> member) {
         }
     }
 
+    /**
+     * @return cluster if from first revision found in op, {@code -1} otherwise
+     */
+    public static int extractClusterId(UpdateOp op) {
+        for (Key key : op.getChanges().keySet()) {
+            if (key.getRevision() != null) {
+                return key.getRevision().getClusterId();
+            }
+        }
+        return -1;
+    }
+
+    /**
+     * Produce an {@link UpdateOp} suitable for shrinking branch revision 
entries for given property in {@link Document}, {@code null} otherwise.
+     * 
+     * @param doc document to inspect for repeated branch commits
+     * @param propertName property to check for
+     * @param revisionChecker filter for revisions (for instance, to check for 
cluster id)
+     * @return {@link UpdateOp} suitable for shrinking document, {@code null} 
otherwise
+     */
+    public static @Nullable UpdateOp getShrinkOp(Document doc, String 
propertyName, Predicate<Revision> revisionChecker) {
+        Object t_bc = doc.get("_bc");
+        Object t_property = doc.get(propertyName);
+        if (t_bc instanceof Map && t_property instanceof Map) {
+            @SuppressWarnings("unchecked")
+            Map<Revision, String> _bc = (Map<Revision, String>)t_bc;
+            @SuppressWarnings("unchecked")
+            Map<Revision, String> pMap = (Map<Revision, String>)t_property;
+            List<Revision> revs = new ArrayList<>();
+            for (Map.Entry<Revision, String> en : pMap.entrySet()) {
+                Revision r = en.getKey();
+                if (revisionChecker.apply(r)) {
+                    String bcv = _bc.get(r);
+                    if ("true".equals(bcv)) {
+                        revs.add(r);
+                    }
+                }
+            }
+            // sort by age
+            Collections.sort(revs, new Comparator<Revision>() {
+                @Override
+                public int compare(Revision r1, Revision r2) {
+                    if (r1.getClusterId() != r2.getClusterId()) {
+                        return r1.getClusterId() - r2.getClusterId();
+                    } else if (r1.getTimestamp() != r2.getTimestamp()) {
+                        return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
+                    } else {
+                        return r1.getCounter() - r2.getCounter();
+                    }
+                }});
+
+            UpdateOp clean = new UpdateOp(doc.getId(), false);
+            Revision last = null;
+            for (Revision r : revs) {
+                if (last != null) {
+                    if (last.getClusterId() == r.getClusterId()) {
+                        clean.removeMapEntry(propertyName, last);
+                    }
+                }
+                last = r;
+            }

Review Comment:
   * this seems to be a bit a more broader GC than expected. It deletes 
basically all older revisions of the same clusterId. That seems fine - but at 
that point I'm wondering why restrict it to only branch commits - i.e why 
specifically only branch commits. Which would lead to the assumption that the 
idea was to only remove "overwritten branch commits"?
   * is the clusterId check here still necessary given the predicate check 
introduced?
   * this doesn't take the usual 24h GC max time nor active checkpoints into 
account.



##########
oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/util/Utils.java:
##########
@@ -274,6 +277,61 @@ private static String diagsForEntry(Map.Entry<String, 
PropertyStats> member) {
         }
     }
 
+    /**
+     * Produce an {@link UpdateOp} suitable for shrinking branch revision 
entries for given property in {@link Document}, {@code null} otherwise.
+     * 
+     * @param doc document to inspect for repeated branch commits
+     * @param propertName property to check for
+     * @param revisionChecker filter for revisions (for instance, to check for 
cluster id)
+     * @return {@link UpdateOp} suitable for shrinking document, {@code null} 
otherwise
+     */
+    public static @Nullable UpdateOp getShrinkOp(Document doc, String 
propertyName, Predicate<Revision> revisionChecker) {
+        Object t_bc = doc.get("_bc");
+        Object t_property = doc.get(propertyName);
+        if (t_bc instanceof Map && t_property instanceof Map) {
+            @SuppressWarnings("unchecked")
+            Map<Revision, String> _bc = (Map<Revision, String>)t_bc;
+            @SuppressWarnings("unchecked")
+            Map<Revision, String> pMap = (Map<Revision, String>)t_property;
+            List<Revision> revs = new ArrayList<>();
+            for (Map.Entry<Revision, String> en : pMap.entrySet()) {
+                Revision r = en.getKey();
+                if (revisionChecker.apply(r)) {
+                    String bcv = _bc.get(r);
+                    if ("true".equals(bcv)) {
+                        revs.add(r);
+                    }
+                }
+            }
+            // sort by age
+            Collections.sort(revs, new Comparator<Revision>() {

Review Comment:
   wondering if there isn't such a `Comparator` in oak land already?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to