[jira] [Comment Edited] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821059#comment-17821059
 ] 

Julian Reschke edited comment on OAK-10660 at 2/27/24 9:05 AM:
---

TODO:

- retest with [~stefanegli]'s change
- integrate feature toggle
- add a DocumentStore test proving that we can remove map entries that do not 
exist (OAK-10673)


was (Author: reschke):
TODO:

- retest with [~stefanegli]'s change
- integrate feature toggle
- add a DocumentStore test proving that we can remove map entries that do not 
exist

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except fo ra small overhead in processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-26 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820636#comment-17820636
 ] 

Stefan Egli edited comment on OAK-10660 at 2/26/24 9:17 AM:


In DocumentNodeStore the revisions are actually already tracked in the 
{{org.apache.jackrabbit.oak.plugins.document.Branch}} - which is reachable via 
the {{org.apache.jackrabbit.oak.plugins.document.UnmergedBranches}} - try
{noformat}
// rev being the branch commit revision vector
dns.getBranches().getBranch(rev);
{noformat}

That contains the unmerged branches of the local instance - which I guess is 
what we want (IOW I don't think we need a new revisions set)


was (Author: egli):
In DocumentNodeStore the revisions are actually already tracked in the 
{{org.apache.jackrabbit.oak.plugins.document.Branch}} - which is reachable via 
the {{org.apache.jackrabbit.oak.plugins.document.UnmergedBranches}} - try
{noformat}
// rev being the branch commit revision vector
dns.getBranches().getBranch(rev);
{noformat}

That contains the unmerged branches of the local instance - which I guess is 
what we want.

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except fo ra small overhead in processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820045#comment-17820045
 ] 

Julian Reschke edited comment on OAK-10660 at 2/23/24 12:42 PM:


Here:

 
{noformat}
diff --git 
a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
 
b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- 
a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ 
b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;+    /** Revisions written by us */
+    private final Set revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, 
{@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
 {noformat}

would be a good place to track what revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the UpdateOps.


was (Author: reschke):
Here:

 
{noformat}
diff --git 
a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
 
b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- 
a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ 
b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;+    /** Revisions written by us */
+    private final Set revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, 
{@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
 {noformat}

would be a good place to track what revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the {UpdateOp}s.

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except fo ra small overhead in processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)