[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-03-06 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823943#comment-17823943
 ] 

Manfred Baedke commented on OAK-10660:
--

trunk (1.62.0): 
[d96c9d01|https://github.com/apache/jackrabbit-oak/commit/d96c9d0174120ef9998ff17e93e313c6430668f7]

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Manfred Baedke
>Priority: Major
>
> - While persisting the branch commits, we are persisting large :childOrder 
> properties repeatedly. In practice, only the last value is needed, so the 
> previous ones could be cleaned up.
>  - We currently do not keep information about when (revision) and where (_id) 
> we have set :childOrder.
>  - The "clean" approach would be to maintain a map of _id/revision that tells 
> us in which revision we last set :childOrder. That could be used to pair the 
> setting of the new value with a removal of the previous one.
>  - But we may be able to simplify that: just maintain a list of _all_ 
> revisions that changed :childOrder, and any time we need to set a new value 
> for :childOrder, nuke the entries for all of these revisions. This would be 
> harmless because an extra REMOVE_MAP_ENTRY operation is essentially free, 
> except for a small overhead in processing.
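
The simplified approach from the last bullet can be sketched as follows. This is only an illustration against an in-memory stand-in, not Oak code: the class `ChildOrderCleanupSketch`, its fields, and the string revisions are hypothetical, and the plain `Map.remove` call merely plays the role of a REMOVE_MAP_ENTRY operation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a document's revisioned ":childOrder" property. */
class ChildOrderCleanupSketch {

    /** revision -> value, mimicking a revisioned map property of one document. */
    final Map<String, String> childOrder = new LinkedHashMap<>();

    /** All revisions in which :childOrder has been set so far. */
    final List<String> revisionsThatSetChildOrder = new ArrayList<>();

    /** Number of removals issued, including those that find nothing to remove. */
    int removeOps = 0;

    void setChildOrder(String revision, String value) {
        // Issue a removal for every recorded revision. All but the most recent
        // one are no-ops, because earlier calls already removed those entries.
        for (String old : revisionsThatSetChildOrder) {
            childOrder.remove(old);
            removeOps++;
        }
        childOrder.put(revision, value);
        revisionsThatSetChildOrder.add(revision);
    }

    public static void main(String[] args) {
        ChildOrderCleanupSketch doc = new ChildOrderCleanupSketch();
        doc.setChildOrder("r1", "[a]");
        doc.setChildOrder("r2", "[a, b]");
        doc.setChildOrder("r3", "[a, b, c]");
        System.out.println(doc.childOrder); // only the entry for r3 is left
    }
}
```

After three writes only the latest entry remains, at the cost of three removals (0 + 1 + 2), two of which are useful and one redundant. The redundant share grows with the number of recorded revisions, which is the "small overhead in processing" the description accepts.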



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821312#comment-17821312
 ] 

Manfred Baedke commented on OAK-10660:
--

Changed the default behavior. Now running integration tests.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821291#comment-17821291
 ] 

Manfred Baedke commented on OAK-10660:
--

Yes, you are right, but I'd feel better if I understood what's going on there.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821290#comment-17821290
 ] 

Julian Reschke commented on OAK-10660:
--

Isn't this something we just need for the test case (as long as the default is 
'off')? Or am I missing something here?



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821272#comment-17821272
 ] 

Manfred Baedke commented on OAK-10660:
--

Ok, I'll change the default.
FYI: The current state of the OAK-10660 branch has mysterious test failures in 
BinaryAccessTest in oak-jcr. They disappear if we do not attach a Whiteboard to 
the underlying Jcr instance (the Whiteboard is introduced with the PR for 
feature toggle support in test cases). If anyone has an idea, please share.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821269#comment-17821269
 ] 

Stefan Egli commented on OAK-10660:
---

{quote}Checking: the default for the feature toggle will be "on", right?{quote}
+1 for "on" as default.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821218#comment-17821218
 ] 

Julian Reschke commented on OAK-10660:
--

Checking: the default for the feature toggle will be "on", right?



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821137#comment-17821137
 ] 

Manfred Baedke commented on OAK-10660:
--

??Please wait with the merge until I'm done with OAK-10673.??

Sure. I just merged PR#1326 (OAK-10660-alt) into OAK-10660. Now testing that.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821130#comment-17821130
 ] 

Julian Reschke commented on OAK-10660:
--

Please wait with the merge until I'm done with OAK-10673.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821110#comment-17821110
 ] 

Manfred Baedke commented on OAK-10660:
--

Looks like it's just a missing null check. I'll merge the PR as soon as all 
tests are finished.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821089#comment-17821089
 ] 

Julian Reschke commented on OAK-10660:
--

ah, I was confused. DocumentStore and DocumentNodeStore are in the same project 
:-)



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821087#comment-17821087
 ] 

Manfred Baedke commented on OAK-10660:
--

Possibly my FeatureToggle-related changes, we will see.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821086#comment-17821086
 ] 

Julian Reschke commented on OAK-10660:
--

? Without any changes over there?



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Manfred Baedke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821081#comment-17821081
 ] 

Manfred Baedke commented on OAK-10660:
--

[~reschke] yes. Actually oak-store-document has test failures, I'm 
investigating.




[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821079#comment-17821079
 ] 

Julian Reschke commented on OAK-10660:
--

[~baedke] - can you take over finalizing the PR? I'll take care of OAK-10673.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-27 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821059#comment-17821059
 ] 

Julian Reschke commented on OAK-10660:
--

TODO:

- retest with [~stefanegli]'s change
- integrate feature toggle
- add a DocumentStore test proving that we can remove map entries that do not 
exist

> DocumentNodeStore: avoid repeated commits of :childOrder in branch commits
> --
>
> Key: OAK-10660
> URL: https://issues.apache.org/jira/browse/OAK-10660
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: documentmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Major


[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-26 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820703#comment-17820703
 ] 

Julian Reschke commented on OAK-10660:
--

Actually we have test coverage: in our current test we have ca. 50 commits with 
ever-growing :childOrder properties. In each commit, we *try* to remove all of 
them, but in practice all but the last will have been removed already.

What we may not be testing is the case where the map does not exist at all (no 
:childOrder yet), but we can add a low-level test for that.
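
A minimal sketch of what such a low-level check would assert, written against an in-memory stand-in rather than a real DocumentStore. The class `RemoveMissingEntrySketch` and its methods are hypothetical and only model the expected semantics: removing an entry from a map that was never created must neither fail nor create anything.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of one document with revisioned map properties. */
class RemoveMissingEntrySketch {

    /** property name -> (revision -> value) */
    final Map<String, Map<String, String>> properties = new HashMap<>();

    void setMapEntry(String property, String revision, String value) {
        properties.computeIfAbsent(property, k -> new HashMap<>()).put(revision, value);
    }

    /** Returns true only if an entry was actually removed; never throws. */
    boolean removeMapEntry(String property, String revision) {
        Map<String, String> map = properties.get(property);
        if (map == null) {
            return false; // the map does not exist at all, e.g. no :childOrder yet
        }
        return map.remove(revision) != null;
    }

    public static void main(String[] args) {
        RemoveMissingEntrySketch doc = new RemoveMissingEntrySketch();
        // Case 1: the property map has never been written.
        System.out.println(doc.removeMapEntry(":childOrder", "r1"));
        // Case 2: the map exists, but not for the revision being removed.
        doc.setMapEntry(":childOrder", "r2", "[a]");
        System.out.println(doc.removeMapEntry(":childOrder", "r1"));
    }
}
```

A real test would issue the equivalent update against each DocumentStore implementation; the stand-in above says nothing about how those behave.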



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-26 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820668#comment-17820668
 ] 

Stefan Egli commented on OAK-10660:
---

IIUC the idea of this ticket is to only look at unmerged branch commits of the 
own branch. Based on that, I would indeed only consider the own branch's 
commits, not any others. So I wouldn't just remove all revisions.

> "just remove the previous one"

I think that would be optimal, but I fear that in {{Commit}} you don't know 
which of the previous revisions actually contains one. Consider an unmerged 
branch with 3 branch commits so far. Without actually inspecting the document, 
how could {{Commit}} know which of those 3 actually added a ":childOrder" 
revision? It could be the first, the second, or any combination of the 3. 
So, if it is not feasible to have the Document available in 
{{Commit.applyToDocumentStore}}, then the only option would be to add 
{{op.removeMapEntry(":childOrder", rev)}} for all 3 previous revs. 
This might have to be tested; I'm not sure whether the command would fail if 
you instruct it to remove a map entry that doesn't exist. (Also wondering: if 
we add removeMapEntry for all previous branch commits, maybe we still want to 
restrict that number in case it grows ridiculously large. E.g. if you have more 
than 1000 branch commits, wouldn't it be reasonably sufficient to only remove 
the last 1000, risking that you might still miss some?)
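
The cap suggested in the parenthesis could look roughly like this. It is a sketch only: `BoundedRemovalSketch`, `revisionsToRemove` and the `limit` parameter are hypothetical names, revisions are plain strings, and the list is assumed to be ordered oldest first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that bounds how many removals one commit would carry. */
class BoundedRemovalSketch {

    /**
     * Returns at most {@code limit} of the most recent previous branch
     * revisions, assuming the input list is ordered oldest first.
     */
    static List<String> revisionsToRemove(List<String> previousBranchRevisions, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        int from = Math.max(0, previousBranchRevisions.size() - limit);
        // Copy the sublist so the result does not depend on the caller's list.
        return new ArrayList<>(previousBranchRevisions.subList(from, previousBranchRevisions.size()));
    }

    public static void main(String[] args) {
        List<String> previous = new ArrayList<>();
        previous.add("r1");
        previous.add("r2");
        previous.add("r3");
        System.out.println(revisionsToRemove(previous, 2));
        System.out.println(revisionsToRemove(previous, 1000));
    }
}
```

With a limit of 2 the oldest revision, r1, is skipped, which is exactly the risk mentioned above: an entry written in a skipped revision would stay behind.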



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-26 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820637#comment-17820637
 ] 

Julian Reschke commented on OAK-10660:
--

Good point.

Any opinion on whether we need to avoid "REMOVE *all* previous versions of 
:childOrder", as opposed to "just remove the previous one"?



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-26 Thread Stefan Egli (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820636#comment-17820636
 ] 

Stefan Egli commented on OAK-10660:
---

In DocumentNodeStore, the revisions are already tracked in 
{{org.apache.jackrabbit.oak.plugins.document.Branch}}, which is reachable via 
{{org.apache.jackrabbit.oak.plugins.document.UnmergedBranches}}. Try:
{noformat}
// rev being the branch commit revision vector
dns.getBranches().getBranch(rev);
{noformat}

That contains the unmerged branches of the local instance, which I guess is 
what we want.



[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-24 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820306#comment-17820306
 ] 

Julian Reschke commented on OAK-10660:
--

TODOs:

- add feature toggle
- potentially refactor so that the Set is passed to the Commit constructor 
(fewer changes in method signatures)

Idea:

- if we change "Set" to "Map", we could keep the 
last relevant revision per document, and thus wouldn't need to erase 
non-existing revisions. That would increase memory footprint, but reduce the 
size of requests sent to the DocumentStore.
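
The trade-off between the two bookkeeping options can be sketched like this. It is a minimal, self-contained model under stated assumptions: the class and method names are hypothetical, document ids and revisions are plain strings, and the key and value types of the "Map" are a guess, since the comment above does not spell them out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in model; none of these names exist in Oak.
final class ChildOrderTrackingSketch {

    /** Set variant: remembers every revision written by the branch, regardless of document. */
    static final class SetTracker {
        private final Set<String> revisions = new LinkedHashSet<>();

        /** Returns the revisions to remove for this write: all earlier ones (docId is ignored). */
        List<String> recordWrite(String docId, String rev) {
            List<String> toRemove = new ArrayList<>(revisions);
            revisions.add(rev);
            return toRemove;
        }
    }

    /** Map variant: remembers only the last revision that set :childOrder, per document. */
    static final class MapTracker {
        private final Map<String, String> lastRevByDoc = new HashMap<>();

        /** Returns the single previous revision for this document, or nothing on first write. */
        List<String> recordWrite(String docId, String rev) {
            String previous = lastRevByDoc.put(docId, rev);
            return previous == null ? List.of() : List.of(previous);
        }
    }

    public static void main(String[] args) {
        String[][] writes = {{"1:/a", "r1"}, {"1:/b", "r2"}, {"1:/a", "r3"}};
        SetTracker set = new SetTracker();
        MapTracker map = new MapTracker();
        for (String[] w : writes) {
            System.out.println(w[1] + " on " + w[0]
                    + ": set variant removes " + set.recordWrite(w[0], w[1])
                    + ", map variant removes " + map.recordWrite(w[0], w[1]));
        }
    }
}
```

In this model the Set variant holds one entry per branch commit and asks for more removals with every write, while the Map variant holds one entry per touched document and never asks for more than one removal.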




[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820143#comment-17820143
 ] 

Julian Reschke commented on OAK-10660:
--

OK, this might be easier than expected. PR soonish.




[jira] [Commented] (OAK-10660) DocumentNodeStore: avoid repeated commits of :childOrder in branch commits

2024-02-23 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820045#comment-17820045
 ] 

Julian Reschke commented on OAK-10660:
--

Here:

 
{noformat}
diff --git a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
index 1b04f62fa5..5cdf3901da 100644
--- a/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
+++ b/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentNodeStoreBranch.java
@@ -78,6 +78,9 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
     /** The maximum number of updates to keep in memory */
     private final int updateLimit;
 
+    /** Revisions written by us */
+    private final Set<Revision> revisions = new HashSet<>();
+
     /**
      * State of the this branch. Either {@link Unmodified}, {@link InMemory}, {@link Persisted},
      * {@link ResetFailed} or {@link Merged}.
@@ -321,6 +324,7 @@ class DocumentNodeStoreBranch implements NodeStoreBranch {
             c.apply();
             rev = store.done(c, base.getRootRevision().isBranch(), info);
             success = true;
+            revisions.add(c.getRevision());
         } finally {
             if (!success) {
                 store.canceled(c);
{noformat}

would be a good place to track which revisions are relevant. Now we need to 
figure out how to pass this down to the place where we create the 
{{UpdateOp}}s.
