[ https://issues.apache.org/jira/browse/SLING-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253669#comment-17253669 ]
Miroslav Smiljanic commented on SLING-10011:
--------------------------------------------

Hi [~cziegeler]

In the Oak session implementation, [Session#getItem|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.34.0/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/session/SessionImpl.java#L363] internally calls getItemOrNull, so the two perform the same.

Let's suppose that we have already loaded the resource "/a/b/c/d/e/f/g/h". When we request the parent resource with [JcrResourceProvider.getParent|https://github.com/apache/sling-org-apache-sling-jcr-resource/blob/org.apache.sling.jcr.resource-3.0.22/src/main/java/org/apache/sling/jcr/resource/internal/helper/jcr/JcrResourceProvider.java#L361], session.getItemOrNull("/a/b/c/d/e/f/g") will eventually be used.

{noformat}
JcrResourceProvider.getParent(ctx, childResource)
--> JcrItemResourceFactory#getItemOrNull(parentPath)
--> session.getItemOrNull(parentPath)
{noformat}

When session.getItemOrNull("/a/b/c/d/e/f/g") is executed to obtain the parent node, the whole path is traversed *again* (it was already traversed when the child node was loaded initially), and every intermediate node is loaded from the Oak node store.

In the case of the segment node store, [SegmentNodeState.getChildNode|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.34.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentNodeState.java#L448] is called. That method obtains the segment where the node is persisted and retrieves the node record from it. When a segment is retrieved, a check is made whether it is in the internal in-memory cache. If it is not, it has to be loaded from the node store's persistent storage. Currently we have three segment node store persistence implementations (tar, Azure, AWS). When a Sling application is under considerable load, entries in the in-memory segment cache will be evicted quite often, and segments will have to be loaded from persistent storage.
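
The repeated traversal described above can be illustrated with a toy model. This is only a sketch and not Oak or Sling code: SketchStore, SketchNode, buildTree and resolveByPath are hypothetical names, and each getChild call merely stands in for one child lookup against the node store (such as SegmentNodeState.getChildNode). It assumes, as a simplification, that every lookup reaches the store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: none of these types exist in Oak or Sling. */
class ParentLookupSketch {

    /** Stands in for the node store; counts the child lookups it serves. */
    static final class SketchStore {
        int childLookups = 0;
    }

    static final class SketchNode {
        final String name;
        final SketchNode parent; // reference kept from when the node was loaded
        final SketchStore store;
        final Map<String, SketchNode> children = new LinkedHashMap<>();

        SketchNode(String name, SketchNode parent, SketchStore store) {
            this.name = name;
            this.parent = parent;
            this.store = store;
        }

        SketchNode addChild(String childName) {
            SketchNode child = new SketchNode(childName, this, store);
            children.put(childName, child);
            return child;
        }

        /** Each call models one read against the node store. */
        SketchNode getChild(String childName) {
            store.childLookups++;
            return children.get(childName);
        }
    }

    /** Builds a single chain of nodes for the given absolute path. */
    static SketchNode buildTree(SketchStore store, String absPath) {
        SketchNode root = new SketchNode("", null, store);
        SketchNode current = root;
        for (String segment : absPath.split("/")) {
            if (!segment.isEmpty()) {
                current = current.addChild(segment);
            }
        }
        return root;
    }

    /** Resolves an absolute path by walking every segment from the root. */
    static SketchNode resolveByPath(SketchNode root, String absPath) {
        SketchNode current = root;
        for (String segment : absPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            current = current.getChild(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        SketchNode root = buildTree(store, "/a/b/c/d/e/f/g/h");

        SketchNode child = resolveByPath(root, "/a/b/c/d/e/f/g/h");
        int childCost = store.childLookups;

        SketchNode parentByPath = resolveByPath(root, "/a/b/c/d/e/f/g");
        int parentByPathCost = store.childLookups - childCost;

        SketchNode parentByRef = child.parent; // no store access at all

        System.out.println("lookups to load child:      " + childCost);        // 8
        System.out.println("lookups for parent by path: " + parentByPathCost); // 7
        System.out.println("same node via reference:    " + (parentByRef == parentByPath)); // true
    }
}
```

In this model, resolving the parent by absolute path costs seven further lookups after the eight already spent on the child, while following the reference held by the child costs none.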
When remote cloud storage is used, performance is further impacted by the many network roundtrips.

If, in the method chain above, javax.jcr.Item#getParent() were used instead of session.getItem or session.getItemOrNull, we would not see the same path traversal when requesting the parent node (SegmentNodeState.getChildNode is not called again in the case of the segment node store implementation). This is because when the child node was loaded, the parent node was loaded as well, and the child node holds a reference to it. More precisely, the child node's [tree|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.34.0/oak-api/src/main/java/org/apache/jackrabbit/oak/api/Tree.java] has a reference to its parent tree, and the parent tree has a reference to the loaded parent node state. This is not specific to the segment node store implementation, but part of the core Oak logic.

Because of those performance considerations, I suggest using javax.jcr.Item#getParent() instead of session.getItem or session.getItemOrNull when implementing JcrResourceProvider.getParent.

> Use javax.jcr.Item.getParent() when resolving parent JCR node in
> JcrResourceProvider#getParent
> ----------------------------------------------------------------------------------------------
>
>                 Key: SLING-10011
>                 URL: https://issues.apache.org/jira/browse/SLING-10011
>             Project: Sling
>          Issue Type: Improvement
>          Components: JCR
>   Affects Versions: JCR Resource 3.0.22
>            Reporter: Miroslav Smiljanic
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently [JcrResourceProvider.getParent|https://github.com/apache/sling-org-apache-sling-jcr-resource/blob/org.apache.sling.jcr.resource-3.0.22/src/main/java/org/apache/sling/jcr/resource/internal/helper/jcr/JcrResourceProvider.java#L361] uses JcrItemResourceFactory.getItemOrNull(String path), which eventually uses the JCR session to retrieve the parent node by absolute path.
> I propose using javax.jcr.Item.getParent() instead.
> The reasoning is to take advantage of potential improvements in the JCR implementation that would, for a given node, retrieve the whole subtree. That could be configured, for example, by node type or node path.
> {noformat}
>    root
>     |
>     a
>    / \
>   b   c
> {noformat}
> If node 'a' in the picture above matches the desired configuration, then the code below would return the whole subtree.
> {code:java}
> // Session#getNode expects an absolute path
> Node a = jcrSession.getNode("/a");
> {code}
> That further means the retrieved subtree can be traversed in memory, without the need to communicate with the JCR repository storage.
> (!) That is particularly important when remote (cloud) storage is used for the repository in the JCR implementation, since tree traversal can then be done without additional network roundtrips.
> {code:java}
> // JCR tree traversal happens in memory
> Node b = a.getNode("b");
> Node c = a.getNode("c");
> {code}
> Going from child to parent is resolved in memory as well (the proposal relates to this fact):
> {code:java}
> // JCR tree traversal happens in memory
> // isSame() compares repository identity rather than Java object references
> assert b.getParent().isSame(c.getParent());
> {code}
> For the document node store, Jackrabbit Oak supports node bundling for configured node types:
> [http://jackrabbit.apache.org/oak/docs/nodestore/document/node-bundling.html]
> I am also currently experimenting with support for node bundling/aggregation for an arbitrary node store
> ([NodeDelegateFullyLoaded|https://github.com/smiroslav/jackrabbit-oak/blob/ppnextgen_newstore/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/delegate/NodeDelegateFullyLoaded.java],
> [FullyLoadedTree|https://github.com/smiroslav/jackrabbit-oak/blob/ppnextgen_newstore/oak-core/src/main/java/org/apache/jackrabbit/oak/core/FullyLoadedTree.java]).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)