[ 
https://issues.apache.org/jira/browse/OAK-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676887#comment-13676887
 ] 

Jukka Zitting commented on OAK-858:
-----------------------------------

Another +1 for the second option.

Currently we don't use {{getChildNodeCount()}} for implementing 
{{Node.getNodes().getSize()}}, but we'll probably want to do so once we come up 
with a way to streamline the access control and visibility checks for the 
common case when everything within a subtree is readable. The second option 
keeps the door open for such an optimization.

The specific definition of {{getChildNodeCount(long max)}} should probably 
specify that the cost of the operation will be at most {{O(max)}}. Also, it 
would be good if the method could also return a higher value than {{max}} in 
case it is available in {{O(1)}} time. We could use a sentinel value like 
{{-1}} or {{Long.MAX_VALUE}} to signal the case where there the child count is 
some unknown value that's higher than {{max}}.
                
> NodeBuilder.getChildNodeCount performance and scalability
> ---------------------------------------------------------
>
>                 Key: OAK-858
>                 URL: https://issues.apache.org/jira/browse/OAK-858
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Thomas Mueller
>
> The method NodeBuilder.getChildNodeCount() is currently supposed to return 
> the exact number of child nodes of a node. If there are no or only few child 
> nodes (which is the most common case), this isn't a problem. 
> However, if there are many nodes (thousands, maybe millions), then keeping an 
> accurate and up-to-date count is tricky. It is specially tricky in a cluster, 
> if you want to allow concurrent add node / delete node. This is for example 
> needed for the UUID index currently. There would be a way to avoid concurrent 
> add/remove: by using some hierarchy, that is, _avoid_ using many child nodes. 
> But efficient, scalable support for many child nodes is one of the goals of 
> Oak in my view.
> I think it's not worth the effort to support efficient, accurate child node 
> *counts* if there are many child nodes. Instead, I suggest to change the 
> contract, and possibly even change the method.
> The current usages of the method NodeBuilder.getChildNodeCount(), excluding 
> usage within getChildNodeCount itself, toString(), and tests: 
> * AbstractNodeState.equals, where it's used to avoid iterating over all child 
> nodes if possible. But it doesn't always avoid iterating over all child 
> nodes, so this method anyway is problematic. I even suggest to remove it (or 
> throw an exception) because of the potential performance problem if there are 
> many child nodes.
> * Template constructor, where there are only 3 cases: 0, 1, and many child 
> nodes.
> * EmptyNodeState.equals, where there are only 2 cases: 0 and non-0.
> * SecureNodeState.WrapChildEntryFunction.apply, where there are only 2 cases: 
> 0 and non-0.
> Because of that, in theory we could simply change the contract of the method 
> to return only "0, 1, Long.MAX_VALUE". However this seems dangerous. 
> Instead, I see two options:
> * change the method to return an enum: NO, ONE, MANY.
> * change the method to NodeBuilder.getChildNodeCount(long max), where max is 
> the maximum returned value. So that a typical method call would be 
> getChildNodeCount(1) if you only care about 0 or non-0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to