Thomas Mueller created OAK-858:
----------------------------------

             Summary: NodeBuilder.getChildNodeCount performance and scalability
                 Key: OAK-858
                 URL: https://issues.apache.org/jira/browse/OAK-858
             Project: Jackrabbit Oak
          Issue Type: Improvement
            Reporter: Thomas Mueller


The method NodeBuilder.getChildNodeCount() is currently supposed to return the 
exact number of child nodes of a node. If there are no or only few child nodes 
(which is the most common case), this isn't a problem. 

However, if there are many nodes (thousands, maybe millions), then keeping an 
accurate and up-to-date count is tricky. It is specially tricky in a cluster, 
if you want to allow concurrent add node / delete node. This is for example 
needed for the UUID index currently. There would be a way to avoid concurrent 
add/remove: by using some hierarchy, that is, _avoid_ using many child nodes. 
But efficient, scalable support for many child nodes is one of the goals of Oak 
in my view.

I think it's not worth the effort to support efficient, accurate child node 
*counts* if there are many child nodes. Instead, I suggest to change the 
contract, and possibly even change the method.

The current usages of the method NodeBuilder.getChildNodeCount(), excluding 
usage within getChildNodeCount itself, toString(), and tests: 

* AbstractNodeState.equals, where it's used to avoid iterating over all child 
nodes if possible. But it doesn't always avoid iterating over all child nodes, 
so this method anyway is problematic. I even suggest to remove it (or throw an 
exception) because of the potential performance problem if there are many child 
nodes.

* Template constructor, where there are only 3 cases: 0, 1, and many child 
nodes.

* EmptyNodeState.equals, where there are only 2 cases: 0 and non-0.

* SecureNodeState.WrapChildEntryFunction.apply, where there are only 2 cases: 0 
and non-0.

Because of that, in theory we could simply change the contract of the method to 
return only "0, 1, Long.MAX_VALUE". However this seems dangerous. 

Instead, I see two options:

* change the method to return an enum: NO, ONE, MANY.

* change the method to NodeBuilder.getChildNodeCount(long max), where max is 
the maximum returned value. So that a typical method call would be 
getChildNodeCount(1) if you only care about 0 or non-0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to