[ https://issues.apache.org/jira/browse/OAK-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674101#comment-13674101 ]

Thomas Mueller commented on OAK-853:
------------------------------------

> HAMT data structure used by the SegmentMK allows for efficient diffing in 
> this case

Actually, I doubt it's possible to optimize this case. Please remember the 
method call is:

{code}
SegmentNodeState.compareAgainstBaseState(ModifiedNodeState, Diff)
{code}

As far as I can see, the SegmentNodeState _has_ to read all child nodes of the
ModifiedNodeState. That's where the problem is, because there are potentially
millions of them. There is no HAMT structure at that level, unless you
hardcode internals of the ModifiedNodeState there.
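The asymmetry can be made concrete with a toy model. The classes below (`Snapshot`, `Overlay`, `State`) are hypothetical and are not Oak's SegmentNodeState or ModifiedNodeState. A `Snapshot` keeps its children in 16 hash buckets shared by reference between versions, a one-level stand-in for a HAMT, so a diff between two snapshots can skip every shared bucket. As in the issue, that fast path is guarded by an `instanceof` check. An `Overlay` base only offers a flat child listing, so every entry on both sides has to be read. To keep the sketch short, the method only counts the entries it would read instead of reporting the changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Oak code.
interface State {
    /** Full child listing: the only thing a generic base can offer. */
    Map<String, String> children();
}

final class Snapshot implements State {
    static final int BUCKETS = 16;
    private final List<Map<String, String>> buckets;

    private Snapshot(List<Map<String, String>> buckets) { this.buckets = buckets; }

    static Snapshot empty() {
        List<Map<String, String>> b = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            b.add(new HashMap<>());
        }
        return new Snapshot(b);
    }

    private static int bucketOf(String name) {
        return Math.floorMod(name.hashCode(), BUCKETS);
    }

    /** Returns a new snapshot that shares every bucket except the touched one. */
    Snapshot with(String name, String value) {
        List<Map<String, String>> b = new ArrayList<>(buckets);
        Map<String, String> copy = new HashMap<>(b.get(bucketOf(name)));
        copy.put(name, value);
        b.set(bucketOf(name), copy);
        return new Snapshot(b);
    }

    @Override
    public Map<String, String> children() {
        Map<String, String> all = new HashMap<>();
        buckets.forEach(all::putAll);
        return all;
    }

    /** Counts the child entries a diff against {@code base} has to read. */
    long entriesReadComparingAgainst(State base) {
        if (base instanceof Snapshot) {
            // Fast path: buckets shared by reference are identical, skip them.
            Snapshot other = (Snapshot) base;
            long read = 0;
            for (int i = 0; i < BUCKETS; i++) {
                Map<String, String> mine = buckets.get(i);
                Map<String, String> theirs = other.buckets.get(i);
                if (mine != theirs) {
                    read += mine.size() + theirs.size();
                }
            }
            return read;
        }
        // Slow path: any other base only exposes a flat listing of all children.
        return children().size() + base.children().size();
    }
}

/** Transient additions layered on top of a base, loosely like a ModifiedNodeState. */
final class Overlay implements State {
    private final State base;
    private final Map<String, String> added = new HashMap<>();

    Overlay(State base) { this.base = base; }

    void add(String name, String value) { added.put(name, value); }

    @Override
    public Map<String, String> children() {
        Map<String, String> all = new HashMap<>(base.children());
        all.putAll(added);
        return all;
    }
}
```

With 10000 children plus one added child, comparing two snapshots reads only the entries of the single touched bucket. Comparing the same snapshot against an `Overlay` of the old state reads 10001 + 10000 = 20001 entries.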

                
> Many child nodes: Diffing causes many calls to MicroKernel.getNodes
> -------------------------------------------------------------------
>
>                 Key: OAK-853
>                 URL: https://issues.apache.org/jira/browse/OAK-853
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Thomas Mueller
>         Attachments: OAK-853.patch
>
>
> Creating a flat hierarchy of the following form causes many calls to 
> MicroKernel.getNodes and is thus slow.
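> {code}
> // Add 10000 child nodes under one parent, saving periodically
> for (int i = 0; i < 10000; i++) {
>     root.addNode("test" + i, "nt:folder");
>     if (i % 1000 == 0) {
>         session.save();
>     }
> }
> session.save(); // persist the nodes added after the last save in the loop
> {code}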
> As far as I can see, this affects not only MicroKernel-based storage but
> also the SegmentNodeStore. The reason seems to be that the optimization for
> many child nodes in KernelNodeState.compareAgainstBaseState and
> SegmentNodeState.compareAgainstBaseState, which avoids iterating over all
> children, doesn't work.
> The optimization uses:
> {code}
> if (base instanceof SegmentNodeState) ...
> if (base instanceof KernelNodeState) ...
> {code}
> Ideally, the instanceof checks should be avoided, but I'm not sure how to do
> that yet. Anyway, the problem is that "base" is a ModifiedNodeState, so
> neither optimization can be used.
> I was thinking: couldn't the ModifiedNodeState do a reverse diff in this
> case? That is, inside ModifiedNodeState.compareAgainstBaseState, check
> whether the "base" parameter is a ModifiedNodeState and the "base" field is
> not; if so, do a reverse diff, which would be efficient. (We should probably
> not use "base" for both the field name and the parameter, but that's a
> change for another time.)
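One reading of the reverse-diff idea can be sketched as follows. It assumes the modified state was created directly on top of the unmodified state it is being compared against. All names here (`ChildDiff`, `ReverseDiff`, `Modified`, `Diffs`) are hypothetical and are not Oak's NodeStateDiff API. The modified state diffs cheaply against its own base by visiting only its recorded changes. A wrapper then swaps the added and deleted callbacks and the before/after arguments of changed callbacks. When the assumption does not hold, the sketch falls back to a full scan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model (not Oak's API): one callback per changed
// child, with child content reduced to a String.
interface ChildDiff {
    void childAdded(String name, String after);
    void childChanged(String name, String before, String after);
    void childDeleted(String name, String before);
}

/** Flips the direction of a diff: added <-> deleted, before <-> after. */
final class ReverseDiff implements ChildDiff {
    private final ChildDiff target;

    ReverseDiff(ChildDiff target) { this.target = target; }

    @Override public void childAdded(String name, String after) {
        target.childDeleted(name, after);
    }
    @Override public void childChanged(String name, String before, String after) {
        target.childChanged(name, after, before);
    }
    @Override public void childDeleted(String name, String before) {
        target.childAdded(name, before);
    }
}

/** A base snapshot plus recorded changes; a null value marks a deletion. */
final class Modified {
    final Map<String, String> base;
    final Map<String, String> changes = new HashMap<>();

    Modified(Map<String, String> base) { this.base = base; }

    /** Diffs against its own base by visiting only the recorded changes. */
    void compareAgainstOwnBase(ChildDiff diff) {
        for (Map.Entry<String, String> e : changes.entrySet()) {
            String before = base.get(e.getKey());
            String after = e.getValue();
            if (before == null && after != null) {
                diff.childAdded(e.getKey(), after);
            } else if (before != null && after == null) {
                diff.childDeleted(e.getKey(), before);
            } else if (before != null && !before.equals(after)) {
                diff.childChanged(e.getKey(), before, after);
            }
        }
    }
}

final class Diffs {
    /** Reports the changes from {@code before} to the unmodified {@code snapshot}. */
    static void compare(Map<String, String> snapshot, Modified before, ChildDiff diff) {
        if (before.base == snapshot) {
            // Assumed case: "before" was built on this snapshot, so reverse
            // its own change set (cost proportional to the number of changes).
            before.compareAgainstOwnBase(new ReverseDiff(diff));
            return;
        }
        // Fallback: materialize "before" and scan every child on both sides.
        Map<String, String> old = new HashMap<>(before.base);
        before.changes.forEach((k, v) -> {
            if (v == null) old.remove(k); else old.put(k, v);
        });
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            String b = old.get(e.getKey());
            if (b == null) {
                diff.childAdded(e.getKey(), e.getValue());
            } else if (!b.equals(e.getValue())) {
                diff.childChanged(e.getKey(), b, e.getValue());
            }
        }
        for (Map.Entry<String, String> e : old.entrySet()) {
            if (!snapshot.containsKey(e.getKey())) {
                diff.childDeleted(e.getKey(), e.getValue());
            }
        }
    }
}
```

The reference check `before.base == snapshot` stands in for the "base" field test described above. Both branches report the same changes, but only the fallback touches every child.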
