[ 
https://issues.apache.org/jira/browse/JCR-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238375#comment-13238375
 ] 

Thomas Mueller commented on JCR-3263:
-------------------------------------

> until we have a concrete use-case for such a method

Well, this isn't really about lacking a concrete use case; you already 
seem to have one. The point is that you proposed adding a new method:

    Map<NodeId, NodeInfo> getAllNodeInfos(NodeId after, int maxCount)

and I suggested using a different method instead:

    Map<NodeId, NodeInfo> getNodeInfos(List<NodeId> ids)

Other than that, I can't really comment on the patch, as I'm not familiar 
with the consistency checker implementation. At 69 KB, the patch you provided 
is quite large.
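
To make the difference between the two method shapes concrete, here is a 
minimal in-memory sketch. Everything in it is an assumption for illustration: 
NodeId and NodeInfo are simplified stand-ins (not the Jackrabbit classes or 
the types in the patch), InMemoryNodeInfoStore is a hypothetical name, and the 
semantics chosen for "after" (exclusive, null meaning "from the start") and 
for unknown ids (skipped) are guesses, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a node id: an ordered string key.
final class NodeId implements Comparable<NodeId> {
    final String value;
    NodeId(String value) { this.value = value; }
    @Override public int compareTo(NodeId o) { return value.compareTo(o.value); }
    @Override public boolean equals(Object o) {
        return o instanceof NodeId && ((NodeId) o).value.equals(value);
    }
    @Override public int hashCode() { return value.hashCode(); }
    @Override public String toString() { return value; }
}

// Hypothetical NodeInfo: only the structural data named in the issue.
final class NodeInfo {
    final NodeId id;
    final NodeId parentId;        // null for the root node
    final List<NodeId> childIds;
    NodeInfo(NodeId id, NodeId parentId, List<NodeId> childIds) {
        this.id = id;
        this.parentId = parentId;
        this.childIds = new ArrayList<>(childIds);
    }
}

// In-memory store exposing both method shapes under discussion.
final class InMemoryNodeInfoStore {
    private final TreeMap<NodeId, NodeInfo> infos = new TreeMap<>();

    void put(NodeInfo info) { infos.put(info.id, info); }

    // Paging shape: at most maxCount entries whose ids sort strictly after
    // 'after' (assumed here: null means "start from the first id").
    Map<NodeId, NodeInfo> getAllNodeInfos(NodeId after, int maxCount) {
        Map<NodeId, NodeInfo> result = new LinkedHashMap<>();
        Iterable<NodeInfo> source =
                after == null ? infos.values() : infos.tailMap(after, false).values();
        for (NodeInfo info : source) {
            if (result.size() >= maxCount) {
                break;
            }
            result.put(info.id, info);
        }
        return result;
    }

    // Lookup shape: the caller decides which ids to load; ids that are not
    // present are skipped (an assumption of this sketch).
    Map<NodeId, NodeInfo> getNodeInfos(List<NodeId> ids) {
        Map<NodeId, NodeInfo> result = new LinkedHashMap<>();
        for (NodeId id : ids) {
            NodeInfo info = infos.get(id);
            if (info != null) {
                result.put(id, info);
            }
        }
        return result;
    }
}
```

With the first shape the store decides what goes into each batch; with the 
second the caller controls exactly which ids, and how many, are loaded per 
call.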
                
> Consistency checker performance improvements
> --------------------------------------------
>
>                 Key: JCR-3263
>                 URL: https://issues.apache.org/jira/browse/JCR-3263
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>            Reporter: Unico Hommes
>         Attachments: checkerperformance.patch
>
>
> Currently the consistency checker loads in a batch of node ids and for each 
> node id fetches the corresponding bundle, its child bundles, and parent 
> bundle separately. This makes the consistency checker perform less than 
> optimally, and it may take hours (days?) to complete for large repositories.
> I've been able to make the checker execute about 20 times faster on my local 
> machine by loading in batches of node prop bundles at once. For 17000 nodes 
> in the workspace the current implementation ran for about 23 seconds whereas 
> with the enhancements I made it finished in 1.2 seconds.
> The problem is that loading node prop bundles in batches may require a lot 
> of memory, and the amount needed for a given batch size is hard to predict 
> because the sizes of the individual bundles vary.
> Also, the node prop bundle contains much more information than is needed for 
> a consistency check.
> What would be ideal in this situation is to introduce a new type - call it 
> NodeInfo - that contains only the structural information the checker needs to 
> do its work: the node id, the parent id and the child ids. In order 
> to allow for a possible future referential integrity check, perhaps also its 
> reference type properties.
> The IterablePersistenceManager interface would then get an additional method:
> Map<NodeId, NodeInfo> getAllNodeInfos();
> If this is an acceptable proposal I would like to work on this and contribute 
> a patch.
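
As a rough illustration of why the structural fields named above (node id, 
parent id, child ids) could be enough for a check, here is a small sketch. It 
is not the Jackrabbit checker and not the code in the patch: NodeInfo is a 
hypothetical minimal type with plain string ids, StructureChecker is an 
invented name, and the sketch assumes the map holds every node being checked, 
so an id that is absent from the map is reported as missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical NodeInfo with string ids; only structural data is kept.
final class NodeInfo {
    final String id;
    final String parentId;        // null for the root node
    final List<String> childIds;
    NodeInfo(String id, String parentId, List<String> childIds) {
        this.id = id;
        this.parentId = parentId;
        this.childIds = new ArrayList<>(childIds);
    }
}

final class StructureChecker {
    // Returns a human-readable list of parent/child inconsistencies.
    static List<String> check(Map<String, NodeInfo> infos) {
        List<String> problems = new ArrayList<>();
        for (NodeInfo info : infos.values()) {
            // A non-root node must refer to a parent that exists.
            if (info.parentId != null && !infos.containsKey(info.parentId)) {
                problems.add(info.id + ": missing parent " + info.parentId);
            }
            // Every listed child must exist and point back to this node.
            for (String childId : info.childIds) {
                NodeInfo child = infos.get(childId);
                if (child == null) {
                    problems.add(info.id + ": missing child " + childId);
                } else if (!info.id.equals(child.parentId)) {
                    problems.add(childId + ": parent is " + child.parentId
                            + ", expected " + info.id);
                }
            }
        }
        return problems;
    }
}
```

Because each entry carries only ids, the memory needed per batch depends on 
the number of nodes and child entries rather than on the size of each node's 
property data.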

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
