[ https://issues.apache.org/jira/browse/JCR-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238375#comment-13238375 ]
Thomas Mueller commented on JCR-3263:
-------------------------------------

> until we have a concrete use-case for such a method

Well, this isn't really about not having a concrete use case... You already seem to have a use case. It's just that you proposed to add a new method:

Map<NodeId, NodeInfo> getAllNodeInfos(NodeId after, int maxCount)

and I have suggested to use a different method instead:

Map<NodeId, NodeInfo> getNodeInfos(List<NodeId> ids)

But other than that, I can't really comment on the patch as I'm not familiar with the consistency checker implementation. With 69 KB, the patch you provided is quite large.

> Consistency checker performance improvements
> --------------------------------------------
>
>                 Key: JCR-3263
>                 URL: https://issues.apache.org/jira/browse/JCR-3263
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>            Reporter: Unico Hommes
>         Attachments: checkerperformance.patch
>
>
> Currently the consistency checker loads in a batch of node ids and for each
> node id fetches the corresponding bundle, its child bundles, and parent
> bundle separately. This makes the consistency checker perform less than
> optimal and may take hours (days?) to complete for large repositories.
> I've been able to make the checker execute about 20 times faster on my local
> machine by loading in batches of node prop bundles at once. For 17000 nodes
> in the workspace the current implementation ran for about 23 seconds whereas
> with the enhancements I made it finished in 1.2 seconds.
> Now the problem lies in the fact that loading in node prop bundles in batches
> may require a lot of memory. And it is not very predictable how much per
> batch size because the sizes of the individual bundles are unpredictable.
> Also the node prop bundle contains much more information than is needed for a
> consistency check.
> What would be ideal in this situation is to introduce a new type - call it
> NodeInfo - that contains only the structural information the checker needs to
> do its work. Meaning the node id, the parent id and the child ids. In order
> to allow for a possible future referential integrity check perhaps also its
> reference type properties.
> The IterablePersistenceManager interface would then get an additional method:
> Map<NodeId, NodeInfo> getAllNodeInfos();
> If this is an acceptable proposal I would like to work on this and contribute
> a patch.
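
For readers following the thread, the two method shapes under discussion can be compared in a small, self-contained Java sketch (Java 16+ for records). This is not the code from checkerperformance.patch and not Jackrabbit's API: NodeId and NodeInfo below are simplified stand-ins, and InMemoryStore, check and sampleStore are hypothetical names invented for illustration. It only assumes what the description states, namely that a NodeInfo carries the node id, the parent id and the child ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NodeInfoSketch {

    /** Simplified stand-in for a node id (assumption: ids have a total order). */
    record NodeId(String value) implements Comparable<NodeId> {
        public int compareTo(NodeId other) { return value.compareTo(other.value); }
    }

    /** Structural data only: own id, parent id (null for the root), child ids. */
    record NodeInfo(NodeId id, NodeId parentId, List<NodeId> childIds) {}

    /** Hypothetical in-memory stand-in for a persistence manager, ordered by id. */
    static class InMemoryStore {
        private final TreeMap<NodeId, NodeInfo> nodes = new TreeMap<>();

        void put(NodeInfo info) { nodes.put(info.id(), info); }

        /** Paging shape: up to maxCount infos with ids strictly after 'after' (null = start). */
        Map<NodeId, NodeInfo> getAllNodeInfos(NodeId after, int maxCount) {
            NavigableMap<NodeId, NodeInfo> tail = after == null ? nodes : nodes.tailMap(after, false);
            Map<NodeId, NodeInfo> result = new LinkedHashMap<>();
            for (NodeInfo info : tail.values()) {
                if (result.size() >= maxCount) break;
                result.put(info.id(), info);
            }
            return result;
        }

        /** Lookup shape: infos for exactly the requested ids; unknown ids are omitted. */
        Map<NodeId, NodeInfo> getNodeInfos(List<NodeId> ids) {
            Map<NodeId, NodeInfo> result = new LinkedHashMap<>();
            for (NodeId id : ids) {
                NodeInfo info = nodes.get(id);
                if (info != null) result.put(id, info);
            }
            return result;
        }
    }

    /** Walks the store in pages and reports children that are missing or name another parent. */
    static List<String> check(InMemoryStore store, int batchSize) {
        List<String> problems = new ArrayList<>();
        NodeId last = null;
        while (true) {
            Map<NodeId, NodeInfo> batch = store.getAllNodeInfos(last, batchSize);
            if (batch.isEmpty()) break;
            for (NodeInfo info : batch.values()) {
                Map<NodeId, NodeInfo> children = store.getNodeInfos(info.childIds());
                for (NodeId childId : info.childIds()) {
                    NodeInfo child = children.get(childId);
                    if (child == null) {
                        problems.add(info.id().value() + ": missing child " + childId.value());
                    } else if (!info.id().equals(child.parentId())) {
                        problems.add(info.id().value() + ": child " + childId.value()
                                + " has a different parent");
                    }
                }
                last = info.id(); // resume the next page after the last id seen
            }
        }
        return problems;
    }

    /** Small sample: "b" names the wrong parent and "c" does not exist. */
    static InMemoryStore sampleStore() {
        NodeId root = new NodeId("root"), a = new NodeId("a"), b = new NodeId("b");
        InMemoryStore store = new InMemoryStore();
        store.put(new NodeInfo(root, null, List.of(a, b)));
        store.put(new NodeInfo(a, root, List.of(new NodeId("c"))));
        store.put(new NodeInfo(b, new NodeId("x"), List.of()));
        return store;
    }

    public static void main(String[] args) {
        check(sampleStore(), 2).forEach(System.out::println);
    }
}
```

In this sketch the paging method drives the overall scan while the id-list method fetches only the related nodes each check needs, so memory use is bounded by the batch size plus the size of one child list rather than by full bundle sizes. Whether both methods are needed, or only one of them, is exactly the open question in the comment above.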