[ https://issues.apache.org/jira/browse/OAK-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589631#comment-14589631 ]
Stefan Egli commented on OAK-3002:
----------------------------------

[~chetanm], good point indeed. However, given how the journal is currently structured, doing this check still results in fewer invalidated entries. Here's why:
* the journal contains a tree of nodes that have changed. However, it does not yet indicate whether a non-leaf node itself has changed or not. For a leaf node it is clear that it has changed. But for a non-leaf that is atm not yet marked in any way (maybe we could do that).
* the result of this is that we treat non-leaf nodes as changed as well - which we overlay with checks as to whether something really has changed (be that somewhere in {{diff()}} or in {{invalidateCache(Iterable)}}).
* now when it comes to invalidating, it could be that the cache already has a non-leaf node in an up-to-date version, so doing a {{modCnt}} check helps find that out and subsequently avoids invalidating if the document is indeed the most current one.

I have done some testing and found that doing this check can invalidate between very roughly 1 and 50% fewer documents (I can attach the test case once polished properly, it's a bit hacky atm). Now of course the time spent on the {{modCnt}} check needs to be compared with the gain. Perhaps my test case was geared too much towards doing many changes and thus yields better results, and in real life it would be 'good enough' to invalidate all ids (leafs and non-leafs) of the journal. So it's difficult to say if the gain is really big.
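To make the {{modCnt}} check described above concrete, here is a minimal sketch in Java. It is not the code from the attached patch and does not use Oak's actual classes: the class name, the {{invalidate}} method, the plain map standing in for the document cache, and the {{currentModCount}} lookup are all hypothetical. It only illustrates the idea of walking the ids listed in the journal and dropping a cached entry only when its cached modCount differs from the one in the backing store.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; names and types are illustrative, not Oak's API. */
final class JournalInvalidationSketch {

    /**
     * Invalidates cached entries named by the journal, but keeps those whose
     * cached modCount still matches the backing store.
     *
     * @param cache           stand-in for the document cache: id -> cached modCount
     * @param journalIds      ids listed in the journal (leafs and non-leafs)
     * @param currentModCount assumed lookup of the store's modCount for an id
     *                        (returns null if the document no longer exists)
     * @return number of entries removed from the cache
     */
    static int invalidate(Map<String, Long> cache,
                          Iterable<String> journalIds,
                          Function<String, Long> currentModCount) {
        int invalidated = 0;
        for (String id : journalIds) {
            Long cached = cache.get(id);
            if (cached == null) {
                continue; // not in the cache, nothing to invalidate
            }
            Long current = currentModCount.apply(id);
            if (!cached.equals(current)) {
                // Cached copy is stale (or the document is gone): drop it.
                cache.remove(id);
                invalidated++;
            }
            // Otherwise the cached copy is already up to date and is kept.
        }
        return invalidated;
    }

    public static void main(String[] args) {
        Map<String, Long> cache = new HashMap<>();
        cache.put("/", 3L);
        cache.put("/a", 5L);
        cache.put("/a/b", 7L);

        // Stand-in for the backing store: only the leaf /a/b has a new modCount.
        Map<String, Long> store = new HashMap<>();
        store.put("/", 3L);
        store.put("/a", 5L);
        store.put("/a/b", 8L);

        int n = invalidate(cache, Arrays.asList("/", "/a", "/a/b"), store::get);
        System.out.println("invalidated=" + n + ", still cached=" + cache.keySet());
    }
}
```

In this toy setup the journal lists the ancestors {{/}} and {{/a}} only because their child changed, so they survive the check. The sketch pays one modCount lookup per cached journal id, which is exactly the cost that has to be weighed against the saved invalidations.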
Also, we could perhaps do better and mark non-leafs explicitly somehow if they have changed as well (other than the fact that a child has changed).

> Optimize docCache by filtering using journal
> --------------------------------------------
>
>                 Key: OAK-3002
>                 URL: https://issues.apache.org/jira/browse/OAK-3002
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: core, mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: scalability
>             Fix For: 1.3.1, 1.2.3
>
>         Attachments: OAK-3002-improved-doc-cache-invaliation.2.patch
>
>
> This subtask is about spawning out a
> [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14588114&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14588114]
> on OAK-2829 re optimizing docCache invalidation using the newly introduced
> external diff journal:
> {quote}
> Attached OAK-2829-improved-doc-cache-invaliation.patch which is a suggestion
> on how to avoid invalidating the entire document cache when doing a
> {{backgroundRead}} but instead making use of the new journal: ie only
> invalidate from the document cache what has actually changed.
> I'd like to get an opinion ([~mreutegg], [~chetanm]?) on this first, I have a
> load test pending locally which found invalidation of the document cache to
> be the slowest part thus wanted to optimize this first.
> Open still/next:
> * also invalidate only necessary parts from the docChildrenCache
> * junits for all of these
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)