[ 
https://issues.apache.org/jira/browse/OAK-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589631#comment-14589631
 ] 

Stefan Egli commented on OAK-3002:
----------------------------------

[~chetanm], good point indeed. However, given how the journal is currently 
structured, doing this check still results in fewer invalidated entries. 
Here's why:
* the journal contains a tree of nodes that have changed. However, it does not 
yet indicate whether a non-leaf node itself has changed. For a leaf node it is 
clear that it has changed, but a non-leaf node is not yet marked in any way 
(maybe we could do that).
* as a result, we treat non-leaf nodes as changed as well, and overlay this 
with checks as to whether something really has changed (be that somewhere in 
{{diff()}} or in {{invalidateCache(Iterable)}}).
* when it comes to invalidating, the cache may already hold a non-leaf node in 
an up-to-date version, so a {{modCnt}} check helps to find that out and to 
avoid invalidating the document if it is indeed the most current one.
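
As a rough illustration of the {{modCnt}} check described in the last bullet, 
here is a minimal, self-contained sketch. It is not the code from the patch: 
the class {{ModCountInvalidationSketch}}, the {{CachedDoc}} type, the 
{{invalidate}} method and the map standing in for the document store are all 
hypothetical, and the ids are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ModCountInvalidationSketch {

    /** Hypothetical stand-in for a cached document; only id and modCount matter here. */
    static final class CachedDoc {
        final String id;
        final long modCount;

        CachedDoc(String id, long modCount) {
            this.id = id;
            this.modCount = modCount;
        }
    }

    /**
     * Walks the ids reported by the journal and evicts a cached entry only if
     * its modCount differs from the current one in the store (assumption: the
     * store is modelled as a plain id-to-modCount map). Ids missing from the
     * store are treated as stale. Returns the ids that were actually evicted.
     */
    static List<String> invalidate(Map<String, CachedDoc> cache,
                                   Iterable<String> journalIds,
                                   Map<String, Long> storeModCounts) {
        List<String> evicted = new ArrayList<>();
        for (String id : journalIds) {
            CachedDoc cached = cache.get(id);
            if (cached == null) {
                continue; // not cached, nothing to invalidate
            }
            Long current = storeModCounts.get(id);
            if (current != null && current.longValue() == cached.modCount) {
                continue; // cached version is already the most current one
            }
            cache.remove(id);
            evicted.add(id);
        }
        return evicted;
    }

    public static void main(String[] args) {
        Map<String, CachedDoc> cache = new HashMap<>();
        cache.put("1:/a", new CachedDoc("1:/a", 3));     // non-leaf, unchanged
        cache.put("2:/a/b", new CachedDoc("2:/a/b", 7)); // leaf, changed

        Map<String, Long> store = new HashMap<>();
        store.put("1:/a", 3L);
        store.put("2:/a/b", 8L);

        // The journal lists both ids, but only the changed leaf gets evicted.
        System.out.println(invalidate(cache, Arrays.asList("1:/a", "2:/a/b"), store));
    }
}
```

In this sketch the non-leaf entry survives because its modCount matches, which 
is the saving the check is meant to provide. The lookup of the current 
modCount is where the extra cost would come from in a real store.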

I have done some testing and found that this check can reduce the number of 
invalidated documents by very roughly 1% to 50% (I can attach the test case 
once it is polished properly, it's a bit hacky atm).

Now of course the cost of doing the modCnt check needs to be weighed against 
the gain. Perhaps my test case was geared too much towards doing many changes 
and thus yields better results, and in real life it would be 'good enough' to 
invalidate all ids (leaf and non-leaf) of the journal. So it's difficult to 
say whether the gain is really big.

Also, we could perhaps do better and explicitly mark non-leaf nodes that have 
changed themselves (as opposed to merely having a changed child).
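
One possible shape for such an explicit marker is sketched below. This is 
purely an assumption about how it could look, not the journal's actual 
format: {{MarkedJournalSketch}}, {{Node}}, {{recordChange}} and 
{{changedPaths}} are hypothetical names. Ancestors are created as plain 
containers, and only nodes that were themselves reported as changed carry the 
flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MarkedJournalSketch {

    /** Hypothetical journal tree node with an explicit "changed" marker. */
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        boolean changed;
    }

    /** Records a change at the given path; ancestors are created but not marked. */
    static void recordChange(Node root, String path) {
        Node n = root;
        for (String name : path.split("/")) {
            if (name.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            n = n.children.computeIfAbsent(name, k -> new Node());
        }
        n.changed = true;
    }

    /** Collects, in pre-order, the paths of all nodes explicitly marked as changed. */
    static List<String> changedPaths(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(Node n, String path, List<String> out) {
        if (n.changed) {
            out.add(path.isEmpty() ? "/" : path);
        }
        for (Map.Entry<String, Node> e : n.children.entrySet()) {
            collect(e.getValue(), path + "/" + e.getKey(), out);
        }
    }
}
```

With such a marker, recording changes at {{/a/b/c}} and {{/a}} would yield 
only those two paths for invalidation, while the intermediate {{/a/b}} would 
be skipped without needing a modCnt lookup.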

> Optimize docCache by filtering using journal
> --------------------------------------------
>
>                 Key: OAK-3002
>                 URL: https://issues.apache.org/jira/browse/OAK-3002
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: core, mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: scalability
>             Fix For: 1.3.1, 1.2.3
>
>         Attachments: OAK-3002-improved-doc-cache-invaliation.2.patch
>
>
> This subtask is about spawning out a 
> [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14588114&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14588114]
>  on OAK-2829 re optimizing docCache invalidation using the newly introduced 
> external diff journal:
> {quote}
> Attached OAK-2829-improved-doc-cache-invaliation.patch which is a suggestion 
> on how to avoid invalidating the entire document cache when doing a 
> {{backgroundRead}} but instead making use of the new journal: ie only 
> invalidate from the document cache what has actually changed.
> I'd like to get an opinion ([~mreutegg], [~chetanm]?) on this first, I have a 
> load test pending locally which found invalidation of the document cache to 
> be the slowest part thus wanted to optimize this first.
> Open still/next:
>  * also invalidate only necessary parts from the docChildrenCache
>  * junits for all of these
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
