[ 
https://issues.apache.org/jira/browse/OAK-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818854#comment-13818854
 ] 

Chetan Mehrotra edited comment on OAK-1156 at 11/12/13 10:57 AM:
-----------------------------------------------------------------

There are multiple possible approaches.

*Approach A*
# Fetch the {{_modCount}} of all cached documents using a Mongo {{$in}} query
# Invalidate those entries whose {{_modCount}} differs

Approach A is essentially brute force and might take quite a while if the 
number of cached entries is high.
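
As a rough illustration of Approach A, here is a minimal sketch. It is not 
the Oak implementation: the class name, the method name and the 
{{fetchModCounts}} function are hypothetical, and the function only stands in 
for a single Mongo {{$in}} query that returns the current {{_modCount}} per 
document id. Ids missing from the result are assumed to be stale.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

public class ModCountInvalidation {

    /**
     * Returns the ids of cached documents whose _modCount differs from the store.
     *
     * @param cachedModCounts id -> _modCount as currently held in the cache
     * @param fetchModCounts  hypothetical stand-in for one $in query that
     *                        returns id -> current _modCount for the given ids
     */
    static List<String> findStale(
            Map<String, Long> cachedModCounts,
            Function<Collection<String>, Map<String, Long>> fetchModCounts) {
        // One round trip for all cached ids (the brute-force part).
        Map<String, Long> current = fetchModCounts.apply(cachedModCounts.keySet());
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : cachedModCounts.entrySet()) {
            // A missing id yields null here and is therefore treated as stale.
            if (!Objects.equals(e.getValue(), current.get(e.getKey()))) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

The cost is proportional to the number of cached entries, which is the 
concern raised above.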

*Approach B*
# Create a tree structure out of the paths of the cached documents
# Traverse the tree breadth first and fetch the {{_lastRev}} data for all 
nodes at the same level
# If the last revision is the same as the one in the cached document, then in 
some cases all nodes under that path can be considered unmodified. Such cached 
documents are marked as up to date and filtered out of the traversal. A child 
node is considered up to date only if either
## The in-memory creation time of the child node is greater than the creation 
time of the subtree root node that was found to be up to date. So if /a/b is 
found to be valid (lastRev not changed), then a child node like /a/b/c/d is 
also considered up to date if its creation time is greater than that of /a/b, 
as it would then have been added to the cache later.
## OR the last check time of /a/b and /a/b/c/d is the same. This means that 
the previous run checked that both nodes are consistent, so the lastRev of 
/a/b can be taken as the authoritative source for the state of this child node 
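
The steps of Approach B can be sketched roughly as follows. This is only an 
illustration under several assumptions and not the attached patch: the 
{{Cached}} class and its fields are hypothetical, {{_lastRev}} is simplified 
to a single string, {{fetchLastRevs}} stands in for one query per tree level, 
and the tree is represented implicitly by grouping paths by depth instead of 
building explicit nodes. Only entries whose {{_lastRev}} was actually 
verified are used as trusted subtree roots.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class LastRevInvalidation {

    /** Hypothetical view of a cached document; field names are illustrative. */
    static final class Cached {
        final String path;
        final String lastRev;  // simplified to one string for this sketch
        final long created;    // in-memory creation time of the cache entry
        final long lastCheck;  // time of the last consistency check

        Cached(String path, String lastRev, long created, long lastCheck) {
            this.path = path;
            this.lastRev = lastRev;
            this.created = created;
            this.lastCheck = lastCheck;
        }
    }

    static int depth(String path) {
        return path.equals("/") ? 0 : (int) path.chars().filter(c -> c == '/').count();
    }

    static boolean isAncestor(String ancestor, String path) {
        return ancestor.equals("/") ? !path.equals("/") : path.startsWith(ancestor + "/");
    }

    /** The two conditions under which a child may rely on a valid ancestor. */
    static boolean coveredBy(Cached valid, Cached child) {
        return isAncestor(valid.path, child.path)
                && (child.created > valid.created || child.lastCheck == valid.lastCheck);
    }

    /** Returns the paths of cached documents whose _lastRev no longer matches. */
    static Set<String> findStale(
            Collection<Cached> cache,
            Function<List<String>, Map<String, String>> fetchLastRevs) {
        // Group entries by depth so they can be visited level by level.
        TreeMap<Integer, List<Cached>> levels = new TreeMap<>();
        for (Cached c : cache) {
            levels.computeIfAbsent(depth(c.path), d -> new ArrayList<>()).add(c);
        }
        List<Cached> validRoots = new ArrayList<>();
        Set<String> stale = new LinkedHashSet<>();
        for (List<Cached> level : levels.values()) {
            List<String> paths = new ArrayList<>();
            List<Cached> toCheck = new ArrayList<>();
            for (Cached c : level) {
                // Entries covered by a verified ancestor are filtered out.
                if (validRoots.stream().noneMatch(v -> coveredBy(v, c))) {
                    toCheck.add(c);
                    paths.add(c.path);
                }
            }
            if (toCheck.isEmpty()) {
                continue;
            }
            // One query for all remaining nodes at this level.
            Map<String, String> current = fetchLastRevs.apply(paths);
            for (Cached c : toCheck) {
                if (c.lastRev.equals(current.get(c.path))) {
                    validRoots.add(c);
                } else {
                    stale.add(c.path);
                }
            }
        }
        return stale;
    }
}
```

A descendant that satisfies neither condition is not skipped. It stays in the 
traversal and is checked with the query for its own level.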

In this approach we can save a lot of queries, as in most cases the major 
portion of the tree will not have changed. However, we need to be careful not 
to leave any stale entry in the cache. For example, whenever we add a new 
document to the cache, say at path {{/foo/bar}}, it would have the latest 
{{_lastRev}} entry. However, the documents already cached under that path 
would not be checked in that flow. So in the above flow we might falsely 
conclude that the tree under {{/foo/bar}} is consistent and thus hold a stale 
copy.


was (Author: chetanm):
There are multiple possible approaches.

*Approach A*
# Fetch the {{_modCount}} of all cached documents using a Mongo {{$in}} query
# Invalidate those entries whose {{_modCount}} differs

Approach A is essentially brute force and might take quite a while if the 
number of cached entries is high.

*Approach B*
# Create a tree structure out of the paths of the cached documents
# Traverse the tree breadth first and fetch the {{_lastRev}} data for all 
nodes at the same level
# If the last revision is the same as the one in the cached document, then in 
some cases all nodes under that path can be considered unmodified. Such cached 
documents are marked as up to date and filtered out of the traversal

In this approach we can save a lot of queries, as in most cases the major 
portion of the tree will not have changed. However, we need to be careful not 
to leave any stale entry in the cache. For example, whenever we add a new 
document to the cache, say at path {{/foo/bar}}, it would have the latest 
{{_lastRev}} entry. However, the documents already cached under that path 
would not be checked in that flow. So in the above flow we might falsely 
conclude that the tree under {{/foo/bar}} is consistent and thus hold a stale 
copy.

> Improve the document cache invalidation logic to selectively invalidate doc
> ---------------------------------------------------------------------------
>
>                 Key: OAK-1156
>                 URL: https://issues.apache.org/jira/browse/OAK-1156
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>         Attachments: OAK-1156.patch
>
>
> Currently the Background Read operation invalidates the complete cache in 
> {{MongoNodeStore}} upon detecting an external change. Instead, it should 
> check which cached documents are stale and invalidate only those. 
> It can make use of {{_lastRev}} to check whether nodes within a subtree have 
> changed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
