[ https://issues.apache.org/jira/browse/OAK-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818854#comment-13818854 ]
Chetan Mehrotra edited comment on OAK-1156 at 11/12/13 10:57 AM:
-----------------------------------------------------------------

There are multiple possible approaches.

*Approach A*
# Find the {{_modCount}} for all the cached documents using a Mongo {{$in}} query
# Then invalidate those entries where the mod count differs

Approach A is more of a brute-force method and might take quite a bit of time if the number of cached entries is high.

*Approach B*
# Create a tree structure out of the paths of the cached documents
# Traverse the tree breadth first and fetch the {{_lastRev}} data for the nodes at the same level
# If the last revision is the same as the one in the cached document, then in some cases it can be considered that all nodes under that path have not been modified. So we mark such cached documents as up to date and filter them out from the traversal. A child node is considered up to date only if either
## The in-memory creation time of the child node is greater than the creation time of the root node which was found to be up to date. So if /a/b is found to be valid (lastRev not changed), then a child node like /a/b/c/d is also considered up to date if its creation time is greater than the creation time of /a/b, because it was then added to the cache later.
## OR the last check time of both /a/b and /a/b/c/d is the same. This means that the previous run checked that both nodes are consistent, and the lastRev of /a/b can be taken as the authoritative source for the state of this child node

With this approach we can save a lot of queries, since in most cases the major portion of the tree will not have changed. However, we need to be careful not to leave any stale entry in the cache. For example, whenever we add a new document to the cache, say at path {{/foo/bar}}, it has the latest {{_lastRev}} entry. However, the documents already cached under that path would not be checked in that flow.
So in the above flow we might falsely conclude that the tree under {{/foo/bar}} is consistent and thus hold a stale copy.


was (Author: chetanm):
There are multiple possible approaches.

*Approach A*
# Find the {{_modCount}} for all the cached documents using a Mongo {{$in}} query
# Then invalidate those entries where the mod count differs

Approach A is more of a brute-force method and might take quite a bit of time if the number of cached entries is high.

*Approach B*
# Create a tree structure out of the paths of the cached documents
# Traverse the tree breadth first and fetch the {{_lastRev}} data for the nodes at the same level
# If the last revision is the same as the one in the cached document, then in some cases it can be considered that all nodes under that path have not been modified. So we mark such cached documents as up to date and filter them out from the traversal

With this approach we can save a lot of queries, since in most cases the major portion of the tree will not have changed. However, we need to be careful not to leave any stale entry in the cache. For example, whenever we add a new document to the cache, say at path {{/foo/bar}}, it has the latest {{_lastRev}} entry. However, the documents already cached under that path would not be checked in that flow. So in the above flow we might falsely conclude that the tree under {{/foo/bar}} is consistent and thus hold a stale copy.


> Improve the document cache invalidation logic to selectivly invalidate doc
> --------------------------------------------------------------------------
>
>                 Key: OAK-1156
>                 URL: https://issues.apache.org/jira/browse/OAK-1156
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>         Attachments: OAK-1156.patch
>
>
> Currently the Background Read operation invalidates the complete cache in
> {{MongoNodeStore}} upon detecting an external change.
> Instead, it should check which cached documents are stale and invalidate
> only those.
> It can make use of {{_lastRev}} to check whether nodes within a subtree
> have changed.



--
This message was sent by Atlassian JIRA (v6.1#6144)
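
As a rough illustration of Approach A from the comment above, the following Java sketch compares the {{_modCount}} held in the cache with the value currently in the store and reports the entries that differ. It is not the code from OAK-1156.patch. The class name {{ModCountInvalidation}}, the method {{findStale}}, and the {{fetchModCounts}} function (a stand-in for a single Mongo {{$in}} query) are all hypothetical, and treating a document that is missing from the store as stale is an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Sketch of Approach A: brute-force comparison of cached and stored _modCount values. */
final class ModCountInvalidation {

    /**
     * Returns the ids of cached documents whose _modCount no longer matches the store.
     *
     * @param cachedModCounts document id -> _modCount as held in the cache
     * @param fetchModCounts  hypothetical stand-in for one bulk ($in style) query:
     *                        takes the ids, returns id -> current _modCount
     */
    static Set<String> findStale(
            Map<String, Long> cachedModCounts,
            Function<Collection<String>, Map<String, Long>> fetchModCounts) {
        // One bulk lookup for every cached id; this is the expensive part
        // when the cache holds many entries.
        Map<String, Long> current = fetchModCounts.apply(cachedModCounts.keySet());

        Set<String> stale = new HashSet<>();
        for (Map.Entry<String, Long> entry : cachedModCounts.entrySet()) {
            Long latest = current.get(entry.getKey());
            // Assumption: a document missing from the store counts as stale.
            if (latest == null || !latest.equals(entry.getValue())) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

The caller would then invalidate only the returned ids instead of clearing the whole cache. The cost still grows with the number of cached entries, which is the drawback noted for Approach A.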
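
Approach B from the comment above can be sketched along the same lines. The Java code below is only a simplified illustration under several assumptions and is not the code from OAK-1156.patch: {{CachedDoc}}, its fields {{createdAt}} and {{lastCheckedAt}}, and the {{fetchLastRevs}} function (a stand-in for one bulk query per tree level) are hypothetical names. Instead of building an explicit tree it groups the cached paths by depth, which gives the same level-by-level order as a breadth-first traversal. It does not update the last check time after a run, and whether an unchanged {{_lastRev}} really covers a whole subtree is taken from the comment, not verified here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

/** Sketch of Approach B: level-by-level _lastRev check with subtree pruning. */
final class LastRevTraversal {

    /** Hypothetical view of a cached document; the field names are illustrative. */
    static final class CachedDoc {
        final String path;
        final String lastRev;       // _lastRev as held in the cache
        final long createdAt;       // in-memory creation time of the cache entry
        final long lastCheckedAt;   // time of the last consistency check

        CachedDoc(String path, String lastRev, long createdAt, long lastCheckedAt) {
            this.path = path;
            this.lastRev = lastRev;
            this.createdAt = createdAt;
            this.lastCheckedAt = lastCheckedAt;
        }
    }

    /** Depth of a path: "/" is 0, "/a" is 1, "/a/b" is 2. */
    static int depth(String path) {
        return "/".equals(path) ? 0 : (int) path.chars().filter(c -> c == '/').count();
    }

    /** The two conditions under which an unchanged ancestor vouches for a descendant. */
    static boolean coveredBy(CachedDoc child, CachedDoc root) {
        String prefix = root.path.endsWith("/") ? root.path : root.path + "/";
        return child.path.startsWith(prefix)
                && (child.createdAt > root.createdAt
                    || child.lastCheckedAt == root.lastCheckedAt);
    }

    /** Returns the paths of cached documents whose _lastRev differs from the store. */
    static Set<String> findStale(
            Collection<CachedDoc> docs,
            Function<Collection<String>, Map<String, String>> fetchLastRevs) {
        // Group by depth; iterating the TreeMap visits shallow levels first.
        TreeMap<Integer, List<CachedDoc>> levels = new TreeMap<>();
        for (CachedDoc doc : docs) {
            levels.computeIfAbsent(depth(doc.path), k -> new ArrayList<>()).add(doc);
        }

        List<CachedDoc> validRoots = new ArrayList<>();
        Set<String> stale = new HashSet<>();
        for (List<CachedDoc> level : levels.values()) {
            // Skip documents already vouched for by an unchanged ancestor.
            List<CachedDoc> toCheck = new ArrayList<>();
            for (CachedDoc doc : level) {
                if (validRoots.stream().noneMatch(root -> coveredBy(doc, root))) {
                    toCheck.add(doc);
                }
            }
            if (toCheck.isEmpty()) {
                continue;
            }
            List<String> paths = new ArrayList<>();
            for (CachedDoc doc : toCheck) {
                paths.add(doc.path);
            }
            // One bulk lookup per level for the remaining documents.
            Map<String, String> current = fetchLastRevs.apply(paths);
            for (CachedDoc doc : toCheck) {
                if (doc.lastRev.equals(current.get(doc.path))) {
                    validRoots.add(doc);
                } else {
                    stale.add(doc.path);
                }
            }
        }
        return stale;
    }
}
```

In this sketch a document cached before its unchanged ancestor, and never checked in the same run, is still looked up individually. That is how the two conditions guard against the {{/foo/bar}} case described in the comment.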