[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state
[ https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139262#comment-15139262 ]

Vikas Saurabh commented on OAK-3653:

bq. Depending issue OAK-3634 is fixed. Resolving this issue as well.

Updating fixVersion to include 1.2.11 according to OAK-3634.

> Incorrect last revision of cached node state
>
>                 Key: OAK-3653
>                 URL: https://issues.apache.org/jira/browse/OAK-3653
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>            Reporter: Vikas Saurabh
>            Assignee: Marcel Reutegger
>             Fix For: 1.3.14, 1.2.11
>
>         Attachments: Capture.JPG, Capture1.JPG, Capture2.JPG, diagram.png, oplog.json
>
>
> While installing a package on one of the systems (Oak 1.0.16), we observed a broken workflow model. Upon further investigation, it was found that the {{baseVersion}} of the model was pointing to a node in version storage which wasn't visible on the instance. This node was available in mongo and could also be seen on a different instance of the cluster.
> The node was created on the same instance where it wasn't visible (cluster id 3).
> Further investigation showed that, for some (yet unknown) reason, the {{lastRevision}} of the cached node state of the node's parent, at the revision where the node was created, was stale (an older revision). This must have led to an invalid children list and hence an unavailable node.
> Attaching a few files which were captured during the investigation:
> * [^oplog.json] - oplog for +/-10s around r151191c7601-0-3
> * [^Capture.JPG] - snapshot of groovy console output using the script at \[0] to list the node children cache entry for the parent
> * [^Capture1.JPG] - snapshot of groovy console output using the script at \[1] to traverse to the invisible node
> * [^Capture2.JPG] - almost the same as Capture.JPG, but for the node cache instead (using the script at \[2])
> The mongo document for the node which wasn't visible:
> {noformat}
> db.nodes.findOne({_id: "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
> {
>     "_id" : "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
>     "_deleted" : {
>         "r151191c7601-0-3" : "false"
>     },
>     "_commitRoot" : {
>         "r151191c7601-0-3" : "0"
>     },
>     "_lastRev" : {
>         "r0-0-3" : "r151191c7601-0-3"
>     },
>     "jcr:primaryType" : {
>         "r151191c7601-0-3" : "\"nam:nt:version\""
>     },
>     "jcr:uuid" : {
>         "r151191c7601-0-3" : "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
>     },
>     "_modified" : NumberLong(1447825270),
>     "jcr:created" : {
>         "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
>     },
>     "_children" : true,
>     "jcr:predecessors" : {
>         "r151191c7601-0-3" : "[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
>     },
>     "jcr:successors" : {
>         "r151191c7601-0-3" : "\"[0]:Reference\""
>     },
>     "_modCount" : NumberLong(1)
> }
> {noformat}
> The last revision entry of the cached node state of {{/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e}} (the parent) at revision {{r151191c7601-0-3}} was {{r14f035ebdf8-0-3}} (the last revs were propagated correctly up to the root for r151191c7601-0-3).
> As a work-around, doing {{cache.invalidate(k)}} on the node cache (inside the loop of the script at \[2]) revived the visibility of the node.
> *Disclaimer*: _The issue was reported and investigated on a mongo-backed cluster running Oak 1.0.16. The problem itself, however, doesn't look like any of those which have been fixed in later versions._
> \[0]:
> {noformat}
> def printNodeChildrenCache(def path) {
>     def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>     try {
>         def rootNode = session.getRootNode()
>         def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
>         def cacheMap = cache.asMap()
>         cacheMap.each { k, v ->
>             if (k.toString().startsWith(path + "@")) {
>                 println "${k}:${v}"
>             }
>         }
>     } finally {
>         session.logout()
>     }
> }
>
> printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
> {noformat}
> \[1]:
> {noformat}
> def traverse(def path) {
>     def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>     try {
>         def rootNode = session.getRootNode()
>         def nb = rootNode.sessionDelegate.root.store.root
>         def p = ''
>         path.tokenize('/').each {
>             p = p + '/' + it
>             nb = nb.getChildNode(it)
>             println "${p} ${nb}"
>         }
>     } finally {
>         session.logout()
>     }
> }
>
> traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
> {noformat}
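Script \[2] is not attached to the issue, so the following is only a hypothetical sketch of what such an invalidation loop could look like. A plain map stands in for the real node cache; the key format {{<path>@<revision>}} is assumed from the filtering done in script \[0].

```java
// Hypothetical sketch of the work-around: walk the node cache and
// invalidate every entry for the affected path so the next read loads a
// fresh node state from the backend. A plain map stands in for the real
// cache; keys are assumed to stringify as "<path>@<revision>".
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InvalidateWorkaroundSketch {

    // Removes all cache entries keyed "<path>@<revision>" for the given
    // path; returns the number of invalidated entries.
    static int invalidatePath(Map<String, Object> cache, String path) {
        int before = cache.size();
        cache.keySet().removeIf(k -> k.startsWith(path + "@"));
        return before - cache.size();
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        String parent = "/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e";
        cache.put(parent + "@r151191c7601-0-3", "stale node state");
        cache.put("/content/unrelated@r1-0-1", "unaffected entry");

        System.out.println("invalidated: " + invalidatePath(cache, parent)); // prints "invalidated: 1"
        System.out.println("remaining: " + cache.size());                    // prints "remaining: 1"
    }
}
```

Only the entries under the affected path are dropped, which matches the reported behaviour that invalidating the stale entries revived the node's visibility without disturbing the rest of the cache.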
[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state
[ https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061988#comment-15061988 ]

Marcel Reutegger commented on OAK-3653:

I saw a similar issue while running tests for OAK-3646, and I think this is caused by OAK-3634. The attached diagram shows a possible sequence of operations in a two-node cluster with background reads and updates on a node 'foo'.

!diagram.png|width=100%!

First, cluster node 1 updates the _lastRev of 'foo' to r4-0-1 with a modCount of 5. Next, cluster node 2 updates its _lastRev to r7-0-2, bumping the modCount to 6, and then runs a background update. Cluster node 1 picks up the changes of the root document with a background read and starts a query to find out which documents to invalidate. The query result will contain an entry for 'foo' with a modCount of 6. At the same time, a background update happens on cluster node 1 and the _lastRev is set to r9-0-1. On the MongoDB side the modCount is set to 7, but because of OAK-3634 the new cached document in Oak is calculated based on the currently cached document, which has a modCount of 5. The new document in the cache will therefore have a modCount of 6. The background read now proceeds to the invalidation phase and compares the modCount from the query with the modCount in the cache. Both are 6, so the document is not invalidated.
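The sequence above can be condensed into a small sketch. The names below are illustrative, not the actual Oak classes; the point is only the final comparison, where equal modCounts suppress invalidation of a document that is in fact stale.

```java
// Minimal sketch of the stale-modCount race described above. Names are
// illustrative stand-ins for the real Oak document cache.
import java.util.HashMap;
import java.util.Map;

public class ModCountRaceSketch {

    static final class CachedDoc {
        final long modCount;
        final String lastRev;
        CachedDoc(long modCount, String lastRev) {
            this.modCount = modCount;
            this.lastRev = lastRev;
        }
    }

    // The background read invalidates a cached document only when its
    // modCount differs from the modCount reported by the backend query.
    static boolean shouldInvalidate(CachedDoc cached, long queriedModCount) {
        return cached.modCount != queriedModCount;
    }

    public static void main(String[] args) {
        Map<String, CachedDoc> cache = new HashMap<>();

        // Cluster node 1 caches 'foo': _lastRev r4-0-1, modCount 5.
        cache.put("foo", new CachedDoc(5, "r4-0-1"));

        // Cluster node 2 sets foo's _lastRev to r7-0-2 (modCount 6) and runs
        // a background update; node 1's invalidation query therefore reports
        // modCount 6 for 'foo'.
        long queriedModCount = 6;

        // Concurrent background update on node 1: MongoDB now holds
        // modCount 7, but (OAK-3634) the new cached document is derived from
        // the stale cached copy (modCount 5), yielding 6 instead of 7.
        CachedDoc stale = cache.get("foo");
        cache.put("foo", new CachedDoc(stale.modCount + 1, "r9-0-1"));

        // Both modCounts are 6: the stale document survives in the cache
        // even though it never saw node 2's r7-0-2 change.
        System.out.println("invalidate foo: "
                + shouldInvalidate(cache.get("foo"), queriedModCount)); // prints "invalidate foo: false"
    }
}
```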
[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state
[ https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011786#comment-15011786 ]

Vikas Saurabh commented on OAK-3653:

[~mreutegg], can you please take up this issue and schedule it accordingly?