[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state

2016-02-09 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139262#comment-15139262
 ] 

Vikas Saurabh commented on OAK-3653:


bq. Depending issue OAK-3634 is fixed. Resolving this issue as well.
Updating fixVersion to include 1.2.11, in line with OAK-3634.

> Incorrect last revision of cached node state
> 
>
> Key: OAK-3653
> URL: https://issues.apache.org/jira/browse/OAK-3653
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Vikas Saurabh
>Assignee: Marcel Reutegger
> Fix For: 1.3.14, 1.2.11
>
> Attachments: Capture.JPG, Capture1.JPG, Capture2.JPG, diagram.png, 
> oplog.json
>
>
> While installing a package on one of the systems (Oak-1.0.16), we observed a 
> broken workflow model. Upon further investigation, it was found that the 
> {{baseVersion}} of the model was pointing to a node in version storage which 
> wasn't visible on the instance. This node was available in mongo and could 
> also be seen on a different instance of the cluster.
> The node was created on the same instance where it wasn't visible (cluster id 
> - 3).
> Further investigation showed that, for some (yet unknown) reason, the 
> {{lastRevision}} of the cached node state of the node's parent, at the 
> revision where the node was created, was stale (an older revision), which 
> must have led to an invalid children list and hence an unavailable node.
> Attaching a few files which were captured during the investigation:
> * [^oplog.json] - oplog for +/-10s around r151191c7601-0-3
> * [^Capture.JPG] - snapshot of groovy console output using the script at \[0] 
> to list the node children cache entry for the parent
> * [^Capture1.JPG] - snapshot of groovy console output using the script at \[1] 
> to traverse to the invisible node
> * [^Capture2.JPG] - almost the same as Capture.JPG, but this time for the node 
> cache instead (using the script at \[2])
> The node which wasn't visible, as stored in mongo:
> {noformat}
> db.nodes.findOne({_id: "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
> {
>     "_id" : "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
>     "_deleted" : {
>         "r151191c7601-0-3" : "false"
>     },
>     "_commitRoot" : {
>         "r151191c7601-0-3" : "0"
>     },
>     "_lastRev" : {
>         "r0-0-3" : "r151191c7601-0-3"
>     },
>     "jcr:primaryType" : {
>         "r151191c7601-0-3" : "\"nam:nt:version\""
>     },
>     "jcr:uuid" : {
>         "r151191c7601-0-3" : "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
>     },
>     "_modified" : NumberLong(1447825270),
>     "jcr:created" : {
>         "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
>     },
>     "_children" : true,
>     "jcr:predecessors" : {
>         "r151191c7601-0-3" : "[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
>     },
>     "jcr:successors" : {
>         "r151191c7601-0-3" : "\"[0]:Reference\""
>     },
>     "_modCount" : NumberLong(1)
> }
> {noformat}
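> As a sanity check, the creating revision can be decoded with a small sketch, 
> assuming the usual Oak revision string layout 
> {{r<timestamp-hex>-<counter-hex>-<clusterId-hex>}} (Oak's own 
> {{Revision.fromString}} does the same). It confirms cluster id 3 and a 
> timestamp consistent with the {{_modified}} / {{jcr:created}} values above:
> {noformat}
> def decodeRevision(String rev) {
>   def parts = rev.substring(1).tokenize('-')
>   [timestamp: new Date(Long.parseLong(parts[0], 16)),
>    counter: Integer.parseInt(parts[1], 16),
>    clusterId: Integer.parseInt(parts[2], 16)]
> }
> 
> // => [timestamp:..., counter:0, clusterId:3], i.e. created by cluster node 3
> // around 2015-11-17T21:41:14-08:00
> println decodeRevision('r151191c7601-0-3')
> {noformat}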
> The last revision entry of the cached node state of 
> {{/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e}}
>  (the parent) at revision {{r151191c7601-0-3}} was {{r14f035ebdf8-0-3}} (the 
> last revs were propagated correctly up to the root for r151191c7601-0-3).
> As a work-around, doing {{cache.invalidate(k)}} on the node cache (inside the 
> loop of the script at \[2]; a sketch of such a loop follows the scripts below) 
> restored the visibility of the node.
> *Disclaimer*: _The issue was reported and investigated on a mongo-backed 
> cluster running Oak-1.0.16, but the problem itself doesn't seem to match 
> anything that has been fixed in later versions._
> \[0]: {noformat}
> def printNodeChildrenCache(def path) {
>   def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
>     def cacheMap = cache.asMap()
>     cacheMap.each { k, v ->
>       if (k.toString().startsWith(path + "@")) {
>         println "${k}:${v}"
>       }
>     }
>   } finally {
>     session.logout()
>   }
> }
> 
> printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
> {noformat}
> \[1]: {noformat}
> def traverse(def path) {
>   def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     def nb = rootNode.sessionDelegate.root.store.root
>     def p = ''
>     path.tokenize('/').each() {
>       p = p + '/' + it
>       nb = nb.getChildNode(it)
>       println "${p} ${nb}"
>     }
>   } finally {
>     session.logout()
>   }
> }
> 
> traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
> {noformat}
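> The script at \[2] is not reproduced in this thread; the following is only a 
> hypothetical sketch of such an invalidation loop, written by analogy with \[0] 
> (same console setup, and assuming the node cache field is named {{nodeCache}}, 
> mirroring {{nodeChildrenCache}} above):
> {noformat}
> def invalidateNodeCache(def path) {
>   def session = osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     // node cache of the DocumentNodeStore (field name assumed, see note above)
>     def cache = rootNode.sessionDelegate.root.store.nodeCache
>     cache.asMap().each { k, v ->
>       if (k.toString().startsWith(path + "@")) {
>         println "invalidating ${k}"
>         cache.invalidate(k)
>       }
>     }
>   } finally {
>     session.logout()
>   }
> }
> 
> invalidateNodeCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
> {noformat}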

[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state

2015-12-17 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061988#comment-15061988
 ] 

Marcel Reutegger commented on OAK-3653:
---

I saw a similar issue while running tests for OAK-3646 and I think this is 
caused by OAK-3634. The attached diagram shows a possible sequence of 
operations in a two-node cluster with background reads and updates on a node 
'foo'.

!diagram.png|width=100%!

First, cluster node 1 updates the _lastRev of foo to r4-0-1 with a modCount of 
5. Next, cluster node 2 updates its _lastRev to r7-0-2, bumping the modCount to 
6, and then runs a background update. Cluster node 1 picks up the changes of 
the root document with a background read and starts a query to find out which 
documents to invalidate. The query result will contain an entry for 'foo' with 
a modCount of 6. At the same time a background update happens on cluster node 1 
and the _lastRev is set to r9-0-1. On the MongoDB side the modCount is set to 
7, but because of OAK-3634 the new cached document in Oak is calculated based 
on the currently cached document, which has a modCount of 5. The new document 
in the cache will therefore have a modCount of 6. The background read now 
proceeds to the invalidation phase and compares the modCount from the query 
with the modCount in the cache. Both are 6, so the document is not invalidated 
and keeps its stale _lastRev.
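
To make this concrete, here is a minimal, self-contained sketch of the modCount 
bookkeeping described above. It uses plain Groovy maps instead of Oak's actual 
classes, so all names and structures are simplified assumptions:

{noformat}
// Cached copy of 'foo' on cluster node 1: last seen with modCount 5 and _lastRev r4-0-1.
def cachedFoo = [modCount: 5, lastRev: ['1': 'r4-0-1']]

// State of 'foo' in MongoDB after cluster node 2's background update (modCount 6).
def mongoFoo = [modCount: 6, lastRev: ['1': 'r4-0-1', '2': 'r7-0-2']]

// Cluster node 1 starts its background read: the invalidation query records the
// modCount it currently sees in MongoDB.
def modCountFromQuery = mongoFoo.modCount            // 6

// Concurrently, cluster node 1 runs its own background update. MongoDB applies it
// and bumps the modCount to 7 ...
mongoFoo.lastRev['1'] = 'r9-0-1'
mongoFoo.modCount++                                  // 7

// ... but because of OAK-3634 the new cached document is derived from the *stale*
// cached copy (modCount 5) instead of the server result, so it ends up at 6 and
// never picks up cluster node 2's _lastRev r7-0-2.
cachedFoo.lastRev['1'] = 'r9-0-1'
cachedFoo.modCount++                                 // 6, should be 7

// The background read reaches the invalidation phase: equal modCounts are taken
// to mean "cache is up to date", so the stale entry survives.
if (modCountFromQuery == cachedFoo.modCount) {
    println "not invalidated, cached _lastRev = ${cachedFoo.lastRev}"   // missing r7-0-2
} else {
    println "invalidated"
}
{noformat}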

> Incorrect last revision of cached node state
> 
>
> Key: OAK-3653
> URL: https://issues.apache.org/jira/browse/OAK-3653
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Vikas Saurabh
>Assignee: Marcel Reutegger
> Fix For: 1.4
>
> Attachments: Capture.JPG, Capture1.JPG, Capture2.JPG, diagram.png, 
> oplog.json
>

[jira] [Commented] (OAK-3653) Incorrect last revision of cached node state

2015-11-18 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011786#comment-15011786
 ] 

Vikas Saurabh commented on OAK-3653:


[~mreutegg], can you please take up this issue and schedule accordingly?

> Incorrect last revision of cached node state
> 
>
> Key: OAK-3653
> URL: https://issues.apache.org/jira/browse/OAK-3653
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: documentmk
>Reporter: Vikas Saurabh
> Attachments: Capture.JPG, Capture1.JPG, Capture2.JPG, oplog.json
>