[ 
https://issues.apache.org/jira/browse/OAK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Saurabh updated OAK-3653:
-------------------------------
    Description: 
While installing a package in one of the systems (Oak-1.0.16), we observed 
broken workflow model. Upon further investigation, it was found that 
{{baseVersion}} of the model was pointing to a node in version storage which 
wasn't visible on the instance. This node was available in mongo and could also 
be seen on a different instance of the cluster. 

The node was created on the same instance where it wasn't visible (cluster id - 
3)

Further investigation showed that for some (yet unknown) reason, 
{{lastRevision}} for cached node state of node's parent at revision where the 
node was created was stale (an older revision) which must have led to invalid 
children list and hence unavailable node.

Attaching a few files which were captured during the investigation:
* [^oplog.json] - oplog for +-10s for r151191c7601-0-3
* [^Capture.JPG]- snapshot of groovy console output using script at \[0] to 
list node children cache entry for parent
* [^Capture1.JPG]- snapshot of groovy console output using script at \[1] to 
traverse to invisible node
* [^Capture2.JPG]- Almost same as Capture.jpg but this time for node cache 
instead (using script at \[2])

The node which wasn't visible in mongo:
{noformat}
db.nodes.findOne({_id: 
"7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
{
        "_id" : 
"7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
        "_deleted" : {
                "r151191c7601-0-3" : "false"
        },
        "_commitRoot" : {
                "r151191c7601-0-3" : "0"
        },
        "_lastRev" : {
                "r0-0-3" : "r151191c7601-0-3"
        },
        "jcr:primaryType" : {
                "r151191c7601-0-3" : "\"nam:nt:version\""
        },
        "jcr:uuid" : {
                "r151191c7601-0-3" : "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
        },
        "_modified" : NumberLong(1447825270),
        "jcr:created" : {
                "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
        },
        "_children" : true,
        "jcr:predecessors" : {
                "r151191c7601-0-3" : 
"[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
        },
        "jcr:successors" : {
                "r151191c7601-0-3" : "\"[0]:Reference\""
        },
        "_modCount" : NumberLong(1)
}
{noformat}

Last revision entry of cached node state of 
{{/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e}}
 (parent) at revision {{r151191c7601-0-3}} was {{r14f035ebdf8-0-3}} (the last 
revs were propagated correctly up till root for r151191c7601-0-3).

As a work-around, doing {{cache.invalidate(k)}} for node cache (inside the loop 
of script at \[2]) revived the visibility of the node.

*Disclaimer*: _The issue was reported and investigated on a cluster backed by 
mongo running on Oak-1.0.16. But, the problem itself doesn't seem like any 
which have been fixed in later versions_

\[0]: {noformat}
def printNodeChildrenCache(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
    def cacheMap = cache.asMap()
    cacheMap.each{k,v -> 
      if (k.toString().startsWith(path + "@")) {
        println "${k}:${v}"
      }
    }
  } finally {
    session.logout()
  }
}
 
printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
{noformat}
\[1]: {noformat}
def traverse(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def nb = rootNode.sessionDelegate.root.store.root
    def p = ''
    path.tokenize('/').each() {
      p = p + '/' + it
      nb = nb.getChildNode(it)
      println "${p} ${nb}"
    }
  } finally {
    session.logout()
  }
}
traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
{noformat}

\[2]: {noformat}
def printNodeChildrenCache(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def cache = rootNode.sessionDelegate.root.store.nodeCache
    def cacheMap = cache.asMap()
    cacheMap.each{k,v -> 
      if (k.toString().startsWith(path + "@")) {
        println "${k}:${v}"
        println "lastRev -> ${v.lastRevision}"
      }
    }
  } finally {
    session.logout()
  }
}
 
printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
{noformat}

  was:
While installing a package in one of the systems (Oak-1.0.16), we observed 
broken workflow model. Upon further investigation, it was found that 
{{baseVersion}} of the model was pointing to a node in version storage which 
wasn't visible on the instance. This node was available in mongo and could also 
be seen on a different instance of the cluster. 

The node was created on the same instance where it wasn't visible (cluster id - 
3)

Further investigation showed that for some (yet unknown) reason, 
{{lastRevision}} for cached node state of node's parent at revision where the 
node was created was stale (an older revision) which must have led to invalid 
children list and hence unavailable node.

Attaching a few files which were captured during the investigation:
* oplog.json - oplog for +-10s for r151191c7601-0-3
* Capture.jpg - snapshot of groovy console output using script at \[0] to list 
node children cache entry for parent
* Capture1.jpg - snapshot of groovy console output using script at \[1] to 
traverse to invisible node
* Capture2.jpg - Almost same as Capture.jpg but this time for node cache 
instead (using script at \[2])

The node which wasn't visible in mongo:
{noformat}
db.nodes.findOne({_id: 
"7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
{
        "_id" : 
"7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
        "_deleted" : {
                "r151191c7601-0-3" : "false"
        },
        "_commitRoot" : {
                "r151191c7601-0-3" : "0"
        },
        "_lastRev" : {
                "r0-0-3" : "r151191c7601-0-3"
        },
        "jcr:primaryType" : {
                "r151191c7601-0-3" : "\"nam:nt:version\""
        },
        "jcr:uuid" : {
                "r151191c7601-0-3" : "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
        },
        "_modified" : NumberLong(1447825270),
        "jcr:created" : {
                "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
        },
        "_children" : true,
        "jcr:predecessors" : {
                "r151191c7601-0-3" : 
"[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
        },
        "jcr:successors" : {
                "r151191c7601-0-3" : "\"[0]:Reference\""
        },
        "_modCount" : NumberLong(1)
}
{noformat}

Last revision entry of cached node state of 
{{/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e}}
 (parent) at revision {{r151191c7601-0-3}} was {{r14f035ebdf8-0-3}} (the last 
revs were propagated correctly up till root for r151191c7601-0-3).

As a work-around, doing {{cache.invalidate(k)}} for node cache (inside the loop 
of script at \[2]) revived the visibility of the node.

*Disclaimer*: _The issue was reported and investigated on a cluster backed by 
mongo running on Oak-1.0.16. But, the problem itself doesn't seem like any 
which have been fixed in later versions_

\[0]: {noformat}
def printNodeChildrenCache(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
    def cacheMap = cache.asMap()
    cacheMap.each{k,v -> 
      if (k.toString().startsWith(path + "@")) {
        println "${k}:${v}"
      }
    }
  } finally {
    session.logout()
  }
}
 
printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
{noformat}
\[1]: {noformat}
def traverse(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def nb = rootNode.sessionDelegate.root.store.root
    def p = ''
    path.tokenize('/').each() {
      p = p + '/' + it
      nb = nb.getChildNode(it)
      println "${p} ${nb}"
    }
  } finally {
    session.logout()
  }
}
traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
{noformat}

\[2]: {noformat}
def printNodeChildrenCache(def path) {
  def session = 
osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
  try {
    def rootNode = session.getRootNode()
    def cache = rootNode.sessionDelegate.root.store.nodeCache
    def cacheMap = cache.asMap()
    cacheMap.each{k,v -> 
      if (k.toString().startsWith(path + "@")) {
        println "${k}:${v}"
        println "lastRev -> ${v.lastRevision}"
      }
    }
  } finally {
    session.logout()
  }
}
 
printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
{noformat}


> Incorrect last revision of cached node state
> --------------------------------------------
>
>                 Key: OAK-3653
>                 URL: https://issues.apache.org/jira/browse/OAK-3653
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>            Reporter: Vikas Saurabh
>         Attachments: Capture.JPG, Capture1.JPG, Capture2.JPG, oplog.json
>
>
> While installing a package in one of the systems (Oak-1.0.16), we observed 
> broken workflow model. Upon further investigation, it was found that 
> {{baseVersion}} of the model was pointing to a node in version storage which 
> wasn't visible on the instance. This node was available in mongo and could 
> also be seen on a different instance of the cluster. 
> The node was created on the same instance where it wasn't visible (cluster id 
> - 3)
> Further investigation showed that for some (yet unknown) reason, 
> {{lastRevision}} for cached node state of node's parent at revision where the 
> node was created was stale (an older revision) which must have led to invalid 
> children list and hence unavailable node.
> Attaching a few files which were captured during the investigation:
> * [^oplog.json] - oplog for +-10s for r151191c7601-0-3
> * [^Capture.JPG]- snapshot of groovy console output using script at \[0] to 
> list node children cache entry for parent
> * [^Capture1.JPG]- snapshot of groovy console output using script at \[1] to 
> traverse to invisible node
> * [^Capture2.JPG]- Almost same as Capture.jpg but this time for node cache 
> instead (using script at \[2])
> The node which wasn't visible in mongo:
> {noformat}
> db.nodes.findOne({_id: 
> "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7"})
> {
>         "_id" : 
> "7:/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7",
>         "_deleted" : {
>                 "r151191c7601-0-3" : "false"
>         },
>         "_commitRoot" : {
>                 "r151191c7601-0-3" : "0"
>         },
>         "_lastRev" : {
>                 "r0-0-3" : "r151191c7601-0-3"
>         },
>         "jcr:primaryType" : {
>                 "r151191c7601-0-3" : "\"nam:nt:version\""
>         },
>         "jcr:uuid" : {
>                 "r151191c7601-0-3" : 
> "\"40234f1d-2435-4c6b-9962-20af22b1c948\""
>         },
>         "_modified" : NumberLong(1447825270),
>         "jcr:created" : {
>                 "r151191c7601-0-3" : "\"dat:2015-11-17T21:41:14.058-08:00\""
>         },
>         "_children" : true,
>         "jcr:predecessors" : {
>                 "r151191c7601-0-3" : 
> "[\"ref:6cecc77b-3020-4a47-b9cc-084a618aa957\"]"
>         },
>         "jcr:successors" : {
>                 "r151191c7601-0-3" : "\"[0]:Reference\""
>         },
>         "_modCount" : NumberLong(1)
> }
> {noformat}
> Last revision entry of cached node state of 
> {{/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e}}
>  (parent) at revision {{r151191c7601-0-3}} was {{r14f035ebdf8-0-3}} (the last 
> revs were propagated correctly up till root for r151191c7601-0-3).
> As a work-around, doing {{cache.invalidate(k)}} for node cache (inside the 
> loop of script at \[2]) revived the visibility of the node.
> *Disclaimer*: _The issue was reported and investigated on a cluster backed by 
> mongo running on Oak-1.0.16. But, the problem itself doesn't seem like any 
> which have been fixed in later versions_
> \[0]: {noformat}
> def printNodeChildrenCache(def path) {
>   def session = 
> osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     def cache = rootNode.sessionDelegate.root.store.nodeChildrenCache
>     def cacheMap = cache.asMap()
>     cacheMap.each{k,v -> 
>       if (k.toString().startsWith(path + "@")) {
>         println "${k}:${v}"
>       }
>     }
>   } finally {
>     session.logout()
>   }
> }
>  
> printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
> {noformat}
> \[1]: {noformat}
> def traverse(def path) {
>   def session = 
> osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     def nb = rootNode.sessionDelegate.root.store.root
>     def p = ''
>     path.tokenize('/').each() {
>       p = p + '/' + it
>       nb = nb.getChildNode(it)
>       println "${p} ${nb}"
>     }
>   } finally {
>     session.logout()
>   }
> }
> traverse("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e/1.7")
> {noformat}
> \[2]: {noformat}
> def printNodeChildrenCache(def path) {
>   def session = 
> osgi.getService(org.apache.sling.jcr.api.SlingRepository.class).loginAdministrative(null)
>   try {
>     def rootNode = session.getRootNode()
>     def cache = rootNode.sessionDelegate.root.store.nodeCache
>     def cacheMap = cache.asMap()
>     cacheMap.each{k,v -> 
>       if (k.toString().startsWith(path + "@")) {
>         println "${k}:${v}"
>         println "lastRev -> ${v.lastRevision}"
>       }
>     }
>   } finally {
>     session.logout()
>   }
> }
>  
> printNodeChildrenCache("/jcr:system/jcr:versionStorage/d9/9f/22/d99f2279-dcc9-40a0-9aa4-c91fe310287e")
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to