[ https://issues.apache.org/jira/browse/ZOOKEEPER-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695436#comment-16695436 ]
Michael K. Edwards commented on ZOOKEEPER-3145:
-----------------------------------------------

Fix needed for 3.5.5?

> Potential watch missing issue due to stale pzxid when replaying CloseSession
> txn with fuzzy snapshot
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3145
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3145
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is another issue I found recently. We haven't seen this problem in prod
> (or maybe we just haven't noticed it).
>
> Currently, CloseSession is not idempotent: executing the CloseSession txn
> twice does not produce the same result.
>
> The problem is that closeSession only looks up the ephemeral nodes
> associated with that session based on the current state. Nodes deleted while
> a fuzzy snapshot was being taken won't be deleted again when the txn is
> replayed.
>
> This looks fine, since those nodes are already gone, but there is a problem
> with the pzxid of the parent node. The snapshot is taken fuzzily, so it's
> possible that the parent had already been serialized while the nodes were
> being deleted by the closeSession txn. The pzxid will not be updated in the
> snapshot when the closeSession txn is replayed, because the replay doesn't
> know which paths were deleted, so it won't patch the pzxid the way we did
> for deleteNode in ZOOKEEPER-3125.
>
> The inconsistent pzxid can lead to missed watch notifications when a client
> reconnects with setWatches, because of the staleness.
>
> This JIRA is going to fix those issues by adding the CloseSessionTxn. It will
> record all the nodes being deleted in that CloseSession txn, so that we
> know which nodes to update when replaying the txn.
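
To make the failure mode in the description concrete, here is a minimal Java sketch of the two replay strategies. It is a toy model, not ZooKeeper's actual code: the class names (CloseSessionReplaySketch, Tree, Node), the method names, the paths, the session id and the zxid values are all hypothetical and chosen only for illustration. It assumes a fuzzy snapshot in which the parent /app was serialized with pzxid=2 before the CloseSession txn (zxid 3) deleted the ephemeral child /app/lock, so the child and the session's ephemeral list are already absent from the snapshot.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only; none of these names come from the ZooKeeper code base.
public class CloseSessionReplaySketch {

    static final class Node {
        long pzxid; // zxid of the last change to this node's children
        Node(long pzxid) { this.pzxid = pzxid; }
    }

    static final class Tree {
        final Map<String, Node> nodes = new HashMap<>();
        final Map<Long, Set<String>> ephemerals = new HashMap<>();

        static String parentOf(String path) {
            int i = path.lastIndexOf('/');
            return i <= 0 ? "/" : path.substring(0, i);
        }

        // Removes the node if present and patches a stale parent pzxid
        // even when the node is already gone (the ZOOKEEPER-3125 idea).
        void deleteNode(String path, long zxid) {
            nodes.remove(path);
            Node parent = nodes.get(parentOf(path));
            if (parent != null && parent.pzxid < zxid) {
                parent.pzxid = zxid;
            }
        }

        // Replay that derives the paths from the current state only.
        void closeSessionByState(long sessionId, long zxid) {
            Set<String> paths = ephemerals.remove(sessionId);
            if (paths == null) {
                return; // nothing known for this session, so no parent is touched
            }
            for (String p : paths) {
                deleteNode(p, zxid);
            }
        }

        // Replay that uses the paths recorded in the txn itself.
        void closeSessionWithPaths(long sessionId, List<String> deletedPaths, long zxid) {
            ephemerals.remove(sessionId);
            for (String p : deletedPaths) {
                deleteNode(p, zxid);
            }
        }
    }

    // Assumed fuzzy snapshot: /app was serialized with pzxid=2 before the
    // close (zxid 3) ran; /app/lock and the session's ephemerals are gone.
    static Tree fuzzySnapshot() {
        Tree t = new Tree();
        t.nodes.put("/", new Node(1));
        t.nodes.put("/app", new Node(2));
        return t;
    }

    public static void main(String[] args) {
        Tree byState = fuzzySnapshot();
        byState.closeSessionByState(7L, 3L);
        System.out.println("replay by state:   pzxid=" + byState.nodes.get("/app").pzxid);

        Tree withPaths = fuzzySnapshot();
        withPaths.closeSessionWithPaths(7L, Arrays.asList("/app/lock"), 3L);
        System.out.println("replay with paths: pzxid=" + withPaths.nodes.get("/app").pzxid);
    }
}
```

In this model the state-based replay leaves /app at pzxid=2, while the replay that carries the deleted paths moves it to 3. Replaying the path-carrying variant a second time changes nothing further, which is the idempotence the description asks for.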