Ken Huang created CURATOR-678:
---------------------------------
Summary: InterProcessMutex#release caused inconsistency between zk
node and local cache if encountering zk connection lost
Key: CURATOR-678
URL: https://issues.apache.org/jira/browse/CURATOR-678
Project: Apache Curator
Issue Type: Bug
Reporter: Ken Huang
Assignee: Enrico Olivelli
We hit a problem with InterProcessMutex: a participant acquired the lock, and
while release() was running the zk connection was lost. This left the state
inconsistent. In the code at
[https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/InterProcessMutex.java#L139]
through line 143, the znode deletion threw a connection-loss exception, but the
entry was still removed from the local cached `threadData`.
As a result, even after the zk connection recovered, ALL subsequent
acquire() calls failed because of the inconsistency (the entry was no longer in
the local `threadData`, but the OLD znode was still present).
Please help confirm this behavior. I think it is a bug and Curator should fix
the inconsistency. One suggestion is to remove the local data ONLY after the
znode deletion succeeds. The same problematic code also seems to appear in many
other similar recipes, such as `InterProcessSemaphore`.
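To illustrate the suggestion, here is a minimal, simplified model of the two orderings. It is a sketch, not Curator's actual code: `ReleaseOrderingSketch`, `NodeDeleter`, and all method names are hypothetical, and the znode delete is replaced by a callback that can be made to fail.
```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ReleaseOrderingSketch {
    /** Hypothetical stand-in for the znode delete call; not a Curator type. */
    @FunctionalInterface
    public interface NodeDeleter {
        void delete(String path) throws Exception;
    }

    // Owning thread -> lock path, loosely modelled on the `threadData` cache.
    private final ConcurrentMap<Thread, String> threadData = new ConcurrentHashMap<>();
    private final NodeDeleter deleter;

    public ReleaseOrderingSketch(NodeDeleter deleter) {
        this.deleter = deleter;
    }

    public void recordAcquired(String lockPath) {
        threadData.put(Thread.currentThread(), lockPath);
    }

    public boolean holdsLocally() {
        return threadData.containsKey(Thread.currentThread());
    }

    /** Ordering described in this report: the local entry is dropped even if the delete fails. */
    public void releaseAlwaysForget() throws Exception {
        String path = currentPath();
        try {
            deleter.delete(path);
        } finally {
            threadData.remove(Thread.currentThread());
        }
    }

    /** Suggested ordering: the local entry is dropped only after the delete succeeds. */
    public void releaseForgetOnSuccess() throws Exception {
        String path = currentPath();
        deleter.delete(path); // if this throws, the entry below is left in place
        threadData.remove(Thread.currentThread());
    }

    private String currentPath() {
        String path = threadData.get(Thread.currentThread());
        if (path == null) {
            throw new IllegalMonitorStateException("lock not held by current thread");
        }
        return path;
    }
}
```
With the second ordering, a failed release() leaves the local entry in place, so the caller can retry release() once the connection is back. Whether that is the right fix for Curator is for the maintainers to decide.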
Stacktrace:
```
Failed to release mutex for xxxxxxxxxxxxx
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /xxxx/_c_65fb02ef-9b1d-4c8c-b715-5c97f82ae0d3-lock-0000000000
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:2001) ~[zookeeper-3.6.3.jar:3.6.3]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$6.call(DeleteBuilderImpl.java:313) ~[curator-framework-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$6.call(DeleteBuilderImpl.java:301) ~[curator-framework-5.3.0.jar:5.3.0]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[curator-client-5.3.0.jar:?]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:298) ~[curator-framework-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:282) ~[curator-framework-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35) ~[curator-framework-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-5.3.0.jar:5.3.0]
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-5.3.0.jar:5.3.0]
    at ... ...
```
--
This message was sent by Atlassian Jira
(v8.20.10#820010)