Hi Kanak, thanks for the reply. The problem goes away if we set a constraint of 1 on "STATE_TRANSITION" for the resource. If we allow multiple state transitions to execute concurrently on the resource, the zklock problem occurs.
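To illustrate why the constraint hides the problem, here is a minimal stdlib-only sketch. It does not use Helix or ZooKeeper: a Semaphore stands in for the per-resource constraint, and a short sleep stands in for the lock/read/update/write/unlock section. The class and method names are hypothetical. With a limit of 1, at most one thread ever requests the lock at a time, so the contended path is never exercised.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Helix code: the Semaphore is only a stand-in
// for the STATE_TRANSITION constraint on one resource.
public class TransitionThrottleSketch {

    /**
     * Runs `transitions` simulated state transitions on a thread pool and
     * returns the largest number of threads that were inside the
     * lock-requesting section at the same time.
     */
    public static int maxConcurrentLockRequests(int maxInFlight, int transitions) {
        Semaphore throttle = new Semaphore(maxInFlight); // plays the constraint value
        AtomicInteger inSection = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(transitions);

        for (int i = 0; i < transitions; i++) {
            pool.submit(() -> {
                try {
                    throttle.acquire(); // wait until a transition slot is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inSection.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max); // record the high-water mark
                    Thread.sleep(5); // placeholder for lock(), read, update, write, unlock()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inSection.decrementAndGet();
                    throttle.release();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("limit 1, peak = " + maxConcurrentLockRequests(1, 8));
        System.out.println("limit 4, peak = " + maxConcurrentLockRequests(4, 8));
    }
}
```

With a limit of 1 the peak is always 1. With a higher limit, several threads can reach the lock request together, which is the situation in which we see the hang.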
By the way, we run multiple participants in the same JVM in our test. In other words, multiple Java threads in the same JVM compete for the zklock. We haven't profiled ZKHelixLock._listener.lockAcquired() yet, since we bypassed the problem with the constraint. We will revisit it later.

On Mon, May 9, 2016 at 8:28 PM, Kanak Biscuitwala <kana...@hotmail.com> wrote:

> Hi,
>
> ZkHelixLock is a thin wrapper around the ZooKeeper WriteLock recipe (which
> was last changed over 5 years ago). Though we haven't extensively tested it
> in production, we haven't seen it fail to return as described.
>
> Do you know if ZKHelixLock._listener.lockAcquired() is ever called?
>
> Feel free to examine the code here:
> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/lock/zk/ZKHelixLock.java
>
>> From: neutronsh...@gmail.com
>> Date: Mon, 9 May 2016 14:26:43 -0700
>> Subject: calling ZKHelixLock from state machine transition
>> To: dev@helix.apache.org
>>
>> Hi Helix team,
>>
>> We observed an issue in a state machine transition handler:
>>
>> // statemodel.java:
>> public void offlineToSlave(Message message, NotificationContext context) {
>>   // do work to start a local shard
>>   // we want to save the new shard info to resource config
>>
>>   ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
>>   try {
>>     zklock.lock(); // ==> will be blocked here
>>
>>     ZNRecord record = zkclient.readData(scope.getZkPath(), true);
>>     // update record fields
>>     zkclient.writeData(scope.getZkPath(), record);
>>   } finally {
>>     zklock.unlock();
>>   }
>> }
>>
>> After several invocations of this method, zklock.lock() doesn't
>> return (so the lock is not acquired). State machine threads become
>> blocked.
>>
>> At zk path "<cluster>/LOCKS/RESOURCE_resource" I see several znodes,
>> representing outstanding lock requests.
>>
>> Is there any special care we should take with the zk lock? Thanks.
>>
>> -neutronsharc