Hi,

ZKHelixLock is a thin wrapper around the ZooKeeper WriteLock recipe (which was last changed over 5 years ago). We haven't tested it extensively in production, but we also haven't seen it fail to return in the way you describe.
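
While you debug this, one way to keep the state transition threads from hanging indefinitely is to bound the lock call with a timeout. Below is a minimal sketch that uses only java.util.concurrent; it is not part of Helix. The names TimedLock and tryLock are made up for illustration, and I'm assuming the blocking call can be wrapped in a Runnable (for example, a lambda that calls zklock.lock()).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not a Helix class): bounds how long a blocking lock call may take. */
final class TimedLock {
    private TimedLock() {}

    /**
     * Runs the blocking call on a daemon helper thread and waits up to the timeout.
     * Returns true if the call returned in time, false if it timed out.
     */
    static boolean tryLock(Runnable blockingLock, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-lock");
            t.setDaemon(true); // a stuck attempt must not keep the JVM alive
            return t;
        });
        Future<?> attempt = executor.submit(blockingLock);
        try {
            attempt.get(timeout, unit); // rethrows failures from the call as ExecutionException
            return true;
        } catch (TimeoutException e) {
            attempt.cancel(true); // interrupt the helper thread
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

If tryLock returns false, the transition can fail fast and log the event instead of blocking the thread. Note that a timed-out attempt may still leave its lock request znode behind, so some cleanup would likely still be needed.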

Do you know if ZKHelixLock._listener.lockAcquired() is ever called? Feel free to examine the code here:
https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/lock/zk/ZKHelixLock.java

> From: [email protected]
> Date: Mon, 9 May 2016 14:26:43 -0700
> Subject: calling ZKHelixLock from state machine transition
> To: [email protected]
>
> Hi Helix team,
>
> We observed an issue at state machine transition handle:
>
> // statemodel.java:
>
> public void offlineToSlave(Message message, NotificationContext context) {
>
>   // do work to start a local shard
>
>   // we want to save the new shard info to resource config
>
>
>   ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
>   try {
>     zklock.lock();    // ==> will be blocked here
>
>     ZNRecord record = zkclient.readData(scope.getZkPath(), true);
>     update record fields;
>     zkclient.writeData(scope.getZkPath(), record);
>   } finally {
>     zklock.unlock();
>   }
> }
>
> After several invocation of this method, zklock.lock() method doesn't
> return (so the lock is not acquired). State machine threads become
> blocked.
>
> At zk path "<cluster>/LOCKS/RESOURCE_resource" I see several znodes
> there, representing outstanding lock requests.
>
> Are there any special care we should be aware of about zk lock ? Thanks.
>
>
> -neutronsharc
