Hi Helix team,

We observed an issue in our state machine transition handler:
    // statemodel.java:
    public void offlineToSlave(Message message, NotificationContext context) {
      // do work to start a local shard

      // we want to save the new shard info to resource config
      ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
      try {
        zklock.lock();  // ==> will be blocked here
        ZNRecord record = zkclient.readData(scope.getZkPath(), true);
        // update record fields
        zkclient.writeData(scope.getZkPath(), record);
      } finally {
        zklock.unlock();
      }
    }

After several invocations of this method, zklock.lock() no longer returns (so the lock is never acquired), and the state machine threads become blocked. Under the zk path "<cluster>/LOCKS/RESOURCE_resource" I see several znodes, representing outstanding lock requests.

Is there anything special we should be aware of when using the zk lock?

Thanks.
-neutronsharc