Hi Helix team,
We observed an issue in a state machine transition handler:
// statemodel.java:
public void offlineToSlave(Message message, NotificationContext context) {
  // do work to start a local shard
  // we want to save the new shard info to resource config
  ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
  try {
    zklock.lock(); // ==> will be blocked here
    ZNRecord record = zkclient.readData(scope.getZkPath(), true);
    // ... update record fields (elided) ...
    zkclient.writeData(scope.getZkPath(), record);
  } finally {
    zklock.unlock();
  }
}
After several invocations of this method, zklock.lock() no longer
returns, so the lock is never acquired and the state machine threads
become blocked.
Under the zk path "<cluster>/LOCKS/RESOURCE_resource" I see several
znodes, representing outstanding lock requests.
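For context, my understanding (an assumption on my part, not checked against the Helix source) is that these znodes follow the usual ZooKeeper lock recipe: each request is a sequential znode, the one with the lowest sequence number holds the lock, and the rest wait. A minimal sketch of that ordering follows; the znode names and the LockQueueSketch helper are made up for illustration and are not what Helix actually writes:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Helix or ZooKeeper.
public class LockQueueSketch {

    // Parse the numeric suffix after the last '-', e.g. "lock-0000000003" -> 3.
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.lastIndexOf('-') + 1));
    }

    // In the standard recipe, the request with the smallest sequence number
    // is the current lock holder; returns null if there are no requests.
    static String holder(List<String> children) {
        return children.stream()
                .min(Comparator.comparingLong(LockQueueSketch::sequenceOf))
                .orElse(null);
    }

    public static void main(String[] args) {
        // Made-up child names standing in for what getChildren() might return.
        List<String> children = Arrays.asList(
                "lock-0000000007", "lock-0000000003", "lock-0000000005");
        System.out.println("holder:  " + holder(children));
        System.out.println("waiting: " + (children.size() - 1));
    }
}
```

If that model is right, a request that is never released would stay at the head of the queue and every later lock() call would wait behind it.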
Is there anything special we should be aware of when using the zk lock? Thanks.
-neutronsharc