Hi Helix team,

We observed an issue in a state machine transition handler:

// statemodel.java:

public void offlineToSlave(Message message, NotificationContext context) {
  // do work to start a local shard

  // we want to save the new shard info to resource config
  ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
  try {
    zklock.lock();    // ==> will be blocked here

    ZNRecord record = zkclient.readData(scope.getZkPath(), true);
    // update record fields
    zkclient.writeData(scope.getZkPath(), record);
  } finally {
    zklock.unlock();
  }
}

After several invocations of this method, zklock.lock() no longer
returns (so the lock is never acquired), and the state machine threads
become blocked.

At the zk path "<cluster>/LOCKS/RESOURCE_resource" I see several
znodes, representing outstanding lock requests.

Is there anything special we should be aware of when using the zk lock?  Thanks.


-neutronsharc
