Re: calling ZKHelixLock from state machine transition

Neutron sharc Tue, 10 May 2016 14:51:41 -0700

Hi Kanak, thanks for the reply.

The problem goes away if we set a constraint of 1 on "STATE_TRANSITION"
messages for the resource. If we allow multiple state transitions to
execute concurrently for the resource, the zklock problem occurs.


By the way, we run multiple participants in the same JVM in our test.
In other words, multiple Java threads in the same JVM compete for the
zklock.
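
To illustrate the setup described above (several threads in one JVM
competing for one lock around a read-modify-write), here is a minimal
sketch. It uses a JVM-local ReentrantLock as a stand-in and is NOT
ZKHelixLock; the class name, method names, and timeout are hypothetical.
It shows two defensive habits that may help when a lock call can block:
acquire with a timeout, and only unlock when the lock was actually
obtained.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in: a JVM-local lock, not the ZooKeeper-backed ZKHelixLock.
class LocalLockSketch {
    private static final ReentrantLock LOCK = new ReentrantLock();
    // Stands in for the shared record that each transition updates.
    private static int sharedRecord = 0;

    // Returns true if the update was applied, false if the lock was not
    // obtained within the timeout (or the thread was interrupted).
    static boolean updateWithTimeout(long timeoutMs) {
        boolean acquired;
        try {
            acquired = LOCK.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // caller can log, retry, or fail the transition
        }
        try {
            sharedRecord++; // read-modify-write under the lock
            return true;
        } finally {
            LOCK.unlock(); // only reached when the lock is held
        }
    }

    // Starts `threads` workers, each attempting `updatesPerThread` updates,
    // and returns the final value of the shared record.
    static int runWorkers(int threads, int updatesPerThread) {
        sharedRecord = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < updatesPerThread; j++) {
                    updateWithTimeout(5_000);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return sharedRecord;
    }

    public static void main(String[] args) {
        System.out.println("final record value: " + runWorkers(4, 100));
    }
}
```

A timeout does not explain why the lock call hangs, but it would turn a
permanently blocked state machine thread into a visible failure.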

We haven't profiled ZKHelixLock._listener.lockAcquired() yet, since we
bypassed this problem using the constraint. We will revisit it later.




On Mon, May 9, 2016 at 8:28 PM, Kanak Biscuitwala <[email protected]> wrote:
> Hi,
>
> ZKHelixLock is a thin wrapper around the ZooKeeper WriteLock recipe (which
> was last changed over 5 years ago). Though we haven't extensively tested it
> in production, we haven't seen it fail to return as described.
>
> Do you know if ZKHelixLock._listener.lockAcquired() is ever called?
>
> Feel free to examine the code here: 
> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/lock/zk/ZKHelixLock.java
>
>> From: [email protected]
>> Date: Mon, 9 May 2016 14:26:43 -0700
>> Subject: calling ZKHelixLock from state machine transition
>> To: [email protected]
>>
>> Hi Helix team,
>>
>> We observed an issue in a state machine transition handler:
>>
>> // statemodel.java:
>>
>> public void offlineToSlave(Message message, NotificationContext context) {
>>
>>   // do work to start a local shard
>>
>>   // we want to save the new shard info to resource config
>>
>>
>>   ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
>>   try {
>>     zklock.lock();    // ==> will be blocked here
>>
>>     ZNRecord record = zkclient.readData(scope.getZkPath(), true);
>>     // update record fields here
>>     zkclient.writeData(scope.getZkPath(), record);
>>   } finally {
>>     zklock.unlock();
>>   }
>> }
>>
>> After several invocations of this method, zklock.lock() doesn't
>> return (so the lock is never acquired). The state machine threads
>> become blocked.
>>
>> At zk path "<cluster>/LOCKS/RESOURCE_resource" I see several znodes,
>> representing outstanding lock requests.
>>
>> Is there any special care we should take when using the zk lock? Thanks.
>>
>>
>> -neutronsharc
>
