Re: calling ZKHelixLock from state machine transition

Neutron sharc Sat, 14 May 2016 17:21:34 -0700

We increased the maximum number of connections allowed per client on the
ZooKeeper server side. The problem is gone now.


On Tue, May 10, 2016 at 2:50 PM, Neutron sharc <[email protected]> wrote:
> Hi Kanak, thanks for the reply.
>
> The problem goes away if we set a constraint of 1 on "STATE_TRANSITION"
> for the resource. If we allow multiple state transitions to execute
> concurrently for the resource, the zklock problem occurs.
>
> By the way, we run multiple participants in the same JVM in our test. In
> other words, multiple Java threads in the same JVM are competing
> for the zklock.
>
> We haven't profiled ZKHelixLock._listener.lockAcquired() yet, since we
> bypassed the problem using the constraint. We will revisit it later.
>
>
>
>
> On Mon, May 9, 2016 at 8:28 PM, Kanak Biscuitwala <[email protected]> wrote:
>> Hi,
>>
>> ZKHelixLock is a thin wrapper around the ZooKeeper WriteLock recipe (which
>> was last changed over 5 years ago). Though we haven't extensively tested it
>> in production, we haven't seen it fail to return as described.
>>
>> Do you know if ZKHelixLock._listener.lockAcquired() is ever called?
>>
>> Feel free to examine the code here: 
>> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/lock/zk/ZKHelixLock.java
>>
>>> From: [email protected]
>>> Date: Mon, 9 May 2016 14:26:43 -0700
>>> Subject: calling ZKHelixLock from state machine transition
>>> To: [email protected]
>>>
>>> Hi Helix team,
>>>
>>> We observed an issue in a state machine transition handler:
>>>
>>> // statemodel.java:
>>>
>>> public void offlineToSlave(Message message, NotificationContext context) {
>>>
>>>   // do work to start a local shard
>>>
>>>   // we want to save the new shard info to resource config
>>>
>>>
>>>   ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
>>>   try {
>>>     zklock.lock();    // ==> will be blocked here
>>>
>>>     ZNRecord record = zkclient.readData(scope.getZkPath(), true);
>>>     // update record fields here
>>>     zkclient.writeData(scope.getZkPath(), record);
>>>   } finally {
>>>     zklock.unlock();
>>>   }
>>> }
>>>
>>> After several invocations of this method, zklock.lock() no longer
>>> returns (so the lock is never acquired). The state machine threads
>>> become blocked.
>>>
>>> At the zk path "<cluster>/LOCKS/RESOURCE_resource" I see several
>>> znodes representing outstanding lock requests.
>>>
>>> Is there anything special we should be aware of when using the zk lock?  Thanks.
>>>
>>>
>>> -neutronsharc
>>
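
For readers who hit the same hang: the handler in the original message calls lock() inside the try block and waits without a bound, so a lock that is never granted blocks the transition thread forever. The sketch below shows one possible alternative shape, where the lock is acquired before the try block and the wait is bounded. It is only an illustration under stated assumptions: it uses java.util.concurrent.locks.ReentrantLock as a local stand-in, not ZKHelixLock, and the names BoundedLockDemo, updateWithTimeout and sharedRecord are hypothetical. Whether ZKHelixLock offers a timed acquire is not stated in this thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BoundedLockDemo {
    // Local stand-in for the distributed lock (assumption: NOT ZKHelixLock).
    static final ReentrantLock LOCK = new ReentrantLock();

    // Stand-in for the record stored in the resource config.
    static int sharedRecord = 0;

    /**
     * Updates the shared record, waiting at most timeoutMs for the lock.
     * Returns true if the update ran, false if the lock was not acquired.
     */
    static boolean updateWithTimeout(long timeoutMs) {
        boolean acquired;
        try {
            // Acquire before the try/finally below, so that unlock() is
            // only reached when this thread actually holds the lock.
            acquired = LOCK.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // caller may retry or fail the transition
        }
        try {
            sharedRecord++; // read-modify-write under the lock
            return true;
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        boolean updated = updateWithTimeout(100);
        System.out.println("updated=" + updated + " record=" + sharedRecord);
    }
}
```

With a bounded wait, a handler that cannot get the lock can report the failure instead of tying up a state machine thread indefinitely.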
