We increased the maximum number of connections allowed per client on the
ZooKeeper server side, and the problem is gone now.
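
For anyone hitting the same thing: the ZooKeeper server setting that caps concurrent connections from a single client IP is `maxClientCnxns`, which is presumably the one meant above. Participants running in one JVM share an IP, so their connections likely count against the same cap. A sketch of the change in zoo.cfg, with an illustrative value (the value actually used is not stated in this thread):

```
# zoo.cfg (ZooKeeper server configuration)
# Maximum number of concurrent connections a single client IP may open
# to one ZooKeeper server. 200 is an illustrative value only.
maxClientCnxns=200
```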

On Tue, May 10, 2016 at 2:50 PM, Neutron sharc <neutronsh...@gmail.com> wrote:
> Hi Kanak, thanks for the reply.
>
> The problem is gone if we set a constraint of 1 on "STATE_TRANSITION"
> for the resource. If we allow multiple state transitions to execute
> concurrently for the resource, then this zklock problem occurs.
>
> btw, we run multiple participants in the same JVM in our test. In
> other words, there are multiple Java threads in the same JVM competing
> for the zklock.
>
> We haven't profiled ZKHelixLock._listener.lockAcquired() since we
> bypassed this problem using the constraint. We will revisit it later.
>
>
>
>
> On Mon, May 9, 2016 at 8:28 PM, Kanak Biscuitwala <kana...@hotmail.com> wrote:
>> Hi,
>>
>> ZKHelixLock is a thin wrapper around the ZooKeeper WriteLock recipe (which
>> was last changed over 5 years ago). We haven't extensively tested it in
>> production, but we haven't seen it fail to return as described.
>>
>> Do you know if ZKHelixLock._listener.lockAcquired() is ever called?
>>
>> Feel free to examine the code here: 
>> https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/lock/zk/ZKHelixLock.java
>>
>>> From: neutronsh...@gmail.com
>>> Date: Mon, 9 May 2016 14:26:43 -0700
>>> Subject: calling ZKHelixLock from state machine transition
>>> To: dev@helix.apache.org
>>>
>>> Hi Helix team,
>>>
>>> We observed an issue in a state machine transition handler:
>>>
>>> // statemodel.java:
>>>
>>> public void offlineToSlave(Message message, NotificationContext context) {
>>>
>>>   // do work to start a local shard
>>>
>>>   // we want to save the new shard info to resource config
>>>
>>>
>>>   ZKHelixLock zklock = new ZKHelixLock(clusterId, resource, zkclient);
>>>   try {
>>>     zklock.lock();    // ==> will be blocked here
>>>
>>>     ZNRecord record = zkclient.readData(scope.getZkPath(), true);
>>>     // update record fields here
>>>     zkclient.writeData(scope.getZkPath(), record);
>>>   } finally {
>>>     zklock.unlock();
>>>   }
>>> }
>>>
>>> After several invocations of this method, zklock.lock() doesn't
>>> return (so the lock is not acquired). State machine threads become
>>> blocked.
>>>
>>> At zk path "<cluster>/LOCKS/RESOURCE_resource"  I see several znodes
>>> there, representing outstanding lock requests.
>>>
>>> Is there anything special we should be aware of when using the zk lock? Thanks.
>>>
>>>
>>> -neutronsharc
>>
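
The handler in the original message blocks indefinitely in lock() and calls unlock() in finally even when the lock was never acquired. One way to guard against both is a bounded wait with the acquisition done before the guarded try block. Below is a minimal sketch of that pattern, using java.util.concurrent.locks.ReentrantLock as a local stand-in: it is not ZKHelixLock and does not talk to ZooKeeper, and the class and method names (BoundedLockSketch, updateWithLock, attemptWhileHeldElsewhere) are hypothetical. Whether ZKHelixLock offers a timed acquire is not something this thread establishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedLockSketch {
    // Local stand-in for the distributed lock. This is NOT ZKHelixLock
    // and does not talk to ZooKeeper.
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Hypothetical helper: runs `update` under the lock, waiting at most
    // timeoutMs. Returns false if the lock could not be acquired in time.
    static boolean updateWithLock(Runnable update, long timeoutMs) {
        boolean acquired;
        try {
            // Acquire before entering the guarded try block, so that
            // unlock() below only runs when the lock is actually held.
            acquired = LOCK.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // caller may retry or fail the transition
        }
        try {
            update.run();
            return true;
        } finally {
            LOCK.unlock();
        }
    }

    // Hypothetical demo: another thread holds the lock while this thread
    // makes a bounded attempt; returns whether the attempt succeeded.
    static boolean attemptWhileHeldElsewhere(long timeoutMs) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> updateWithLock(() -> {
            held.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 1000));
        holder.start();
        try {
            held.await(); // wait until the holder owns the lock
            return updateWithLock(() -> { }, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the holder finish
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        boolean free = updateWithLock(() -> counter[0]++, 100);
        boolean contended = attemptWhileHeldElsewhere(100);
        System.out.println("free=" + free + " counter=" + counter[0]
                + " contended=" + contended);
    }
}
```

Running main prints `free=true counter=1 contended=false`: the uncontended call runs the update once, and the contended call gives up after the timeout instead of blocking the thread.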
