Hi Folks,

In our Helix controller, we have seen the log line below. From the code, I
understand that it is logged when ZKHelixManager detects flapping, here 5
disconnects from ZooKeeper within 300000 ms. So I tried to stop the
controller. Our stop logic calls ZKHelixManager.disconnect(), and that
call hangs. I took a thread dump, included below, which shows where it is
waiting. Could you please advise on a better approach to solve this?

I noticed that ZKHelixManager disconnects itself [1] when flapping is
detected. Could calling disconnect() a second time be the reason for the
hang?

2019-03-29 15:19:56,832 [ZkClient-EventThread-14-api.staging.scigap.org:2181] ERROR o.a.h.m.zk.ZKHelixManager  - instanceName: helixcontroller is flapping. disconnect it.  maxDisconnectThreshold: 5 disconnects in 300000ms.

Thread-5 - priority:5 - threadId:0x00007f5c740023f0 - nativeId:0x63f1 - nativeId (decimal):25585 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.helix.manager.zk.ZKHelixManager.resetHandlers(ZKHelixManager.java:903)
    - waiting to lock <0x00000006c7e08110> (a org.apache.helix.manager.zk.ZKHelixManager)
    at org.apache.helix.manager.zk.ZKHelixManager.disconnect(ZKHelixManager.java:693)
    at org.apache.airavata.helix.impl.controller.HelixController.disconnect(HelixController.java:103)
    at org.apache.airavata.helix.impl.controller.HelixController$$Lambda$2/846492085.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
    - None
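
The dump shows Thread-5 in state BLOCKED, waiting for the ZKHelixManager
monitor inside disconnect(). Until the root cause is clear, one stopgap on
our side might be to bound the wait in the stop logic, roughly as sketched
below. This is only a sketch under assumptions: BoundedDisconnect and
runWithTimeout are hypothetical names, not part of Helix or Airavata, and
the Runnable in main is a stand-in for the real disconnect() call. It does
not release the lock; it only keeps the calling thread from hanging forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Helix or Airavata.
public class BoundedDisconnect {

    /**
     * Runs the action on a daemon thread and waits at most timeoutMs for it.
     * Returns true if it completed normally, false on timeout or failure.
     */
    public static boolean runWithTimeout(Runnable action, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-disconnect");
            // Daemon thread: a call that never returns will not keep the JVM alive.
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(action);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Best effort only: a thread BLOCKED on a monitor ignores interrupts.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the situation in the dump: the monitor is held
        // elsewhere while another thread tries to enter it.
        Object monitor = new Object();
        boolean finished;
        synchronized (monitor) {
            finished = runWithTimeout(() -> {
                synchronized (monitor) { /* stands in for disconnect() */ }
            }, 200);
        }
        // prints: disconnect finished in time: false
        System.out.println("disconnect finished in time: " + finished);
    }
}
```

In our stop logic the stand-in Runnable would be replaced by the existing
call to the manager's disconnect(), with a timeout we would have to choose.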

[1] https://github.com/apache/helix/blob/helix-0.8.2/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java#L991

Thanks,
Dimuthu
