Hi Folks,

In our Helix controller we saw the log line below. From reading the code, I understand it is logged when ZKHelixManager has lost its ZooKeeper connection 5 times within the flapping window. I then tried to stop the controller. Our stop logic calls ZKHelixManager.disconnect(), and that call hangs. I took a thread dump (below) that shows where it is waiting. Can you please advise on a better approach to solve this?
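As a stopgap, the stop logic could bound how long it waits on disconnect() instead of blocking indefinitely. The following is only a minimal sketch under stated assumptions: BoundedDisconnect and disconnectWithTimeout are hypothetical names, not part of Helix, and the helper takes a plain Runnable so it does not depend on any Helix class. It does not remove the underlying lock contention; it only keeps the shutdown thread from waiting forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not a Helix API): bounds how long a disconnect call may take. */
final class BoundedDisconnect {

    private BoundedDisconnect() {}

    /**
     * Runs disconnectCall on a daemon worker thread and waits up to timeoutMs.
     * Returns true if the call finished in time, false if it timed out or failed.
     */
    static boolean disconnectWithTimeout(Runnable disconnectCall, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-disconnect");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> result = worker.submit(disconnectCall);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupting does not free a thread that is BLOCKED on a monitor;
            // this only stops the caller from waiting on it.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the disconnect call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

In the stop logic this might be called as `BoundedDisconnect.disconnectWithTimeout(helixManager::disconnect, 10_000)`, where `helixManager` is assumed to be the controller's manager instance and the 10-second timeout is an arbitrary choice. A `false` return means the shutdown should proceed without a clean disconnect.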
I noticed that ZKHelixManager disconnects itself [1] when flapping is detected. Could calling disconnect() a second time be the reason for the hang?

2019-03-29 15:19:56,832 [ZkClient-EventThread-14-api.staging.scigap.org:2181] ERROR o.a.h.m.zk.ZKHelixManager - instanceName: helixcontroller is flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.

Thread-5 - priority:5 - threadId:0x00007f5c740023f0 - nativeId:0x63f1 - nativeId (decimal):25585 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.helix.manager.zk.ZKHelixManager.resetHandlers(ZKHelixManager.java:903)
    - waiting to lock <0x00000006c7e08110> (a org.apache.helix.manager.zk.ZKHelixManager)
    at org.apache.helix.manager.zk.ZKHelixManager.disconnect(ZKHelixManager.java:693)
    at org.apache.airavata.helix.impl.controller.HelixController.disconnect(HelixController.java:103)
    at org.apache.airavata.helix.impl.controller.HelixController$$Lambda$2/846492085.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

    Locked ownable synchronizers:
    - None

[1] https://github.com/apache/helix/blob/helix-0.8.2/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java#L991

Thanks
Dimuthu
