Hi Dimuthu,

Did you stop the controller while the connection was flapping or while it was normal? Could you please list, in order, all the steps you took?
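
It would also help to know which thread owns the ZKHelixManager monitor that Thread-5 is waiting on. In a jstack dump, the owner's stack normally carries a "- locked <0x00000006c7e08110>" line. If it is easier to check from inside the JVM, here is a minimal sketch using only the standard java.lang.management API. The class name BlockedThreadReport and the "holder"/"waiter" demo threads are made up for illustration and are not Helix code; the demo only reproduces a thread stuck in the BLOCKED state and reports who holds the lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of Helix: lists threads blocked on a monitor.
public class BlockedThreadReport {

    /** Describes every thread that is currently BLOCKED on an object monitor. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Reproduces the pattern: one thread holds a monitor, another tries to enter it. */
    static List<String> demo() throws InterruptedException {
        Object monitor = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                held.countDown();           // signal that the monitor is now owned
                try {
                    release.await();        // keep the monitor until told to let go
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the holder releases the monitor
            }
        }, "waiter");

        holder.start();
        held.await();
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);               // wait until the waiter is really stuck
        }

        List<String> report = describeBlockedThreads();

        release.countDown();                // unblock everything and clean up
        holder.join();
        waiter.join();
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

Calling describeBlockedThreads() from the controller process while disconnect() is hanging should name the thread that holds the lock, assuming the hang is the monitor contention your dump suggests.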
Best Regards,
Jiajun

On Sat, Mar 30, 2019 at 5:54 AM DImuthu Upeksha <[email protected]> wrote:

> Hi Folks,
>
> In the Helix controller, we have seen the log line below. Looking at the
> code, I understood that it is caused by ZKHelixManager failing to connect
> to ZooKeeper 5 times. So I tried to stop the controller. The stop logic
> calls ZKHelixManager.disconnect(), and that call hangs. I got a thread
> dump, and you can see where it is waiting. Can you please advise on a
> better approach to solve this?
>
> I noticed that ZKHelixManager disconnects [1] itself when flapping is
> detected. Is calling disconnect() twice the reason for the hang?
>
> 2019-03-29 15:19:56,832 [ZkClient-EventThread-14-api.staging.scigap.org:2181]
> ERROR o.a.h.m.zk.ZKHelixManager - instanceName: helixcontroller is
> flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in 300000ms.
>
> Thread-5 - priority:5 - threadId:0x00007f5c740023f0 - nativeId:0x63f1 - nativeId (decimal):25585 - state:BLOCKED
> stackTrace:
> java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.helix.manager.zk.ZKHelixManager.resetHandlers(ZKHelixManager.java:903)
> - waiting to lock <0x00000006c7e08110> (a org.apache.helix.manager.zk.ZKHelixManager)
> at org.apache.helix.manager.zk.ZKHelixManager.disconnect(ZKHelixManager.java:693)
> at org.apache.airavata.helix.impl.controller.HelixController.disconnect(HelixController.java:103)
> at org.apache.airavata.helix.impl.controller.HelixController$$Lambda$2/846492085.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - None
>
> [1] https://github.com/apache/helix/blob/helix-0.8.2/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java#L991
>
> Thanks
> Dimuthu
