This is a good catch. @Wang Jiajun <[email protected]>, the stack trace is enough to fix this properly. We just have to look at all the paths that can reach this method and make sure resetHandlers is thread-safe and validates the state of the zkConnection and the handlers.
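To make the idea concrete, here is a minimal sketch of a disconnect path that can be called twice safely and that waits for the lock only for a bounded time instead of blocking forever. It is not Helix code: GuardedManager, its disconnect(long) signature, and resetCount are hypothetical names made up for illustration, and the sketch says nothing about which thread actually holds the ZKHelixManager monitor in the dump below.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a manager class; NOT the real ZKHelixManager. */
class GuardedManager {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicBoolean disconnected = new AtomicBoolean(false);
    private int resetCount = 0; // guarded by lock

    /**
     * Returns true only if this call performed the disconnect.
     * Returns false if the manager was already disconnected, if the lock
     * could not be acquired within timeoutMillis, or if the caller was
     * interrupted while waiting.
     */
    boolean disconnect(long timeoutMillis) {
        // Cheap state check first: a second caller returns immediately.
        if (disconnected.get()) {
            return false;
        }
        boolean acquired;
        try {
            // Bounded wait instead of an unbounded synchronized block.
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            // Re-check under the lock so only one caller resets the handlers.
            if (!disconnected.compareAndSet(false, true)) {
                return false;
            }
            resetHandlers();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Placeholder for the real handler cleanup; call only with lock held. */
    private void resetHandlers() {
        resetCount++;
    }

    int resetCount() {
        lock.lock();
        try {
            return resetCount;
        } finally {
            lock.unlock();
        }
    }

    boolean isDisconnected() {
        return disconnected.get();
    }
}
```

With this shape, a stop hook calling disconnect(5000) after the flapping handler has already disconnected would return false right away instead of hanging. Whether a timeout is acceptable in the real manager is a separate design question.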
On Mon, Apr 1, 2019 at 12:41 PM Wang Jiajun <[email protected]> wrote:

> Hi Dimuthu,
>
> Did you stop the controller when the connection is flapping or when it is
> normal?
> Could you please list all the steps that you have done in order?
>
> Best Regards,
> Jiajun
>
>
> On Sat, Mar 30, 2019 at 5:54 AM DImuthu Upeksha <[email protected]>
> wrote:
>
> > Hi Folks,
> >
> > In helix controller, we have seen below log line and by looking at the
> > code, I understood that it is due to ZkHelixManager is failing to connect
> > to zookeeper for 5 times. So I tried to stop the controller and in the stop
> > logic, we have a call to ZkHelixManager.disconnect() method and it hangs. I
> > got a thread dump and you can see where it is waiting. Can you please
> > advice as better approach to solve this?
> >
> > I noticed that ZkHelixManager disconnects [1] it self when a flapping is
> > detected. Is calling disconnect() twice the reason for that?
> >
> > 2019-03-29 15:19:56,832 [ZkClient-EventThread-14-api.staging.scigap.org:2181]
> > ERROR o.a.h.m.zk.ZKHelixManager - instanceName: helixcontroller is
> > flapping. disconnect it. maxDisconnectThreshold: 5 disconnects in
> > 300000ms.
> >
> > Thread-5 - priority:5 - threadId:0x00007f5c740023f0 - nativeId:0x63f1 -
> > nativeId (decimal):25585 - state:BLOCKED
> > stackTrace:
> > java.lang.Thread.State: BLOCKED (on object monitor)
> > at org.apache.helix.manager.zk.ZKHelixManager.resetHandlers(ZKHelixManager.java:903)
> > - waiting to lock <0x00000006c7e08110> (a org.apache.helix.manager.zk.ZKHelixManager)
> > at org.apache.helix.manager.zk.ZKHelixManager.disconnect(ZKHelixManager.java:693)
> > at org.apache.airavata.helix.impl.controller.HelixController.disconnect(HelixController.java:103)
> > at org.apache.airavata.helix.impl.controller.HelixController$$Lambda$2/846492085.run(Unknown Source)
> > at java.lang.Thread.run(Thread.java:748)
> > Locked ownable synchronizers:
> > - None
> >
> > [1]
> > https://github.com/apache/helix/blob/helix-0.8.2/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java#L991
> >
> > Thanks
> > Dimuthu
> >
>
