Hey Dimuthu - We are in the process of preparing a new release, which will include the previously mentioned bug fixes in Task Framework. It also contains various ZK-related fixes. I don't know what your deployment schedule is, but it might be worth waiting another week or so.
Hunter

On Fri, May 31, 2019 at 10:27 AM DImuthu Upeksha <[email protected]> wrote:

> Now I'm seeing the following error in the controller log. Restarting the
> controller fixed the issue. From time to time we see this in the controller
> with zk connection issues. Is this also something to do with the zk client
> version?
>
> 2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0ebbee1d000e for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha <[email protected]> wrote:
>
> > Hi Lei,
> >
> > We use 0.8.2. We initially had 0.8.4, but it contains an issue with task
> > retry logic, so we downgraded to 0.8.2. We are planning to go into
> > production with 0.8.2 by next week, so can you please advise a better way
> > to solve this without upgrading to 0.8.4?
> >
> > Thanks
> > Dimuthu
> >
> > On Fri, May 31, 2019 at 1:04 PM Lei Xia <[email protected]> wrote:
> >
> >> Which Helix version do you use? This may be caused by this Zookeeper bug
> >> (https://issues.apache.org/jira/browse/ZOOKEEPER-706). We have upgraded
> >> ZkClient in later Helix versions.
> >>
> >>
> >> Lei
> >>
> >> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <[email protected]> wrote:
> >>
> >>> Hi Folks,
> >>>
> >>> I'm getting the following error in the controller log, and it seems the
> >>> controller is not moving forward after that point.
> >>>
> >>> 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - Starting helix controller
> >>> 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - Settings loaded from file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> >>> 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - Connection to helix cluster : AiravataDemoCluster with name : helixcontroller2
> >>> 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - Zookeeper connection string localhost:2181
> >>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR o.a.h.c.GenericHelixController - Exception while executing DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for cluster .AiravataDemoCluster. Will not continue to next pipeline
> >>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> >>>     at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> >>>     at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> >>>     at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> >>>     at org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> >>>     at org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> >>>     at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> >>>     at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> >>>     at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> >>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: Fail to read nodes for [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> >>>     at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
> >>>     at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
> >>>     at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
> >>>     ... 7 common frames omitted
> >>>
> >>> In the zookeeper log I can see the following warning getting printed
> >>> continuously. What could be the reason for that? I'm using helix 0.8.2
> >>> and zookeeper 3.4.8.
> >>>
> >>> 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f0000
> >>> 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew session 0x16b0e59877f0000 at /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session 0x16b0e59877f0000 with negotiated timeout 30000 for client /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> >>> EndOfStreamException: Unable to read additional data from client sessionid 0x16b0e59877f0000, likely client has closed socket
> >>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
> >>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> >>>     at java.lang.Thread.run(Thread.java:748)
> >>>
> >>> Thanks
> >>> Dimuthu
> >>>
> >>
> >> --
> >> Lei Xia
