Hey Dimuthu -

We are actually in the process of preparing a new release, which will
include the previously mentioned bug fixes in the Task Framework. It also
contains various ZK-related fixes. I don't know what your deployment
schedule is, but it might be worth waiting another week or so.

Hunter

On Fri, May 31, 2019 at 10:27 AM DImuthu Upeksha <[email protected]>
wrote:

> Now I'm seeing the following error in the controller log. Restarting the
> controller fixed the issue. From time to time we see this in the controller,
> along with zk connection issues. Is this also related to the zk client version?
>
> 2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN  o.apache.zookeeper.ClientCnxn  - Session 0x16b0ebbee1d000e for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
> java.io.IOException: Broken pipe
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha <
> [email protected]>
> wrote:
>
> > Hi Lei,
> >
> > We use 0.8.2. We initially had 0.8.4, but it has an issue with the task
> > retry logic, so we downgraded to 0.8.2. We are planning to go into
> > production with 0.8.2 by next week, so can you please advise a better way
> > to solve this without upgrading to 0.8.4?
> >
> > Thanks
> > Dimuthu
> >
> > On Fri, May 31, 2019 at 1:04 PM Lei Xia <[email protected]> wrote:
> >
> >> Which Helix version do you use?  This may be caused by this ZooKeeper bug (
> >> https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
> >> ZkClient in later Helix versions.
> >>
> >>
> >> Lei
> >>
> >> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
> >> [email protected]> wrote:
> >>
> >>> Hi Folks,
> >>>
> >>> I'm getting the following error in the controller log, and it seems the
> >>> controller is not moving forward after that point:
> >>>
> >>> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  - Starting helix controller
> >>> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  - Settings loaded from file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
> >>> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  - Connection to helix cluster : AiravataDemoCluster with name : helixcontroller2
> >>> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  - Zookeeper connection string localhost:2181
> >>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR o.a.h.c.GenericHelixController  - Exception while executing DEFAULTpipeline: org.apache.helix.controller.pipeline.Pipeline@408d6d26for cluster .AiravataDemoCluster. Will not continue to next pipeline
> >>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
> >>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
> >>> at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
> >>> at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
> >>> at org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
> >>> at org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
> >>> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
> >>> at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
> >>> at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
> >>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: Fail to read nodes for [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
> >>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
> >>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
> >>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
> >>> ... 7 common frames omitted
> >>>
> >>> In the zookeeper log I can see the following warning being printed
> >>> continuously. What could be the reason for that? I'm using helix 0.8.2
> >>> and zookeeper 3.4.8.
> >>>
> >>> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f0000
> >>> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew session 0x16b0e59877f0000 at /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session 0x16b0e59877f0000 with negotiated timeout 30000 for client /127.0.0.1:57984
> >>> 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> >>> EndOfStreamException: Unable to read additional data from client sessionid 0x16b0e59877f0000, likely client has closed socket
> >>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
> >>> at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
> >>> at java.lang.Thread.run(Thread.java:748)
> >>>
> >>> Thanks
> >>> Dimuthu
> >>>
> >>
> >>
> >> --
> >> Lei Xia
> >>
> >
>
