[ https://issues.apache.org/jira/browse/HDFS-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412675#comment-13412675 ]
Rakesh R commented on HDFS-3477:
--------------------------------

Added a link to HDFS-3635, as I feel the cause is the same; it fails after a timeout:
{code}
java.lang.Exception: test timed out after 30000 milliseconds
	at java.lang.Object.wait(Native Method)
	at org.apache.hadoop.ha.ZKFailoverController.waitForActiveAttempt(ZKFailoverController.java:457)
	at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:645)
	at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:58)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:590)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:587)
{code}

> FormatZK and ZKFC startup can fail due to zkclient connection establishment delay
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-3477
>                 URL: https://issues.apache.org/jira/browse/HDFS-3477
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover
>    Affects Versions: 2.0.1-alpha
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: HDFS-3477.1.patch, HDFS-3477.2.patch, HDFS-3477.3.patch, HDFS-3477.3.patch, HDFS-3477.patch
>
>
> The format and ZKFC startup flows continue after creating the zkclient
> connection without waiting to check whether the connection is fully
> established. This leads to a failure at a subsequent point if the connection
> was not complete by then.
> Exception trace for format
> {noformat}
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Socket connection established to HOST-xx-xx-xx-55/xx.xx.xx.55:2182, initiating session
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: Session establishment complete on server HOST-xx-xx-xx-55/xx.xx.xx.55:2182, sessionid = 0x1379da4660c0014, negotiated timeout = 5000
> 12/05/30 19:48:24 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x1379da4660c0014
> 12/05/30 19:48:24 INFO zookeeper.ZooKeeper: Session: 0x1379da4660c0014 closed
> 12/05/30 19:48:24 INFO zookeeper.ClientCnxn: EventThread shut down
> Exception in thread "main" java.io.IOException: Couldn't determine existence of znode '/hadoop-ha/hacluster'
>       at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:263)
>       at org.apache.hadoop.ha.ZKFailoverController.formatZK(ZKFailoverController.java:257)
>       at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:195)
>       at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:58)
>       at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:163)
>       at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:159)
>       at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
>       at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:159)
>       at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:171)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hadoop-ha/hacluster
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
>       at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1049)
>       at org.apache.hadoop.ha.ActiveStandbyElector.parentZNodeExists(ActiveStandbyElector.java:261)
>       ... 8 more
> {noformat}
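
For illustration only, the issue description (startup continuing before the connection is established) suggests a wait-for-connected gate. The sketch below is not taken from any of the attached patches. It uses only JDK classes, and `ConnectionGate`, `markConnected` and `awaitConnected` are hypothetical names. In a real ZooKeeper client, the call to `markConnected()` would presumably be made from a `Watcher` when it receives a `SyncConnected` event; here a background thread simulates that callback.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not from the HDFS-3477 patches): blocks callers
 * until a connection callback has fired, or a timeout elapses.
 */
final class ConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Called from the connection callback once the session is established. */
    void markConnected() {
        connected.countDown();
    }

    /**
     * Waits up to timeoutMs for markConnected() to be called.
     * Returns true if connected, false on timeout or interruption.
     */
    boolean awaitConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ConnectionGate gate = new ConnectionGate();

        // Simulates asynchronous session establishment (assumption: stands in
        // for the ZooKeeper event thread delivering a connected event).
        Thread simulatedEventThread = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                return;
            }
            gate.markConnected();
        });
        simulatedEventThread.start();

        // Only issue requests (e.g. a znode existence check) after this returns true.
        if (!gate.awaitConnected(5000)) {
            throw new IllegalStateException("connection not established within timeout");
        }
        System.out.println("connected; safe to issue requests");
    }
}
```

The design choice here is to fail fast with a clear timeout error instead of letting the first request hit a ConnectionLoss, as in the trace above.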