[ https://issues.apache.org/jira/browse/METRON-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Sirota updated METRON-261:
--------------------------------
    Priority: Minor  (was: Major)

> Storm Supervisors Fail to Start
> -------------------------------
>
>                 Key: METRON-261
>                 URL: https://issues.apache.org/jira/browse/METRON-261
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Priority: Minor
>             Fix For: 0.2.1BETA
>
>
> After deployment completes, the Storm Supervisors often fail to start
> correctly. This prevents any data from being ingested until the Supervisors
> are manually started.
> It appears that the Supervisors fail to communicate with ZooKeeper, time out,
> and die. ZooKeeper may just not be ready in time. Not sure if this is
> something we can fix or if this is an Ambari issue.
> 2016-06-25 12:48:16.448 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_40]
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_40]
>     at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
> 2016-06-25 12:48:17.154 o.a.s.c.ConnectionState [ERROR] Connection timed out for connection string (ec2-52-41-178-50.us-west-2.compute.amazonaws.com:2181) and timeout (15000) / elapsed (15053)
> org.apache.storm.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.storm.curator.ConnectionState.checkTimeouts(ConnectionState.java:195) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:487) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.zookeeper$exists_node_QMARK_$fn__3211.invoke(zookeeper.clj:107) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:120) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:60) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at clojure.lang.RestFn.invoke(RestFn.java:486) [clojure-1.6.0.jar:?]
>     at backtype.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:314) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at clojure.lang.RestFn.invoke(RestFn.java:439) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:296) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.daemon.supervisor$fn__8449$exec_fn__3614__auto____8450.invoke(supervisor.clj:504) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at clojure.lang.AFn.applyToHelper(AFn.java:160) [clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
>     at clojure.core$apply.invoke(core.clj:624) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.supervisor$fn__8449$mk_supervisor__8476.doInvoke(supervisor.clj:500) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:792) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:822) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]
>     at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.6.0.jar:?]
>     at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.10.0.2.3.4.7-4.jar:0.10.0.2.3.4.7-4]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
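The report suspects that ZooKeeper is not yet accepting connections when the Supervisors start. The log is consistent with that: the first attempt gets `Connection refused`, and Curator then gives up after its 15000 ms connection timeout. One possible mitigation is to delay the Supervisor launch until ZooKeeper answers a health probe. The following is a minimal sketch of that idea, not the project's fix.

The class `ZkReadinessCheck`, its methods, and the command-line arguments are hypothetical. None of them are part of Storm, Metron, or Ambari. The probe sends ZooKeeper's `ruok` four-letter command, which a running server answers with `imok`. This assumes the command is enabled: ZooKeeper releases from 3.5.3 onward only answer four-letter commands listed in `4lw.commands.whitelist`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical pre-start check (not part of Storm, Metron, or Ambari):
 * wait until ZooKeeper replies "imok" to "ruok", or give up after a deadline.
 */
public final class ZkReadinessCheck {

    /** Sends "ruok" once; returns true only if the server replies exactly "imok". */
    static boolean isZkOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bound the read as well as the connect
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to 4 bytes; the reply may arrive in more than one chunk.
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int read = 0;
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) {
                    break; // server closed the connection early
                }
                read += n;
            }
            return read == buf.length
                    && "imok".equals(new String(buf, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeouts, resets: treat all as "not ready yet".
            return false;
        }
    }

    /** Polls every pollMs until ZooKeeper is ready or deadlineMs elapses. */
    static boolean waitForZk(String host, int port, long deadlineMs, long pollMs)
            throws InterruptedException {
        long giveUpAt = System.currentTimeMillis() + deadlineMs;
        while (System.currentTimeMillis() < giveUpAt) {
            if (isZkOk(host, port, 2000)) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 1) {
            System.out.println("usage: java ZkReadinessCheck <host> [port] [deadlineMs]");
            return;
        }
        String host = args[0];
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        long deadlineMs = args.length > 2 ? Long.parseLong(args[2]) : 60_000L;
        boolean ready = waitForZk(host, port, deadlineMs, 1_000L);
        System.out.println(ready ? "zookeeper ready" : "zookeeper not ready");
        if (!ready) {
            System.exit(1); // non-zero exit lets a wrapper script skip the Supervisor start
        }
    }
}
```

A wrapper script could run something like `java ZkReadinessCheck <zk-host> 2181 60000` before the Supervisor start command and treat a non-zero exit as "do not start yet". Whether such a hook belongs in Metron's deployment or in Ambari is the open question the report raises.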