[
https://issues.apache.org/jira/browse/KAFKA-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623077#comment-14623077
]
Parth Brahmbhatt commented on KAFKA-2182:
-----------------------------------------
[~junrao] I think I took care of this as part of KAFKA-2169. We now simply call
System.exit when this exception is caught, at least on the broker side. Can we
close this JIRA as Fixed? Or am I missing the intent of this JIRA?
> zkClient dies if there is any exception while reconnecting
> ----------------------------------------------------------
>
> Key: KAFKA-2182
> URL: https://issues.apache.org/jira/browse/KAFKA-2182
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.1
> Reporter: Igor Maravić
> Assignee: Parth Brahmbhatt
> Priority: Critical
>
> We, Spotify, have just been hit by a bug related to ZkClient. It brought
> a whole Kafka cluster down.
> Long story short, we restarted a TOR switch, and all of the brokers in the
> cluster lost connectivity to the ZooKeeper cluster, which was
> living in another rack. Because of that, all the connections to ZooKeeper
> expired.
> Everything would have been fine if we hadn't lost the hostname-to-IP-address
> mapping for some reason. Since we did lose that mapping, we got an
> UnknownHostException when the broker tried to reconnect. This exception got
> swallowed, and we never reconnected to ZooKeeper, which in turn made
> our cluster of brokers useless.
> If we had a "handleSessionEstablishmentError" function, the exception
> could be caught and we could simply kill the KafkaServer process and restart
> it cleanly, since this kind of exception is fatal for the Kafka cluster.
> {code}
> 2015-05-05T12:49:01.709+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO
> zookeeper.ZooKeeper - Initiating client connection,
> connectString=zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local
> sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@7303d690
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 apache-kafka[main-EventThread] ERROR
> zookeeper.ClientCnxn - Error while calling watcher
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 java.lang.RuntimeException: Exception
> while restarting zk client
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:462)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.process(ZkClient.java:368)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 Caused by:
> org.I0Itec.zkclient.exception.ZkException: Unable to connect to
> zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:939)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458)
> 2015-05-05T12:49:01.711+00:00 127.0.0.1 ... 3 more
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 Caused by:
> java.net.UnknownHostException: zookeeper1.spotify.net: Name or service not
> known
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.InetAddress.getAllByName0(InetAddress.java:1246)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.InetAddress.getAllByName(InetAddress.java:1162)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> java.net.InetAddress.getAllByName(InetAddress.java:1098)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> 2015-05-05T12:49:01.712+00:00 127.0.0.1 at
> org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 ... 5 more
> 2015-05-05T12:49:01.713+00:00 127.0.0.1
> apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local]
> ERROR zkclient.ZkEventThread - Error handling event ZkEvent[Children of
> /config/changes changed sent to
> kafka.server.TopicConfigManager$ConfigChangeListener$@17638f6]
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.713+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 apache-kafka[main-EventThread] INFO
> zookeeper.ClientCnxn - EventThread shut down
> 2015-05-05T12:49:01.714+00:00 127.0.0.1
> apache-kafka[ZkClient-EventThread-18-zookeeper1.spotify.net:2181,zookeeper2.spotify.net:2181,zookeeper3.spotify.net:2181/gabobroker-local]
> ERROR zkclient.ZkEventThread - Error handling event ZkEvent[Data of
> /controller changed sent to
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener@18360394]
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 java.lang.NullPointerException
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:439)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$3.call(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:436)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:544)
> 2015-05-05T12:49:01.714+00:00 127.0.0.1 at
> org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {code}
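
For illustration only, the fail-fast idea from the description (report a
session re-establishment failure to a callback instead of swallowing it, then
terminate the process) could be sketched roughly as below. This is a minimal
sketch under assumptions: SessionListener, Connector, FailFastListener and
reconnect are hypothetical stand-ins written for this example and are not the
ZkClient or Kafka API. Only the callback name handleSessionEstablishmentError
is taken from the report. The exit action is injected so the behaviour can be
exercised without killing the JVM.

```java
import java.net.UnknownHostException;
import java.util.function.IntConsumer;

public class FailFastSketch {

    /** Hypothetical stand-in for a session listener; NOT the real ZkClient interface. */
    interface SessionListener {
        void handleNewSession();
        void handleSessionEstablishmentError(Throwable error);
    }

    /** Hypothetical stand-in for whatever opens the ZooKeeper connection. */
    interface Connector {
        void connect() throws Exception;
    }

    /** Listener that invokes an exit action when the session cannot be re-established. */
    static final class FailFastListener implements SessionListener {
        private final IntConsumer exitAction;

        FailFastListener(IntConsumer exitAction) {
            this.exitAction = exitAction;
        }

        @Override
        public void handleNewSession() {
            // A real broker would re-register its ephemeral nodes here.
        }

        @Override
        public void handleSessionEstablishmentError(Throwable error) {
            System.err.println("Could not re-establish session: " + error);
            exitAction.accept(1); // in production this would be System::exit
        }
    }

    /** Hypothetical reconnect routine: failures reach the listener instead of being swallowed. */
    static void reconnect(Connector connector, SessionListener listener) {
        try {
            connector.connect();
            listener.handleNewSession();
        } catch (Exception e) {
            listener.handleSessionEstablishmentError(e);
        }
    }

    public static void main(String[] args) {
        // Demo only: print the status instead of exiting the JVM.
        IntConsumer recordOnly = status -> System.out.println("would exit with status " + status);
        reconnect(() -> { throw new UnknownHostException("zookeeper1.example.invalid"); },
                  new FailFastListener(recordOnly));
    }
}
```

Passing System::exit as the exit action would give the "kill the process and
let it be restarted cleanly" behaviour described above. Whether that matches
what was actually done in KAFKA-2169 is not something this sketch claims.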