[ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560186#comment-16560186 ]

Kevin Lu commented on ZOOKEEPER-3036:
-------------------------------------

[~o...@coralogix.com] Yes, multiple brokers think they are the controller. What version of Kafka are you using?

We found this issue in 0.10.2.0, and upgrading to 1.1.1 seems to have fixed the problem. The cluster is stable now, but we are not sure whether it will happen again.

> Unexpected exception in zookeeper
> ---------------------------------
>
>                 Key: ZOOKEEPER-3036
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx
>    Affects Versions: 3.4.10
>         Environment: 3 Zookeepers, 5 kafka servers
>            Reporter: Oded
>            Priority: Critical
>
> We got an issue with one of the ZooKeeper servers (the leader), causing the
> entire Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE /192.168.0.91:42490 ********
>
> We would expect ZooKeeper to elect another leader and the Kafka cluster to
> continue working as expected, but that was not the case.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)