[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590037#comment-16590037 ]

Oded commented on ZOOKEEPER-3036:
---

Hope this helps. Before we started getting the split brain, we saw this log:

kafka-cluster-zookeeper-1 zookeeper 2018-08-22 20:04:46,623 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x26561ad5c49000f type:ping cxid:0xfffe zxid:0xfffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
kafka-cluster-zookeeper-1 zookeeper 2018-08-22 20:05:50,885 [myid:2] - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x16561ad5c490007 type:delete cxid:0xabc zxid:0x10cbe txntype:-1 reqpath:n/a Error Path:/config/changes/config_change_01 Error:KeeperErrorCode = NoNode for /config/changes/config_change_01

> Unexpected exception in zookeeper
> ---
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum, server
> Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
> Reporter: Oded
> Priority: Critical
>
> We got an issue with one of the ZooKeepers (the leader), causing the entire Kafka
> cluster to fail:
>
> 2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE /192.168.0.91:42490
>
> We expected ZooKeeper to elect another leader and the Kafka cluster to keep
> working as expected, but that was not the case.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560215#comment-16560215 ]

Oded commented on ZOOKEEPER-3036:
---

We are running with kafka 1.1.0
[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559328#comment-16559328 ]

Oded commented on ZOOKEEPER-3036:
---

Hi,

We did the same, but the issue came back. It happens to us from time to time, and we handle it manually. The main problem is that it caused a Kafka "split brain", where the cluster believes it has more than one controller.
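One way to spot the "more than one controller" condition described in this comment is to compare each broker's own view of controllership. Kafka brokers expose an ActiveControllerCount gauge over JMX (kafka.controller:type=KafkaController,name=ActiveControllerCount), and in a healthy cluster exactly one broker reports 1. The sketch below is hypothetical: it assumes the per-broker readings have already been collected (the JMX scraping is left out), and the class name, method names and sample readings are illustrative, not taken from this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ControllerCountCheck {

    // Returns the ids of brokers whose ActiveControllerCount reading is non-zero,
    // i.e. brokers that currently believe they are the controller.
    static List<Integer> controllerClaimants(Map<Integer, Integer> countByBroker) {
        List<Integer> claimants = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : countByBroker.entrySet()) {
            if (entry.getValue() > 0) {
                claimants.add(entry.getKey());
            }
        }
        return claimants;
    }

    // Classifies the cluster: exactly one claimant is the healthy case.
    static String classify(Map<Integer, Integer> countByBroker) {
        int claimants = controllerClaimants(countByBroker).size();
        if (claimants == 1) {
            return "OK";
        }
        if (claimants == 0) {
            return "NO_CONTROLLER";
        }
        return "SPLIT_BRAIN";
    }

    public static void main(String[] args) {
        // Hypothetical readings for a 5-broker cluster (broker id -> ActiveControllerCount).
        Map<Integer, Integer> readings = new TreeMap<>();
        readings.put(1, 1);
        readings.put(2, 0);
        readings.put(3, 1);
        readings.put(4, 0);
        readings.put(5, 0);
        // Prints "SPLIT_BRAIN [1, 3]" for these readings.
        System.out.println(classify(readings) + " " + controllerClaimants(readings));
    }
}
```

Reporting the claimant ids, not just a boolean, tells the operator which brokers to inspect (or restart) when handling the condition manually.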
[jira] [Created] (ZOOKEEPER-3036) Unexpected exception in zookeeper
Oded created ZOOKEEPER-3036:
---

Summary: Unexpected exception in zookeeper
Key: ZOOKEEPER-3036
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
Project: ZooKeeper
Issue Type: Bug
Components: jmx
Affects Versions: 3.4.10
Environment: 3 Zookeepers, 5 kafka servers
Reporter: Oded

We got an issue with one of the ZooKeepers (the leader), causing the entire Kafka cluster to fail:

2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2018-05-09 02:29:01,730 [myid:3] - WARN [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE /192.168.0.91:42490

We expected ZooKeeper to elect another leader and the Kafka cluster to keep working as expected, but that was not the case.
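The trace shows the leader's LearnerHandler blocked in DataInputStream.readInt() when the socket read timeout fired. As we read the 3.4.x source (an assumption worth checking against your exact version), once a follower has synced, the leader sets that learner socket's SO_TIMEOUT to tickTime × syncLimit, so a follower that sends nothing for that long produces this exception followed by the "GOODBYE" line. The minimal sketch below reproduces only that mechanism with plain java.net sockets; it uses no ZooKeeper classes, and the class name, helper names and scaled-down timeout are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Assumed leader-side read timeout: tickTime (milliseconds) times syncLimit (ticks).
    static int readTimeoutMs(int tickTimeMs, int syncLimitTicks) {
        return tickTimeMs * syncLimitTicks;
    }

    // Opens a loopback connection whose client side never writes, then performs the
    // same kind of blocking readInt() seen in the stack trace.
    // Returns true if the read was aborted by the socket timeout.
    static boolean readTimesOut(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             // Plays the role of a silent follower: connects, then sends nothing.
             Socket silentPeer = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(timeoutMs);
            DataInputStream in = new DataInputStream(
                    new BufferedInputStream(accepted.getInputStream()));
            try {
                in.readInt(); // blocks until 4 bytes arrive or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught " + e);
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // tickTime=2000 and syncLimit=5 are the values in ZooKeeper's sample zoo.cfg.
        System.out.println("Timeout for tickTime=2000, syncLimit=5: "
                + readTimeoutMs(2000, 5) + " ms");
        // Scaled-down timeout (100 ms) so the demo finishes quickly.
        System.out.println("Read timed out: " + readTimesOut(readTimeoutMs(20, 5)));
    }
}
```

With the sample values the leader would drop a silent follower after 10000 ms. Note that the leader closing one learner connection is expected behaviour on its own; the report's concern is what happened to the ensemble afterwards.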