[ https://issues.apache.org/jira/browse/ZOOKEEPER-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schmitz updated ZOOKEEPER-3822:
-----------------------------------------
    Attachment: kafka.log

> Zookeeper 3.6.1 EndOfStreamException
> ------------------------------------
>
>                 Key: ZOOKEEPER-3822
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3822
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>            Reporter: Sebastian Schmitz
>            Priority: Critical
>         Attachments: kafka.log, zookeeper.log
>
> Hello,
> after Zookeeper 3.6.1 fixed the leader-election issue where the election messages contained the IP address, which made elections fail across separate networks (as in our Docker setup), I upgraded from 3.4.14 to 3.6.1 in the Dev and Test environments. The upgrade went smoothly, and everything ran fine for one day. Last night we rolled out a new update of the environment; because we deploy everything as one package of all containers (Kafka, Zookeeper, MirrorMaker, etc.), the Zookeeper containers were also replaced with the latest ones. In this case there was no actual change: the containers were just removed and deployed again. Since Zookeeper's config and data are not stored inside the containers, that should not be a problem, but this night it broke the whole Zookeeper clusters, and so Kafka was down as well.
> The deployment did the following:
> * zookeeper_node_1 was stopped, and the container was removed and created again
> * zookeeper_node_1 starts up and the election takes place
> * zookeeper_node_2 is elected as leader again
> * zookeeper_node_2 was stopped, and the container was removed and created again
> * zookeeper_node_3 is elected as the leader while zookeeper_node_2 is down
> * zookeeper_node_2 starts up and zookeeper_node_3 remains leader
>
> From there on, all servers just report:
> 2020-05-07 14:07:57,187 [myid:3] - WARN [NIOWorkerThread-2:NIOServerCnxn@364] - Unexpected exception
> EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /z.z.z.z:46060, session = 0x2014386bbde0000
>     at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
>     at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> and they don't recover.
> I was able to recover the cluster in the Test environment by stopping and starting all the zookeeper nodes. The cluster in Dev is still in that state, and I'm checking the logs to find out more...
> The full log of the deployment, which started at 02:00, is attached. The first timestamp is local NZ time and the second one is UTC. The IPs I replaced are x.x.x.x for node_1, y.y.y.y for node_2, and z.z.z.z for node_3.
> The Kafka servers are running on the same machine.
> This means that the EndOfStreamExceptions could also be connections from Kafka, as I don't think that zookeeper_node_3 would establish a session with itself.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
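[Editor's note] One way to sanity-check the cluster state described above (which node is leader, and whether a restarted node rejoined as follower) is the `srvr` four-letter command. Below is a minimal sketch, not part of the reporter's setup: the host/port values are placeholders, and note that in 3.6 four-letter words must be whitelisted in zoo.cfg via `4lw.commands.whitelist=srvr` (or `*`) before the server will answer.

```python
import socket

def zk_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter command and return the raw response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        data = b""
        while chunk := sock.recv(4096):
            data += chunk
    return data.decode()

def parse_mode(srvr_output: str) -> str:
    """Extract the 'Mode:' value (leader/follower/standalone) from srvr output."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    # Placeholder hostnames for the three nodes in this report.
    for node in ("zookeeper_node_1", "zookeeper_node_2", "zookeeper_node_3"):
        try:
            print(node, parse_mode(zk_srvr(node)))
        except OSError as e:
            print(node, "unreachable:", e)
```

In a healthy three-node ensemble, one node should report `leader` and the other two `follower`; anything else after a container replacement would point at the stuck state described in this issue.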