[ https://issues.apache.org/jira/browse/KAFKA-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sandeep p updated KAFKA-10590:
------------------------------
    Description: 
The whole cluster hangs when one of the three nodes goes down. To bring the cluster back, all three nodes need to be restarted.

[2020-10-08 19:40:13,607] WARN Client session timed out, have not heard from server in 12002ms for sessionid 0x2acefe00000 (org.apache.zookeeper.ClientCnxn)
[2020-10-08 19:40:13,608] INFO Client session timed out, have not heard from server in 12002ms for sessionid 0x2acefe00000, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2020-10-08 19:40:13,709] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:13,709] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:13,709] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:13,709] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:13,866] INFO Opening socket connection to server 10.0.14.7/10.0.14.7:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2020-10-08 19:40:13,867] INFO Socket error occurred: 10.0.14.7/10.0.14.7:2181: Connection refused (org.apache.zookeeper.ClientCnxn)
[2020-10-08 19:40:13,968] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:13,968] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2020-10-08 19:40:14,093] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Connection to node 1 (/10.0.2.5:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-10-08 19:40:14,093] INFO [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error sending fetch request (sessionId=205463854, epoch=INITIAL) to node 1: {}. (org.apache.kafka.clients.FetchSessionHandler)
java.io.IOException: Connection to 10.0.2.5:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:103)
    at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:206)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:300)
    at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:135)
    at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:134)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:117)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)

        Summary: Whole Kafka cluster goes down when the ZooKeeper leader and Kafka go down.  (was: Whole Kafka node hangs when one node goes down.)

> Whole Kafka cluster goes down when the ZooKeeper leader and Kafka go down.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-10590
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10590
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, zkclient
>    Affects Versions: 2.5.0
>         Environment: Ubuntu 16.04
>            Reporter: sandeep p
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
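
When triaging a longer broker log for the same symptom, it can help to pull out just the ZooKeeper session-timeout entries. The following is a minimal sketch, not part of the original report. It assumes the entry layout seen in the excerpt above (`[timestamp] LEVEL message (logger)`). The names `find_session_timeouts` and `SessionTimeout` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterator, NamedTuple

# Assumed entry layout, inferred from the log excerpt in this report:
#   [YYYY-MM-DD HH:MM:SS,mmm] LEVEL message (logger.name)
ENTRY_START = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL) "
)
# Wording taken from the org.apache.zookeeper.ClientCnxn messages quoted above.
TIMEOUT = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-fA-F]+)"
)


class SessionTimeout(NamedTuple):
    timestamp: str
    level: str
    silent_ms: int
    session_id: str


def find_session_timeouts(log_text: str) -> Iterator[SessionTimeout]:
    """Yield one record per log entry that reports a ZooKeeper session timeout.

    Entries are split on the timestamp prefix rather than on newlines, so the
    function also works on text where line breaks were lost.
    """
    starts = list(ENTRY_START.finditer(log_text))
    for i, start in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        hit = TIMEOUT.search(log_text[start.end():end])
        if hit:
            yield SessionTimeout(
                timestamp=start.group(1),
                level=start.group(2),
                silent_ms=int(hit.group(1)),
                session_id=hit.group(2),
            )


# Two entries copied from the report; only the first is a session timeout.
sample = (
    "[2020-10-08 19:40:13,607] WARN Client session timed out, have not heard "
    "from server in 12002ms for sessionid 0x2acefe00000 "
    "(org.apache.zookeeper.ClientCnxn)\n"
    "[2020-10-08 19:40:13,709] INFO [ZooKeeperClient Kafka server] Waiting "
    "until connected. (kafka.zookeeper.ZooKeeperClient)\n"
)

for event in find_session_timeouts(sample):
    print(event.timestamp, event.level, event.silent_ms, event.session_id)
```

Splitting on the timestamp prefix instead of on newlines is a deliberate choice here, since archived copies of such logs often arrive with their line breaks collapsed.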