saichand created KAFKA-5778:
-------------------------------
Summary: Kafka cluster is not responding when one broker hangs and
resulted in too many connections in close_wait in other brokers
Key: KAFKA-5778
URL: https://issues.apache.org/jira/browse/KAFKA-5778
Project: Kafka
Issue Type: Bug
Affects Versions: 0.10.0.1
Reporter: saichand
Priority: Blocker
In a cluster of 3 brokers, one broker (Broker-1) hung. From then on, the
other two brokers had connections stuck in CLOSE_WAIT for Java client
producers/consumers, and some broker-to-broker connections between those
two brokers were also in CLOSE_WAIT.
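To quantify the CLOSE_WAIT build-up on a broker, a small script along these lines could summarise connection states. This is only a sketch: it assumes the input is text in the usual `netstat -tan` column layout (state in the sixth column), and the addresses in SAMPLE are made-up illustration data, not output captured from the affected cluster. The function name `count_tcp_states` is hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connection states in `netstat -tan`-style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        # Header rows do not start with "tcp" and are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample input (invented addresses; 9092 is the default Kafka port).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.2:9092           10.0.0.7:51234          CLOSE_WAIT
tcp        1      0 10.0.0.2:9092           10.0.0.8:40022          CLOSE_WAIT
tcp        0      0 10.0.0.2:9092           10.0.0.3:38110          ESTABLISHED
tcp        0      0 0.0.0.0:9092            0.0.0.0:*               LISTEN
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

Running this against real `netstat -tan` output from each broker at intervals would show whether the CLOSE_WAIT count keeps growing after one broker hangs.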
Kafka Version : 0.10.0.1
In the logs I found "connection refused" exceptions from the replica fetcher threads:
In broker 0 : replica fetcher 0-1, replica fetcher 0-2
In broker 2 : replica fetcher 0-0, replica fetcher 0-1
In broker 1 : it was hung, so no logs were available at that time.
We tried restarting Kafka on broker-2, but it was not successful: it
terminated, reporting a ZooKeeper timeout.
Then we tried restarting Kafka on broker-0 and got the same error.
Broker-1 was hung, so we could not even log in to it,
so we restarted the broker-1 machine.
Then we restarted all ZooKeeper nodes and then the Kafka brokers. Now everything is fine.