[ https://issues.apache.org/jira/browse/KAFKA-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765007#comment-16765007 ]
Christoffer Hammarström commented on KAFKA-7913:
------------------------------------------------

This is bug KAFKA-7697

> Kafka broker halts and messes up the whole cluster
> --------------------------------------------------
>
>                 Key: KAFKA-7913
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7913
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>         Environment: kafka_2.12-2.1.0,
> openjdk version "11.0.1" 2018-10-16 LTS
> OpenJDK Runtime Environment 18.9 (build 11.0.1+13-LTS),
> CentOS Linux release 7.3.1611 (Core),
> linux 3.10.0-514.26.2.el7.x86_64
>            Reporter: Andrej Urvantsev
>            Priority: Major
>
> We recently upgraded the cluster and are running Kafka 2.1.0 on Java 11.
> For a while everything went fine, but then random brokers started to halt
> from time to time.
> When this happens, the broker still looks alive to the other brokers, but it
> stops receiving network traffic. The other brokers then throw an IOException:
> {noformat}
> java.io.IOException: Connection to 36155 was disconnected before the response was read
>         at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
>         at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
>         at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:241)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:130)
>         at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:129)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> {noformat}
> On the problematic broker all logging stops. No errors, no exceptions -
> nothing.
> This also "breaks" the whole cluster: since clients and other brokers "think"
> that the broker is still alive, they keep trying to connect to it, and it
> seems that leader election leaves the problematic broker as a leader.
>
> I would be glad to provide any further details if somebody could advise
> what to investigate when it happens next time.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)