[ https://issues.apache.org/jira/browse/KAFKA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax closed KAFKA-7876. ---------------------------------- > Broker suddenly got disconnected > --------------------------------- > > Key: KAFKA-7876 > URL: https://issues.apache.org/jira/browse/KAFKA-7876 > Project: Kafka > Issue Type: Bug > Components: controller, network > Affects Versions: 2.1.0 > Reporter: Anthony Lazam > Priority: Critical > Fix For: 2.2.0 > > Attachments: kafka-issue.png > > > > We have 3 node cluster setup. There are scenarios that one of the broker > suddenly got disconnected from the cluster but no underlying system issue is > found. The node that got dc'ed wasn't able to release the partition it holds > as the leader, hence clients (spring-boot) was unable to send/receive data > from the issued broker. > We noticed that it always happen to the active controller count. > Environment details: > Provider: AWS > Kernel: 3.10.0-693.21.1.el7.x86_64 > OS: CentOS Linux release 7.5.1804 (Core) > Scala version: 2.11 > Kafka version: 2.1.0 > Kafka config: > {code:java} > ############################# Socket Server Settings > ############################# > num.network.threads=3 > num.io.threads=8 > socket.send.buffer.bytes=102400 > socket.receive.buffer.bytes=102400 > socket.request.max.bytes=104857600 > ############################# Log Basics ############################# > num.partitions=1 > num.recovery.threads.per.data.dir=1 > ############################# Internal Topic Settings > ############################# > offsets.topic.replication.factor=3 > transaction.state.log.replication.factor=3 > transaction.state.log.min.isr=2 > ############################# Log Retention Policy > ############################# > log.retention.hours=168 > log.segment.bytes=1073741824 > log.retention.check.interval.ms=300000 > ############################# Group Coordinator Settings > ############################# > group.initial.rebalance.delay.ms=0 > ############################# Zookeeper ############################# > zookeeper.connection.timeout.ms=6000 > broker.id=1 > zookeeper.connect=zk1:2181,zk2:2181,zk3:2181 > log.dirs=/data/kafka-node > advertised.listeners=PLAINTEXT://node1:9092 > {code} > Broker disconnected controller log: > {code:java} > [2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger > auto leader balancing (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by > broker Map(TOPICS->MAP) (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred > replica for broker 2 Map() (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for > broker 2 is 0.0 (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred > replica for broker 1 Map() (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for > broker 1 is 0.0 (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred > replica for broker 3 Map() (kafka.controller.KafkaController) > [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for > broker 3 is 0.0 (kafka.controller.KafkaController) > [2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger > auto leader balancing (kafka.controller.KafkaController) > {code} > Broker working server.log: > {code:java} > [2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, > fetcherId=0] Error sending fetch request (sessionId=1637095899, > epoch=21379644) to node 2: java.io.IOException: Connection to 2 was > disconnected before the response was read. > (org.apache.kafka.clients.FetchSessionHandler) > [2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2, > fetcherId=0] Error in response for fetch request (type=FetchRequest, > replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760, > fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356, > maxBytes=1048576, currentLeaderEpoch=Optional[2])}, > isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessio > nId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread) > java.io.IOException: Connection to 2 was disconnected before the response was > read > at > org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97) > at > kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97) > at > kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190) > at > kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241) > at > kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130) > at > kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129) > at scala.Option.foreach(Option.scala:257) > at > kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129) > at > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111) > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) > [2019-01-26 05:02:35,723] WARN Attempting to send response via channel for > which there is no open connection, connection id node3:9092-node2:59988-1550 > (kafka.network.Processor) > [2019-01-26 05:02:35,731] WARN Attempting to send response via channel for > which there is no open connection, connection id node3:9092-node2:59986-1550 > (kafka.network.Processor) > [2019-01-26 05:02:35,797] WARN Attempting to send response via channel for > which there is no open connection, connection id node3:9092-node2:59494-1549 > (kafka.network.Processor) > [2019-01-26 05:02:35,816] WARN Attempting to send response via channel for > which there is no open connection, connection id node3:9092-node2:53268-1530 > (kafka.network.Processor) > [2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2, > fetcherId=0] Error sending fetch request (sessionId=1637095899, > epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was > disconnected before the response was read. > (org.apache.kafka.clients.FetchSessionHandler) > {code} > > The request handler idle metrics dropped during this issue: > !kafka-issue.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005)