[
https://issues.apache.org/jira/browse/KAFKA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anthony Lazam resolved KAFKA-7876.
----------------------------------
Resolution: Duplicate
> Broker suddenly got disconnected
> ---------------------------------
>
> Key: KAFKA-7876
> URL: https://issues.apache.org/jira/browse/KAFKA-7876
> Project: Kafka
> Issue Type: Bug
> Components: controller, network
> Affects Versions: 2.1.0
> Reporter: Anthony Lazam
> Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: kafka-issue.png
>
>
>
> We have 3 node cluster setup. There are scenarios that one of the broker
> suddenly got disconnected from the cluster but no underlying system issue is
> found. The node that got dc'ed wasn't able to release the partition it holds
> as the leader, hence clients (spring-boot) was unable to send/receive data
> from the issued broker.
> We noticed that it always happen to the active controller count.
> Environment details:
> Provider: AWS
> Kernel: 3.10.0-693.21.1.el7.x86_64
> OS: CentOS Linux release 7.5.1804 (Core)
> Scala version: 2.11
> Kafka version: 2.1.0
> Kafka config:
> {code:java}
> ############################# Socket Server Settings
> #############################
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> ############################# Log Basics #############################
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> ############################# Internal Topic Settings
> #############################
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=2
> ############################# Log Retention Policy
> #############################
> log.retention.hours=168
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> ############################# Group Coordinator Settings
> #############################
> group.initial.rebalance.delay.ms=0
> ############################# Zookeeper #############################
> zookeeper.connection.timeout.ms=6000
> broker.id=1
> zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
> log.dirs=/data/kafka-node
> advertised.listeners=PLAINTEXT://node1:9092
> {code}
> Broker disconnected controller log:
> {code:java}
> [2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger
> auto leader balancing (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by
> broker Map(TOPICS->MAP) (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
> replica for broker 2 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
> broker 2 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
> replica for broker 1 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
> broker 1 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred
> replica for broker 3 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for
> broker 3 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger
> auto leader balancing (kafka.controller.KafkaController)
> {code}
> Broker working server.log:
> {code:java}
> [2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2,
> fetcherId=0] Error sending fetch request (sessionId=1637095899,
> epoch=21379644) to node 2: java.io.IOException: Connection to 2 was
> disconnected before the response was read.
> (org.apache.kafka.clients.FetchSessionHandler)
> [2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2,
> fetcherId=0] Error in response for fetch request (type=FetchRequest,
> replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760,
> fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356,
> maxBytes=1048576, currentLeaderEpoch=Optional[2])},
> isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessio
> nId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 2 was disconnected before the response was
> read
> at
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
> at
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
> at
> kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
> at
> kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
> at
> kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
> at
> kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
> at scala.Option.foreach(Option.scala:257)
> at
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
> at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-01-26 05:02:35,723] WARN Attempting to send response via channel for
> which there is no open connection, connection id node3:9092-node2:59988-1550
> (kafka.network.Processor)
> [2019-01-26 05:02:35,731] WARN Attempting to send response via channel for
> which there is no open connection, connection id node3:9092-node2:59986-1550
> (kafka.network.Processor)
> [2019-01-26 05:02:35,797] WARN Attempting to send response via channel for
> which there is no open connection, connection id node3:9092-node2:59494-1549
> (kafka.network.Processor)
> [2019-01-26 05:02:35,816] WARN Attempting to send response via channel for
> which there is no open connection, connection id node3:9092-node2:53268-1530
> (kafka.network.Processor)
> [2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2,
> fetcherId=0] Error sending fetch request (sessionId=1637095899,
> epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was
> disconnected before the response was read.
> (org.apache.kafka.clients.FetchSessionHandler)
> {code}
>
> The request handler idle metrics dropped during this issue:
> !kafka-issue.png!
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)