[ 
https://issues.apache.org/jira/browse/KAFKA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Lazam updated KAFKA-7876:
---------------------------------
    Fix Version/s: 2.1.1
                   2.2.0

> Broker suddenly got disconnected 
> ---------------------------------
>
>                 Key: KAFKA-7876
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7876
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, network
>    Affects Versions: 2.1.0
>            Reporter: Anthony Lazam
>            Priority: Critical
>             Fix For: 2.2.0, 2.1.1
>
>         Attachments: kafka-issue.png
>
>
>  
> We have 3 node cluster setup. There are scenarios that one of the broker 
> suddenly got disconnected from the cluster but no underlying system issue is 
> found. The node that got dc'ed wasn't able to release the partition it holds 
> as the leader, hence clients (spring-boot) was unable to send/receive data 
> from the issued broker.
> We noticed that it always happen to the active controller count.
> Environment details:
> Provider: AWS
>  Kernel: 3.10.0-693.21.1.el7.x86_64
>  OS: CentOS Linux release 7.5.1804 (Core)
>  Scala version: 2.11
>  Kafka version: 2.1.0
>  Kafka config:
> {code:java}
> ############################# Socket Server Settings 
> #############################
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> ############################# Log Basics #############################
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> ############################# Internal Topic Settings  
> #############################
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=2
> ############################# Log Retention Policy 
> #############################
> log.retention.hours=168
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> ############################# Group Coordinator Settings 
> #############################
> group.initial.rebalance.delay.ms=0
> ############################# Zookeeper #############################
> zookeeper.connection.timeout.ms=6000
> broker.id=1
> zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
> log.dirs=/data/kafka-node
> advertised.listeners=PLAINTEXT://node1:9092
> {code}
> Broker disconnected controller log:
> {code:java}
> [2019-01-26 05:03:52,512] TRACE [Controller id=2] Checking need to trigger 
> auto leader balancing (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Preferred replicas by 
> broker Map(TOPICS->MAP) (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred 
> replica for broker 2 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for 
> broker 2 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred 
> replica for broker 1 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for 
> broker 1 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] DEBUG [Controller id=2] Topics not in preferred 
> replica for broker 3 Map() (kafka.controller.KafkaController)
> [2019-01-26 05:03:52,513] TRACE [Controller id=2] Leader imbalance ratio for 
> broker 3 is 0.0 (kafka.controller.KafkaController)
> [2019-01-26 05:08:52,513] TRACE [Controller id=2] Checking need to trigger 
> auto leader balancing (kafka.controller.KafkaController)
> {code}
> Broker working server.log:
> {code:java}
> [2019-01-26 05:02:05,564] INFO [ReplicaFetcher replicaId=3, leaderId=2, 
> fetcherId=0] Error sending fetch request (sessionId=1637095899, 
> epoch=21379644) to node 2: java.io.IOException: Connection to 2 was 
> disconnected before the response was read. 
> (org.apache.kafka.clients.FetchSessionHandler)
> [2019-01-26 05:02:05,573] WARN [ReplicaFetcher replicaId=3, leaderId=2, 
> fetcherId=0] Error in response for fetch request (type=FetchRequest, 
> replicaId=3, maxWait=500, minBytes=1, maxBytes=10485760, 
> fetchData={PlayerGameRounds-8=(offset=2171960, logStartOffset=1483356, 
> maxBytes=1048576, currentLeaderEpoch=Optional[2])}, 
> isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessio
> nId=1637095899, epoch=21379644)) (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 2 was disconnected before the response was 
> read
>         at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
>         at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:97)
>         at 
> kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
>         at 
> kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
>         at 
> kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
>         at scala.Option.foreach(Option.scala:257)
>         at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
>         at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> [2019-01-26 05:02:35,723] WARN Attempting to send response via channel for 
> which there is no open connection, connection id node3:9092-node2:59988-1550 
> (kafka.network.Processor)
> [2019-01-26 05:02:35,731] WARN Attempting to send response via channel for 
> which there is no open connection, connection id node3:9092-node2:59986-1550 
> (kafka.network.Processor)
> [2019-01-26 05:02:35,797] WARN Attempting to send response via channel for 
> which there is no open connection, connection id node3:9092-node2:59494-1549 
> (kafka.network.Processor)
> [2019-01-26 05:02:35,816] WARN Attempting to send response via channel for 
> which there is no open connection, connection id node3:9092-node2:53268-1530 
> (kafka.network.Processor)
> [2019-01-26 05:02:37,603] INFO [ReplicaFetcher replicaId=3, leaderId=2, 
> fetcherId=0] Error sending fetch request (sessionId=1637095899, 
> epoch=INITIAL) to node 2: java.io.IOException: Connection to 2 was 
> disconnected before the response was read. 
> (org.apache.kafka.clients.FetchSessionHandler)
> {code}
>  
> The request handler idle metrics dropped during this issue:
>   !kafka-issue.png!  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to