[ https://issues.apache.org/jira/browse/KAFKA-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Chiang updated KAFKA-4573:
------------------------------
    Component/s: producer

> Producer sporadic timeout
> -------------------------
>
>                 Key: KAFKA-4573
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4573
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.9.0.1
>            Reporter: Ankur C
>            Priority: Major
>
> We had a production outage due to sporadic Kafka producer timeouts. About 1 to
> 2% of messages timed out continuously.
> Kafka version - 0.9.0.1
> #Kafka brokers - 5
> #Replication for each topic - 3
> #Number of topics - ~30
> #Number of partitions - ~300
> We had Kafka 0.9.0.1 running in our 5-broker cluster for 1 month without any
> issues. However, on Dec 23rd we saw sporadic Kafka producer timeouts.
> The issue began around 6:51am and continued until we bounced the Kafka brokers.
> 6:51am Under-replication started on a small number of topics
> 6:53am All under-replication recovered
> 11:00am We restarted all Kafka producer writer apps, but this didn't solve the
> sporadic Kafka producer timeout issue
> 12:01pm We restarted all Kafka brokers; after this the issue was resolved.
> Kafka metrics and Kafka logs don't show any major issue. There were no
> offline partitions during the outage and the number of controllers was exactly 1.
> The only exception we saw on the Kafka brokers was the following, in
> controller.log. It was present on all brokers, 0 to 4.
> java.io.IOException: Connection to 2 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2016-12-23 06:51:37,384] WARN [Controller-2-to-broker-2-send-thread], Controller 2 epoch 18 fails to send request {controller_id=2,controller_epoch=18,partition_states=[{topic=compliance_pipeline_fast_green,partition=4,controller_epoch=18,leader=4,leader_epoch=53,isr=[2,4],zk_version=111,replicas=[4,1,2]}],live_brokers=[{id=3,end_points=[{port=31161,host=10.126.144.73,security_protocol_type=0}]},{id=4,end_points=[{port=31355,host=10.126.144.233,security_protocol_type=0}]},{id=2,end_points=[{port=31293,host=10.126.144.137,security_protocol_type=0}]},{id=1,end_points=[{port=31824,host=10.126.144.169,security_protocol_type=0}]},{id=0,end_points=[{port=31139,host=10.126.144.201,security_protocol_type=0}]}]} to broker Node(2, 10.126.144.137, 31293). Reconnecting to broker. (kafka.controller.RequestSendThread)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)