[ 
https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418477#comment-15418477
 ] 

Michal Turek commented on KAFKA-3916:
-------------------------------------

Hi Jason and others,

I have finally found the root cause for our case. I was grep-ing the controller 
logs for a while to hide disturbing stack traces and noticed that all warnings 
from last month always mention the same topic, which is quite strange because 
we have about 50 of them total. This one is the largest one, with the highest 
throughput, but not strongly larger than others. Time stamps of the warnings 
are also from the afternoon or early evening when Kafka receives the highest 
traffic, so I believe the issue is somehow related to performance. There is 
also always leader=2, ISR changes between [1,2] and [1,2,3], related partitions 
are always 2, 7, 11.

While working on a completely different task yesterday, I also by coincidence 
found that consumption speed of events buffered in Kafka for this topic reached 
its limits, other topics were ok. Producers were only little slower than 
consumers. So we are trying to add few more partitions to increase parallelism 
both in consumers and internally in Kafka (replication). I guess it will help.

{noformat}
# egrep 'WARN|IOException' controller.log | less -S

[2016-08-10 19:00:14,502] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always
 same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(1, (host1), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 1 was disconnected before the response was 
read
[2016-08-10 19:00:14,506] WARN [Controller-1-to-broker-4-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always
 same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(4, (host4), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 4 was disconnected before the response was 
read
[2016-08-10 19:00:14,506] WARN [Controller-1-to-broker-2-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always
 same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(2, (host2), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 2 was disconnected before the response was 
read
[2016-08-10 19:00:14,502] WARN [Controller-1-to-broker-3-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2,3],zk_version=145,replicas=[2,3,1]},{topic=(always
 same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2,3],zk_version=120,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(3, (host3), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 3 was disconnected before the response was 
read
[2016-08-10 19:50:41,999] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2],zk_version=129,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(1, (host1), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 1 was disconnected before the response was 
read
[2016-08-10 19:50:42,002] WARN [Controller-1-to-broker-3-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2],zk_version=129,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(3, (host3), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 3 was disconnected before the response was 
read
[2016-08-10 19:50:42,002] WARN [Controller-1-to-broker-4-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2],zk_version=129,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(4, (host4), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 4 was disconnected before the response was 
read
[2016-08-10 19:50:42,002] WARN [Controller-1-to-broker-2-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=11,controller_epoch=18,leader=2,leader_epoch=23,isr=[1,2],zk_version=129,replicas=[2,1,3]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(2, (host2), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 2 was disconnected before the response was 
read
[2016-08-10 20:10:12,044] WARN [Controller-1-to-broker-2-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=7,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=136,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(2, (host2), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 2 was disconnected before the response was 
read
[2016-08-10 20:10:12,044] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=7,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=136,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(1, (host1), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 1 was disconnected before the response was 
read
[2016-08-10 20:10:12,044] WARN [Controller-1-to-broker-3-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=7,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=136,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(3, (host3), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 3 was disconnected before the response was 
read
[2016-08-10 20:10:12,044] WARN [Controller-1-to-broker-4-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=7,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=136,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(4, (host4), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 4 was disconnected before the response was 
read
[2016-08-10 21:04:11,998] WARN [Controller-1-to-broker-1-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=168,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(1, (host1), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 1 was disconnected before the response was 
read
[2016-08-10 21:04:12,001] WARN [Controller-1-to-broker-4-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=168,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(4, (host4), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 4 was disconnected before the response was 
read
[2016-08-10 21:04:11,998] WARN [Controller-1-to-broker-2-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=168,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(2, (host2), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 2 was disconnected before the response was 
read
[2016-08-10 21:04:11,998] WARN [Controller-1-to-broker-3-send-thread], 
Controller 1 epoch 20 fails to send request 
{controller_id=1,controller_epoch=20,partition_states=[{topic=(always same 
topic),partition=2,controller_epoch=18,leader=2,leader_epoch=26,isr=[1,2],zk_version=168,replicas=[2,3,1]}],live_brokers=[{id=1,end_points=[{port=9092,host=(host1),security_protocol_type=0}]},{id=2,end_points=[{port=9092,host=(host2),security_protocol_type=0}]},{id=3,end_points=[{port=9092,host=(host3),security_protocol_type=0}]},{id=4,end_points=[{port=9092,host=(host4),security_protocol_type=0}]}]}
 to broker Node(3, (host3), 9092). Reconnecting to broker. 
(kafka.controller.RequestSendThread)
java.io.IOException: Connection to 3 was disconnected before the response was 
read
{noformat}

> Connection from controller to broker disconnects
> ------------------------------------------------
>
>                 Key: KAFKA-3916
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3916
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.9.0.1
>            Reporter: Dave Powell
>
> We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per 
> day, the controllers in our clusters have their connection to all brokers 
> disconnected, and then successfully reconnected a few hundred ms later. Each 
> time this occurs we see a brief spike in our 99th percentile produce and 
> consume times, reaching several hundred ms.
> Here is an example of what we're seeing in the controller.log:
> {code}
> [2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], 
> Controller 151 epoch 106 fails to send request {…} to broker Node(160, 
> broker.160.hostname, 9092). Reconnecting to broker. 
> (kafka.controller.RequestSendThread)
> java.io.IOException: Connection to 160 was disconnected before the response 
> was read
>         at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>         at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>         at scala.Option.foreach(Option.scala:236)
>         at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>         at 
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>         at 
> kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>         at 
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>         at 
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>         at 
> kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>         at 
> kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ... one each for all brokers (including the controller) ...
>  [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], 
> Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending 
> state change requests (kafka.controller.RequestSendThread)
> … one each for all brokers (including the controller) ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to