[ https://issues.apache.org/jira/browse/KAFKA-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406027#comment-15406027 ]
Michal Turek commented on KAFKA-3916:
-------------------------------------
We saw this issue yesterday. I don't know whether it helps, but it may be useful
for debugging:
- Kafka 0.9.0.1 brokers and 0.9.0.1 clients.
- Graphs based on the brokers' JMX metrics show an ISR shrink followed
immediately by an ISR expand.
- Consumers were unable to commit offsets at that time.
{noformat}
2016-08-02 14:25:29.589 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-7]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.560 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-2]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.562 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-0]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.563 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-5]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.570 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-6]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.570 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-3]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.570 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-4]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.572 INFO o.a.k.c.c.internals.ConsumerCoordinator [Consumer-1]: Offset commit for group ... failed due to REQUEST_TIMED_OUT, will find new coordinator and retry
2016-08-02 14:25:52.572 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-7]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.573 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-2]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.573 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-0]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.574 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-5]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-6]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-3]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.575 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-4]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.576 INFO o.a.k.c.c.internals.AbstractCoordinator [Consumer-1]: Marking the coordinator 2147483646 dead.
2016-08-02 14:25:52.576 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-7]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-2]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-0]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.577 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-5]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-6]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-3]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.578 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-4]: Auto offset commit failed: The request timed out.
2016-08-02 14:25:52.579 WARN o.a.k.c.c.internals.ConsumerCoordinator [Consumer-1]: Auto offset commit failed: The request timed out.
{noformat}
> Connection from controller to broker disconnects
> ------------------------------------------------
>
> Key: KAFKA-3916
> URL: https://issues.apache.org/jira/browse/KAFKA-3916
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.9.0.1
> Reporter: Dave Powell
>
> We recently upgraded from 0.8.2.1 to 0.9.0.1. Since then, several times per
> day, the controllers in our clusters lose their connections to all brokers
> and then successfully reconnect a few hundred ms later. Each time this
> happens, our 99th-percentile produce and consume times briefly spike to
> several hundred ms.
> Here is an example of what we're seeing in the controller.log:
> {code}
> [2016-06-28 14:15:35,416] WARN [Controller-151-to-broker-160-send-thread], Controller 151 epoch 106 fails to send request {…} to broker Node(160, broker.160.hostname, 9092). Reconnecting to broker. (kafka.controller.RequestSendThread)
> java.io.IOException: Connection to 160 was disconnected before the response was read
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:87)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:84)
>     at scala.Option.foreach(Option.scala:236)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:84)
>     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:80)
>     at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
>     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
>     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:80)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:180)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> ... one each for all brokers (including the controller) ...
> [2016-06-28 14:15:35,721] INFO [Controller-151-to-broker-160-send-thread], Controller 151 connected to Node(160, broker.160.hostname, 9092) for sending state change requests (kafka.controller.RequestSendThread)
> ... one each for all brokers (including the controller) ...
> {code}