Ben Stopford created KAFKA-2904:
-----------------------------------

             Summary: Consumer Fails to Reconnect after 30s post restarts
                 Key: KAFKA-2904
                 URL: https://issues.apache.org/jira/browse/KAFKA-2904
             Project: Kafka
          Issue Type: Bug
            Reporter: Ben Stopford
            Assignee: Ben Stopford

This problem occurs in around 1 in 20 executions of the security rolling upgrade test.

The test scenario is a rolling upgrade in which each of the three servers is restarted in turn whilst the producer and consumers run. A ten-second sleep between start and stop of each node has been added to ensure there is time for failover to occur (re KAFKA-2827).

When the failure occurs, no messages are consumed after the failure point. Occasionally the consumer fails to reconnect within its 30s timeout. The consumer’s log at this point is at the bottom of this jira.

ISRs appear normal at the time of the failure.

The producer is able to produce throughout this period.

*TIMELINE:*
{quote}
20:39:23 - Test starts Consumer and Producer
20:39:27 - Consumer starts consuming produced messages
20:39:30 - Node 1 shutdown complete
20:39:45 - Node 1 restarts
20:39:59 - Node 2 shutdown complete
20:40:14 - Node 2 restarts
20:40:27 - Consumer stops consuming
20:40:28 - Node 2 becomes controller
20:40:28 - Node 3 shutdown complete
20:40:34 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
20:40:42 - Node 3 restarts
*20:41:03 - Consumer times out*
20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
20:41:03 - GroupCoordinator 2: Assignment received from leader for group unique-test-group...
20:41:03 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed
20:41:53 - Producer shuts down
{quote}

Consumer log at time of failure:

{quote}
[2015-11-27 20:40:27,268] INFO Current consumption count is 10100 (kafka.tools.ConsoleConsumer$)
[2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be completed due to group rebalance (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] WARN Auto offset commit failed: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,329] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,347] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and rejoin (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:28,097] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Attempt to join group unique-test-group-0.952644842527 failed due to obsolete coordinator information, retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:41:03,704] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
	at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
	at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
	at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
	at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
	at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
[2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.Login)
{quote}
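
A rough aid for reading the consumer log above, offered as a sketch rather than a diagnosis. It rests on one assumption that this report does not state: that the Java client reports the coordinator under a connection id computed as Integer.MAX_VALUE minus the broker id. If that holds for the client version under test, the two "Marking the coordinator ... dead" entries would refer to brokers 3 and 1 respectively. The helper names (`broker_id`, `gap_seconds`) are hypothetical. The second helper simply measures the silence between the last coordinator-related entry and the ConsumerTimeoutException, which can be compared against the 30s timeout.

```python
from datetime import datetime

# Java's Integer.MAX_VALUE.
INT_MAX = 2**31 - 1

def broker_id(coordinator_id):
    """Map a coordinator id from the consumer log back to a broker id.

    ASSUMPTION (not stated in this report): the client uses
    Integer.MAX_VALUE - broker id as the coordinator connection id.
    """
    return INT_MAX - coordinator_id

def parse(ts):
    """Parse a log4j-style timestamp such as '2015-11-27 20:40:33,627'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")

def gap_seconds(start, end):
    """Seconds elapsed between two log timestamps."""
    return (parse(end) - parse(start)).total_seconds()

if __name__ == "__main__":
    # Coordinator ids copied from the "Marking the coordinator ... dead" lines.
    print(broker_id(2147483644))  # 3
    print(broker_id(2147483646))  # 1
    # Last rejoin attempt vs. the ConsumerTimeoutException entry.
    print(gap_seconds("2015-11-27 20:40:33,627", "2015-11-27 20:41:03,704"))  # 30.077
```

Under that assumption the consumer gives up on broker 3 at 20:40:27 and on broker 1 at 20:40:33, then logs nothing further for roughly 30 seconds until the timeout fires.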