[ https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Stopford updated KAFKA-2904:
--------------------------------
    Attachment: 2015-11-27--001 (1).tar.gz

> Consumer Fails to Reconnect after 30s post restarts
> ---------------------------------------------------
>
>                 Key: KAFKA-2904
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2904
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ben Stopford
>            Assignee: Ben Stopford
>         Attachments: 2015-11-27--001 (1).tar.gz
>
>
> This problem occurs in around 1 in 20 executions of the security rolling upgrade test.
> The test scenario is a rolling upgrade in which each of the three servers is restarted in turn whilst the producer and consumers run. A ten-second sleep between start and stop of each node has been added to ensure there is time for failover to occur (re KAFKA-2827).
> Failure results in no consumed messages after the failure point.
> Periodically the consumer fails to reconnect within its 30s timeout. The consumer’s log at this point is at the bottom of this jira.
> ISRs appear normal at the time of the failure.
> The producer is able to produce throughout this period.
> *TIMELINE:*
> {quote}
> 20:39:23 - Test starts Consumer and Producer
> 20:39:27 - Consumer starts consuming produced messages
> 20:39:30 - Node 1 shutdown complete
> 20:39:45 - Node 1 restarts
> 20:39:59 - Node 2 shutdown complete
> 20:40:14 - Node 2 restarts
> 20:40:27 - Consumer stops consuming
> 20:40:28 - Node 2 becomes controller
> 20:40:28 - Node 3 shutdown complete
> 20:40:34 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
> 20:40:42 - Node 3 restarts
> *20:41:03 - Consumer times out*
> 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed
> 20:41:53 - Producer shuts down
> {quote}
> Consumer log at time of failure:
> {quote}
> [2015-11-27 20:40:27,268] INFO Current consumption count is 10100 (kafka.tools.ConsoleConsumer$)
> [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be completed due to group rebalance (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] WARN Auto offset commit failed: (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,329] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,347] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and rejoin (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:28,097] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Attempt to join group unique-test-group-0.952644842527 failed due to obsolete coordinator information, retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:41:03,704] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
>       at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
>       at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
>       at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
>       at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
>       at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> [2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.Login)
> {quote}
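
For anyone reading the consumer log above, a small sketch may help check two things: how long the consumer was silent before the ConsumerTimeoutException, and which brokers the "Marking the coordinator ... dead" ids refer to. The timestamps and ids are copied from the log in this report. The helper names (`parse_ts`, `gap_seconds`, `coordinator_broker_id`) are hypothetical, and the id decoding assumes the Java client derives the coordinator connection id as Integer.MAX_VALUE minus the broker id, which is my understanding of the client and not something the report states.

```python
from datetime import datetime

# log4j-style timestamp used in the consumer log, e.g. "2015-11-27 20:40:33,627"
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Java's Integer.MAX_VALUE
INT_MAX = 2**31 - 1


def parse_ts(text):
    """Parse one consumer-log timestamp into a datetime."""
    return datetime.strptime(text, LOG_FORMAT)


def gap_seconds(earlier, later):
    """Seconds elapsed between two log timestamps."""
    return (parse_ts(later) - parse_ts(earlier)).total_seconds()


def coordinator_broker_id(connection_id):
    """Map a coordinator connection id back to a broker id.

    ASSUMPTION: connection id == Integer.MAX_VALUE - broker id.
    """
    return INT_MAX - connection_id


if __name__ == "__main__":
    last_consumed = "2015-11-27 20:40:27,268"   # "Current consumption count is 10100"
    last_retry = "2015-11-27 20:40:33,627"      # last join-group retry logged
    timed_out = "2015-11-27 20:41:03,704"       # ConsumerTimeoutException

    print("silence after last retry:", gap_seconds(last_retry, timed_out), "s")
    print("silence after last message:", gap_seconds(last_consumed, timed_out), "s")

    for conn_id in (2147483644, 2147483646):
        print(conn_id, "-> broker", coordinator_broker_id(conn_id))
```

Under these assumptions, the gap between the last logged retry and the exception is just over 30 seconds, which lines up with the 30s consumer timeout, and the two ids would map to brokers 3 and 1 respectively, while the timeline shows the group coordinator activity on node 2.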