Ben Stopford created KAFKA-2904:
-----------------------------------
Summary: Consumer Fails to Reconnect after 30s post restarts
Key: KAFKA-2904
URL: https://issues.apache.org/jira/browse/KAFKA-2904
Project: Kafka
Issue Type: Bug
Reporter: Ben Stopford
Assignee: Ben Stopford
This problem occurs in around 1 in 20 executions of the security rolling
upgrade test.
The test scenario is a rolling upgrade in which each of the three servers is
restarted in turn while the producer and consumers run. A ten-second sleep between
start and stop of each node has been added to ensure there is time for failover
to occur (re KAFKA-2827).
Failure results in no consumed messages after the failure point.
Intermittently, the consumer fails to reconnect before its 30s timeout expires.
The consumer’s log at this point is at the bottom of this JIRA.
ISRs appear normal at the time of the failure.
The producer is able to produce throughout this period.
*TIMELINE:*
{quote}
20:39:23 - Test starts Consumer and Producer
20:39:27 - Consumer starts consuming produced messages
20:39:30 - Node 1 shutdown complete
20:39:45 - Node 1 restarts
20:39:59 - Node 2 shutdown complete
20:40:14 - Node 2 restarts
20:40:27 - Consumer stops consuming
20:40:28 - Node 2 becomes controller
20:40:28 - Node 3 shutdown complete
20:40:34 - GroupCoordinator 2: Preparing to restabilize group
unique-test-group...
20:40:42 - Node 3 restarts
*20:41:03 - Consumer times out*
20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
20:41:03 - GroupCoordinator 2: Assignment received from leader for group
unique-test-group...
20:41:03 - GroupCoordinator 2: Preparing to restabilize group
unique-test-group...
20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed
20:41:53 - Producer shuts down
{quote}
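For reference, the gaps between the key events in the timeline above can be worked out with a few lines of Python. This is only a sketch for reading the timeline: the helper name `elapsed` is made up for illustration, and the timestamps are copied from the entries above.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str, fmt: str = "%H:%M:%S") -> timedelta:
    """Return the time between two same-day wall-clock timestamps."""
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# "Consumer stops consuming" -> "Consumer times out"
stall = elapsed("20:40:27", "20:41:03")

# "Preparing to restabilize group" -> "Consumer times out"
rebalance_to_timeout = elapsed("20:40:34", "20:41:03")

print(stall.total_seconds())                 # 36.0
print(rebalance_to_timeout.total_seconds())  # 29.0
```

So the consumer is idle for 36 seconds in total, and the group sits in "Preparing to restabilize" for roughly 29 of them before the consumer gives up.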
Consumer log at time of failure:
{quote}
[2015-11-27 20:40:27,268] INFO Current consumption count is 10100
(kafka.tools.ConsoleConsumer$)
[2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while
committing offsets for group unique-test-group-0.952644842527
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be
completed due to group rebalance
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while
committing offsets for group unique-test-group-0.952644842527
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] WARN Auto offset commit failed:
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,329] INFO Attempt to join group
unique-test-group-0.952644842527 failed due to unknown member id, resetting and
retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,347] INFO SyncGroup for group
unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining the
group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO SyncGroup for group
unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, will
find new coordinator and rejoin
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead.
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:28,097] INFO Attempt to join group
unique-test-group-0.952644842527 failed due to unknown member id, resetting and
retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead.
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Attempt to join group
unique-test-group-0.952644842527 failed due to obsolete coordinator
information, retrying.
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:41:03,704] ERROR Error processing message, terminating consumer
process: (kafka.tools.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
[2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and will
exit. (org.apache.kafka.common.security.kerberos.Login)
{quote}
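Two details in the log above may be easier to read with a little arithmetic. The sketch below assumes the consumer derives the coordinator's connection id as Java's Integer.MAX_VALUE minus the broker id; I believe the new consumer does this, but it should be confirmed against the client source used in this run. The helper names are hypothetical, and the ids and timestamps are copied from the log.

```python
from datetime import datetime, timedelta

JAVA_INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def broker_id_from_coordinator_id(coordinator_id: int) -> int:
    """Assumed mapping: coordinator id = Integer.MAX_VALUE - broker id."""
    return JAVA_INT_MAX - coordinator_id


def parse_log_ts(ts: str) -> datetime:
    """Parse a timestamp in the log's 'YYYY-MM-DD HH:MM:SS,mmm' format."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


# Coordinator ids from the two "Marking the coordinator ... dead" lines.
dead_coordinators = [2147483644, 2147483646]
brokers = [broker_id_from_coordinator_id(c) for c in dead_coordinators]

# Last join attempt -> ConsumerTimeoutException.
silent_gap = parse_log_ts("2015-11-27 20:41:03,704") - parse_log_ts(
    "2015-11-27 20:40:33,627"
)

print(brokers)                     # [3, 1]
print(silent_gap.total_seconds())  # 30.077
```

Under that assumption the consumer marked the coordinators on brokers 3 and 1 dead, while the timeline shows GroupCoordinator 2 handling the group. The gap of about 30 seconds between the last join attempt and the exception lines up with the consumer's 30s timeout.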