Ben Stopford created KAFKA-2904:
-----------------------------------

             Summary: Consumer Fails to Reconnect within 30s Timeout after Broker Restarts
                 Key: KAFKA-2904
                 URL: https://issues.apache.org/jira/browse/KAFKA-2904
             Project: Kafka
          Issue Type: Bug
            Reporter: Ben Stopford
            Assignee: Ben Stopford


This problem occurs in around 1 in 20 executions of the security rolling 
upgrade test. 

The test scenario is a rolling upgrade in which each of the three servers is 
restarted in turn whilst the producer and consumers run. A ten-second sleep 
between the start and stop of each node has been added to ensure there is time 
for failover to occur (re KAFKA-2827). 

When the failure occurs, no further messages are consumed after that point. 

Occasionally the consumer fails to reconnect before its 30s timeout expires. The 
consumer’s log from this period is included at the bottom of this JIRA.

ISRs appear normal at the time of the failure.

The producer is able to produce throughout this period. 

*TIMELINE:*

{quote}
20:39:23 - Test starts Consumer and Producer
20:39:27 - Consumer starts consuming produced messages
20:39:30 - Node 1 shutdown complete
20:39:45 - Node 1 restarts
20:39:59 - Node 2 shutdown complete
20:40:14 - Node 2 restarts 
20:40:27 - Consumer stops consuming
20:40:28 - Node 2 becomes controller
20:40:28 - Node 3 shutdown complete
20:40:34 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
20:40:42 - Node 3 restarts
*20:41:03 - Consumer times out*
20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
20:41:03 - GroupCoordinator 2: Assignment received from leader for group unique-test-group...
20:41:03 - GroupCoordinator 2: Preparing to restabilize group unique-test-group...
20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed 
20:41:53 - Producer shuts down
{quote}


Consumer log at the time of failure:


{quote}
[2015-11-27 20:40:27,268] INFO Current consumption count is 10100 (kafka.tools.ConsoleConsumer$)
[2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be completed due to group rebalance (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while committing offsets for group unique-test-group-0.952644842527 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,322] WARN Auto offset commit failed:  (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2015-11-27 20:40:27,329] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,347] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO SyncGroup for group unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and rejoin (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:28,097] INFO Attempt to join group unique-test-group-0.952644842527 failed due to unknown member id, resetting and retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:40:33,627] INFO Attempt to join group unique-test-group-0.952644842527 failed due to obsolete coordinator information, retrying. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2015-11-27 20:41:03,704] ERROR Error processing message, terminating consumer process:  (kafka.tools.ConsoleConsumer$)
kafka.consumer.ConsumerTimeoutException
        at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
        at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
        at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
        at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
        at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
[2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.Login)
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
