[
https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134158#comment-15134158
]
Simon Cooper edited comment on KAFKA-2904 at 2/5/16 1:24 PM:
-------------------------------------------------------------
We've seen a similar issue when first starting up brokers: consumers usually
don't work for ~30s after broker startup, and it sometimes takes several
minutes before they start working!
We're not using consumer groups at all; we're handling partition assignment
and offset management manually.
was (Author: thecoop1984):
We've seen a similar issue when first starting up brokers - consumers don't
work for ~30s after startup, but it is sometimes as much as several minutes
before consumers work!
We're not using consumer groups at all, we're handling partition assignment &
offset management manually.
> Consumer Fails to Reconnect after 30s post restarts
> ---------------------------------------------------
>
> Key: KAFKA-2904
> URL: https://issues.apache.org/jira/browse/KAFKA-2904
> Project: Kafka
> Issue Type: Bug
> Reporter: Ben Stopford
> Assignee: Ben Stopford
> Attachments: 2015-11-27--001 (1).tar.gz
>
>
> This problem occurs in around 1 in 20 executions of the security rolling
> upgrade test.
> Test scenario is a rolling upgrade where each of the three servers is
> restarted in turn whilst the producer and consumers run. A ten-second sleep
> between start and stop of each node has been added to ensure there is time
> for failover to occur (re KAFKA-2827).
> Failure results in no consumed messages after the failure point.
> Periodically the consumer fails to reconnect for the whole of its 30s
> timeout. The consumer’s log at this point is at the bottom of this jira.
> ISRs appear normal at the time of the failure.
> The producer is able to produce throughout this period.
> *TIMELINE:*
> {quote}
> 20:39:23 - Test starts Consumer and Producer
> 20:39:27 - Consumer starts consuming produced messages
> 20:39:30 - Node 1 shutdown complete
> 20:39:45 - Node 1 restarts
> 20:39:59 - Node 2 shutdown complete
> 20:40:14 - Node 2 restarts
> 20:40:27 - Consumer stops consuming
> 20:40:28 - Node 2 becomes controller
> 20:40:28 - Node 3 shutdown complete
> 20:40:34 - GroupCoordinator 2: Preparing to restabilize group
> unique-test-group...
> 20:40:42 - Node 3 restarts
> *20:41:03 - Consumer times out*
> 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Preparing to restabilize group
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed
> 20:41:53 - Producer shuts down
> {quote}
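The intervals implied by the timeline above can be worked out with a short script. This is only a sketch: the timestamps are copied from the timeline, while the dictionary keys and the helper name seconds_between are made up for illustration and do not come from the test harness.

```python
from datetime import datetime

# Timestamps copied from the timeline in the issue description.
# The key names are hypothetical labels chosen for this sketch.
EVENTS = {
    "node1_down": "20:39:30",
    "node1_up": "20:39:45",
    "node2_down": "20:39:59",
    "node2_up": "20:40:14",
    "node3_down": "20:40:28",
    "node3_up": "20:40:42",
    "consumer_stops": "20:40:27",
    "consumer_times_out": "20:41:03",
}


def seconds_between(start, end):
    """Return whole seconds elapsed between two named timeline events."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(EVENTS[end], fmt) - datetime.strptime(EVENTS[start], fmt)
    return int(delta.total_seconds())


# Gap between "shutdown complete" and "restarts" for each node.
downtime = {n: seconds_between(f"node{n}_down", f"node{n}_up") for n in (1, 2, 3)}

# How long the consumer was stalled before it gave up.
stall = seconds_between("consumer_stops", "consumer_times_out")

print("per-node downtime (s):", downtime)
print("consumer stall (s):", stall)
```

By this arithmetic each node is down for roughly 15 seconds, and the consumer is stalled for 36 seconds between its last consumed message and the timeout.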
> Consumer log at time of failure:
> {quote}
> [2015-11-27 20:40:27,268] INFO Current consumption count is 10100
> (kafka.tools.ConsoleConsumer$)
> [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while
> committing offsets for group unique-test-group-0.952644842527
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be
> completed due to group rebalance
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while
> committing offsets for group unique-test-group-0.952644842527
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] WARN Auto offset commit failed:
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,329] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to unknown member id, resetting
> and retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,347] INFO SyncGroup for group
> unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining
> the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO SyncGroup for group
> unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP,
> will find new coordinator and rejoin
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:28,097] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to unknown member id, resetting
> and retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to obsolete coordinator
> information, retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:41:03,704] ERROR Error processing message, terminating
> consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> [2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and
> will exit. (org.apache.kafka.common.security.kerberos.Login)
> {quote}
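The large coordinator ids in the "Marking the coordinator ... dead" lines can probably be mapped back to broker ids. The sketch below assumes the Java consumer registers its coordinator connection under the id Integer.MAX_VALUE minus the broker id; that is my understanding of the 0.9 client and is not stated anywhere in this issue. The function name is hypothetical.

```python
# Java's Integer.MAX_VALUE.
INT_MAX = 2**31 - 1


def broker_id_from_coordinator_id(coordinator_id):
    """Recover the broker id, ASSUMING coordinator id = Integer.MAX_VALUE - broker id."""
    return INT_MAX - coordinator_id


# Coordinator ids copied from the consumer log, in the order they were marked dead.
coordinator_ids = [2147483644, 2147483646]
brokers = [broker_id_from_coordinator_id(c) for c in coordinator_ids]

for cid, broker in zip(coordinator_ids, brokers):
    print(f"coordinator {cid} -> broker {broker}")
```

If the assumption holds, and if the node numbers in the timeline are the broker ids, the consumer first gave up on broker 3 as coordinator (20:40:27) and then on broker 1 (20:40:33), while the timeline shows GroupCoordinator 2 handling the group from 20:40:34 onwards.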