[
https://issues.apache.org/jira/browse/KAFKA-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321094#comment-15321094
]
mjuarez edited comment on KAFKA-2904 at 6/8/16 6:22 PM:
--------------------------------------------------------
I'm seeing this issue repeatedly whenever a Kafka consumer is left running for
more than a few hours on a low-volume topic (~2400 messages/second). This is
on Kafka 0.9.0.1 brokers, using the Java 0.9.0.1 client jars.
{quote}
2016-06-08 09:52:02,633
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
[pool-2-thread-1] ERROR Error ILLEGAL_GENERATION occurred while committing
offsets for group TEST1_haymaker_to_hdfs
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
completed due to group rebalance
{quote}
After that, the app logs are flooded with this unhelpful error message:
{quote}
2016-06-08 09:52:11,321
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
[pool-2-thread-1] ERROR Offset commit failed.
org.apache.kafka.clients.consumer.internals.SendFailedException
{quote}
I have confirmed that the application is still consuming and committing offsets
successfully, but the ConsumerCoordinator appears to be stuck retrying an
offset commit that fails every time.
> Consumer Fails to Reconnect after 30s post restarts
> ---------------------------------------------------
>
> Key: KAFKA-2904
> URL: https://issues.apache.org/jira/browse/KAFKA-2904
> Project: Kafka
> Issue Type: Bug
> Reporter: Ben Stopford
> Assignee: Ben Stopford
> Attachments: 2015-11-27--001 (1).tar.gz
>
>
> This problem occurs in around 1 in 20 executions of the security rolling
> upgrade test.
> The test scenario is a rolling upgrade in which each of the three servers is
> restarted in turn whilst the producer and consumers run. A ten-second sleep
> between start and stop of each node has been added to ensure there is time
> for failover to occur (re KAFKA-2827).
> A failure results in no messages being consumed after the failure point.
> Periodically the consumer fails to reconnect within its 30s timeout. The
> consumer’s log at this point is at the bottom of this JIRA.
> ISRs appear normal at the time of the failure.
> The producer is able to produce throughout this period.
> *TIMELINE:*
> {quote}
> 20:39:23 - Test starts Consumer and Producer
> 20:39:27 - Consumer starts consuming produced messages
> 20:39:30 - Node 1 shutdown complete
> 20:39:45 - Node 1 restarts
> 20:39:59 - Node 2 shutdown complete
> 20:40:14 - Node 2 restarts
> 20:40:27 - Consumer stops consuming
> 20:40:28 - Node 2 becomes controller
> 20:40:28 - Node 3 shutdown complete
> 20:40:34 - GroupCoordinator 2: Preparing to restabilize group
> unique-test-group...
> 20:40:42 - Node 3 restarts
> *20:41:03 - Consumer times out*
> 20:41:03 - GroupCoordinator 2: Stabilized group unique-test-group...
> 20:41:03 - GroupCoordinator 2: Assignment received from leader for group
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Preparing to restabilize group
> unique-test-group...
> 20:41:03 - GroupCoordinator 2: Group unique-test-group... is dead and removed
> 20:41:53 - Producer shuts down
> {quote}
> Consumer log at time of failure:
> {quote}
> [2015-11-27 20:40:27,268] INFO Current consumption count is 10100
> (kafka.tools.ConsoleConsumer$)
> [2015-11-27 20:40:27,321] ERROR Error ILLEGAL_GENERATION occurred while
> committing offsets for group unique-test-group-0.952644842527
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,321] WARN Auto offset commit failed: Commit cannot be
> completed due to group rebalance
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] ERROR Error ILLEGAL_GENERATION occurred while
> committing offsets for group unique-test-group-0.952644842527
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,322] WARN Auto offset commit failed:
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2015-11-27 20:40:27,329] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to unknown member id, resetting
> and retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,347] INFO SyncGroup for group
> unique-test-group-0.952644842527 failed due to UNKNOWN_MEMBER_ID, rejoining
> the group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO SyncGroup for group
> unique-test-group-0.952644842527 failed due to NOT_COORDINATOR_FOR_GROUP,
> will find new coordinator and rejoin
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:27,357] INFO Marking the coordinator 2147483644 dead.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:28,097] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to unknown member id, resetting
> and retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Marking the coordinator 2147483646 dead.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:40:33,627] INFO Attempt to join group
> unique-test-group-0.952644842527 failed due to obsolete coordinator
> information, retrying.
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-27 20:41:03,704] ERROR Error processing message, terminating
> consumer process: (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:59)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:112)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> [2015-11-27 20:41:03,737] WARN TGT renewal thread has been interrupted and
> will exit. (org.apache.kafka.common.security.kerberos.Login)
> {quote}
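For anyone checking the timing in the consumer log above, here is a minimal Python sketch (not part of the original report; the helper name `seconds_between` and the constant names are made up for illustration). It measures the gap between the last coordinator retry and the ConsumerTimeoutException, and between the last reported consumption count and the timeout. The first gap comes to roughly 30 seconds, which looks consistent with the 30s timeout mentioned in the description, assuming the consumer received nothing after that last retry.

```python
from datetime import datetime

# Timestamps copied from the consumer log quoted above.
LAST_CONSUMED = "2015-11-27 20:40:27,268"  # "Current consumption count is 10100"
LAST_RETRY = "2015-11-27 20:40:33,627"     # last "Attempt to join group ... retrying" line
TIMED_OUT = "2015-11-27 20:41:03,704"      # ConsumerTimeoutException

# Log timestamp layout: date, time, comma, milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start, end):
    """Return the elapsed seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    print("last retry -> timeout:       %.3f s" % seconds_between(LAST_RETRY, TIMED_OUT))
    print("last consumption -> timeout: %.3f s" % seconds_between(LAST_CONSUMED, TIMED_OUT))
```

The same helper can be pointed at any two lines of a log that uses this timestamp layout.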