Jason Gustafson created KAFKA-16012:
---------------------------------------
Summary: Incomplete range assignment in consumer
Key: KAFKA-16012
URL: https://issues.apache.org/jira/browse/KAFKA-16012
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
Fix For: 3.7.0
We were looking into test failures here:
https://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1702475525--jolshan--kafka-15784--7cad567675/2023-12-13--001./2023-12-13–001./report.html.
Here is the first failure in the report:
{code:java}
====================================================================================================
test_id:
kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=clean_bounce.bounce_target=brokers
status: FAIL
run time: 3 minutes 4.950 seconds
TimeoutError('Consumer consumed only 88223 out of 100000 messages in 90s')
{code}
We traced the failure to an apparent bug during the last rebalance before the
group became empty. The last remaining instance seems to receive an incomplete
assignment, which prevents it from consuming all expected messages on some
partitions. Here is the rebalance from the coordinator's perspective:
{code:java}
server.log.2023-12-13-04:[2023-12-13 04:58:56,987] INFO [GroupCoordinator 3]:
Stabilized group grouped-transactions-test-consumer-group generation 5
(__consumer_offsets-2) with 1 members (kafka.coordinator.group.GroupCoordinator)
server.log.2023-12-13-04:[2023-12-13 04:58:56,990] INFO [GroupCoordinator 3]:
Assignment received from leader
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
for group grouped-transactions-test-consumer-group for generation 5. The group
has 1 members, 0 of which are static.
(kafka.coordinator.group.GroupCoordinator) {code}
The group is down to one member in generation 5. In the previous generation,
the consumer in question reported this assignment:
{code:java}
// Gen 4: we've got partitions 0-4
[2023-12-13 04:58:52,631] DEBUG [Consumer
clientId=consumer-grouped-transactions-test-consumer-group-1,
groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with
generation 4 and memberId
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:52,631] INFO [Consumer
clientId=consumer-grouped-transactions-test-consumer-group-1,
groupId=grouped-transactions-test-consumer-group] Notifying assignor about the
new Assignment(partitions=[input-topic-0, input-topic-1, input-topic-2,
input-topic-3, input-topic-4])
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
However, in generation 5, we seem to be assigned only one partition:
{code:java}
// Gen 5: Now we have only partition 1? But aren't we the last member in the
// group?
[2023-12-13 04:58:56,954] DEBUG [Consumer
clientId=consumer-grouped-transactions-test-consumer-group-1,
groupId=grouped-transactions-test-consumer-group] Executing onJoinComplete with
generation 5 and memberId
consumer-grouped-transactions-test-consumer-group-1-2164f472-93f3-4176-af3f-23d4ed8b37fd
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2023-12-13 04:58:56,955] INFO [Consumer
clientId=consumer-grouped-transactions-test-consumer-group-1,
groupId=grouped-transactions-test-consumer-group] Notifying assignor about the
new Assignment(partitions=[input-topic-1])
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) {code}
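To make the difference between the two generations explicit, here is a minimal sketch that pulls the partition list out of a "Notifying assignor" log entry and reports which partitions were dropped. The class and method names (AssignmentLogParser, parsePartitions, missing) are hypothetical helpers written for this illustration and are not part of Kafka. The sketch assumes the log entry is passed in as a single string, possibly still line-wrapped as in the excerpts above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing logged assignments; not part of Kafka.
public class AssignmentLogParser {
    // Matches the "Assignment(partitions=[...])" fragment of the log message.
    private static final Pattern ASSIGNMENT =
        Pattern.compile("Assignment\\(partitions=\\[([^\\]]*)\\]\\)");

    // Returns the partitions named in the first Assignment(...) fragment,
    // or an empty list if the entry contains none.
    public static List<String> parsePartitions(String logEntry) {
        // Collapse line wraps so a wrapped entry is matched as one line.
        String normalized = logEntry.replaceAll("\\s+", " ");
        Matcher matcher = ASSIGNMENT.matcher(normalized);
        List<String> partitions = new ArrayList<>();
        if (matcher.find()) {
            for (String part : matcher.group(1).split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    partitions.add(trimmed);
                }
            }
        }
        return partitions;
    }

    // Returns the partitions present in the previous assignment but absent
    // from the current one, in their original order.
    public static List<String> missing(List<String> previous, List<String> current) {
        List<String> result = new ArrayList<>(previous);
        result.removeAll(current);
        return result;
    }
}
```

Applied to the generation 4 and generation 5 entries above, missing(...) would report input-topic-0, input-topic-2, input-topic-3 and input-topic-4 as dropped.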
According to the JoinGroup for generation 5, the assignment type is range. The
decoded metadata sent by the consumer is this:
{code:java}
Subscription(topics=[input-topic], ownedPartitions=[], groupInstanceId=null,
generationId=4, rackId=null) {code}
Here is the decoded assignment from the SyncGroup:
{code:java}
Assignment(partitions=[input-topic-1]) {code}
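For comparison, here is a simplified sketch of what range assignment is expected to produce for a single topic. It is not the actual RangeAssignor code: it ignores rack awareness and multi-topic subscriptions, and the class name RangeAssignmentSketch is made up for this illustration. The partition count of 5 used in the usage note is an assumption based on the generation 4 assignment, which shows partitions 0-4; the topic may have more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified single-topic range assignment; illustrative only, not Kafka code.
public class RangeAssignmentSketch {
    // Sorts the members and gives each a contiguous range of partitions.
    // When the partitions do not divide evenly, the first members in sorted
    // order each receive one extra partition.
    public static Map<String, List<String>> assign(
            String topic, int numPartitions, List<String> memberIds) {
        List<String> sorted = new ArrayList<>(memberIds);
        Collections.sort(sorted);
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return assignment;
        }
        int perMember = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            int start = perMember * i + Math.min(i, extra);
            int length = perMember + (i < extra ? 1 : 0);
            List<String> partitions = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                partitions.add(topic + "-" + p);
            }
            assignment.put(sorted.get(i), partitions);
        }
        return assignment;
    }
}
```

With a single member and the assumed 5 partitions, assign("input-topic", 5, members) gives that member input-topic-0 through input-topic-4. Whatever the real partition count is, a lone member should receive every partition of the topic, which is why an assignment containing only input-topic-1 looks incomplete.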