[
https://issues.apache.org/jira/browse/KAFKA-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fathima Khazana Abdul Haiyum resolved KAFKA-13979.
--------------------------------------------------
Resolution: Not A Bug
> Kafka resets committed offset after rebalance
> ---------------------------------------------
>
> Key: KAFKA-13979
> URL: https://issues.apache.org/jira/browse/KAFKA-13979
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.6.2
> Reporter: Fathima Khazana Abdul Haiyum
> Priority: Critical
>
> We have 3 nodes in our MSK cluster which run Apache Kafka 2.6.2. We have 15
> partitions for a topic and 5 consumers in our consumer group, where each
> consumer runs on its own Java application server. Whenever we do a rolling
> deploy to our servers, we notice a huge consumer lag on *some* of
> the 15 partitions. It appears that the consumer, after rebalancing, resets its
> committed offset and reprocesses messages. For example, this is what I'm
> seeing:
> {code:java}
> logger_name:org.apache.kafka.clients.consumer.internals.ConsumerCoordinator
> message:[Consumer clientId=myService-mytopic-0, groupId=myService-mytopic]
> Committed offset 3044 for partition mytopic-0{code}
>
> So we know for a fact that the offset 3044 has been committed for partition 0.
>
> Running {{./kafka-consumer-groups.sh --describe}} gives the following:
> {code:java}
> GROUP PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID CLIENT-ID
> myService-mytopic 0 3044 3044 0 myService-mytopic-0
> {code}
>
> After a deploy, which removes the consumer from the group, triggers a
> rebalance, and then adds the consumer back, I see this:
> {code:java}
> GROUP PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID CLIENT-ID
> myService-mytopic 0 1890 3047 1157 myService-mytopic-0{code}
>
> In the application logs, I see this:
> {code:java}
> logger_name:org.apache.kafka.clients.consumer.internals.Fetcher
> message:[Consumer clientId=myService-mytopic-0, groupId=myService-mytopic]
> Fetch position FetchPosition{offset=1890, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[b-3.kafka-mytestserver.1gkwlu.c16.kafka.us-east-1.amazonaws.com:9098
> (id: 3 rack: use1-az1)], epoch=0}} is out of range for partition mytopic-0,
> resetting offset{code}
> Why is Kafka fetching from the current offset 1890, which is before the
> committed offset for the partition, after the rebalance? This is in a test
> environment where fewer than one message is produced per second. The issue
> occurs with both auto commit (default interval) and manual commit, and on
> Kafka versions 2.6.2 and 2.8.1. In production we have much more traffic, and
> the issue causes reprocessing of around 2 million messages per partition.
> We use {{auto.offset.reset=latest}} and {{retention.ms=1000}}, in case that
> matters. We're using the Java client {{kafka-clients}} version 3.0.0.
> The five consumers have the same {{client.id}}.
>
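For reading the two describe outputs quoted above: the LAG column is the LOG-END-OFFSET minus the group's CURRENT-OFFSET. The following is only a minimal sketch of that arithmetic using the numbers from the report; the class name OffsetLag and the method lag are hypothetical and are not part of kafka-clients.

```java
// Hypothetical helper, not a Kafka API: reproduces the LAG column arithmetic.
final class OffsetLag {
    private OffsetLag() {}

    /** Lag = log end offset minus the consumer group's current (committed) offset. */
    public static long lag(long logEndOffset, long currentOffset) {
        return logEndOffset - currentOffset;
    }

    public static void main(String[] args) {
        // Before the deploy: CURRENT-OFFSET 3044, LOG-END-OFFSET 3044.
        System.out.println(lag(3044L, 3044L)); // prints 0
        // After the deploy: CURRENT-OFFSET 1890, LOG-END-OFFSET 3047.
        System.out.println(lag(3047L, 1890L)); // prints 1157
    }
}
```

Both results match the LAG values in the quoted output, so the reported lag of 1157 follows directly from the current offset moving back to 1890.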
--
This message was sent by Atlassian Jira
(v8.20.7#820007)