[ https://issues.apache.org/jira/browse/KAFKA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-9840.
------------------------------------
Fix Version/s: 2.6.0
Resolution: Fixed
> Consumer should not use OffsetForLeaderEpoch without current epoch validation
> -----------------------------------------------------------------------------
>
> Key: KAFKA-9840
> URL: https://issues.apache.org/jira/browse/KAFKA-9840
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 2.4.1
> Reporter: Jason Gustafson
> Assignee: Boyang Chen
> Priority: Major
> Fix For: 2.6.0
>
>
> We have observed a case where the consumer attempted to detect truncation
> with the OffsetsForLeaderEpoch API against a broker which had become a
> zombie. In this case, the last epoch known to the consumer was higher than
> the last epoch known to the zombie broker, so the broker returned -1 as both
> the end offset and epoch in the response. The consumer did not check for this
> in the response, which resulted in the following message:
> {code}
> Truncation detected for partition topic-1 at offset
> FetchPosition{offset=11859, offsetEpoch=Optional[46],
> currentLeader=LeaderAndEpoch{leader=broker-host (id: 3 rack: null),
> epoch=-1}}, resetting offset to the first offset known to diverge
> FetchPosition{offset=-1, offsetEpoch=Optional[-1],
> currentLeader=LeaderAndEpoch{broker-host (id: 3 rack: null), epoch=-1}}
> (org.apache.kafka.clients.consumer.internals.SubscriptionState:414)
> {code}
> There are a couple of ways the consumer can handle this situation better.
> First, the reason we did not detect the zombie broker is that we did not
> include the current leader epoch in the OffsetsForLeaderEpoch request. This
> was likely because of KAFKA-9212. Following that patch, we do not
> initialize the current leader epoch from metadata responses, because there are
> cases where we cannot rely on it. But if the client cannot reliably detect
> zombies, the epoch validation is less useful anyway. So the simple solution is
> to skip the validation unless we have a reliable current leader epoch.
> Second, the consumer needs to check for the case where the returned end
> offset and epoch are undefined (-1). In that case, we have to treat the
> response as a normal OffsetOutOfRange case and invoke the reset policy.
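The first fix in the description (only validate when a reliable current leader epoch is known) can be sketched as below. This is a simplified illustration, not Kafka's internal code: the class, enum, and method names are hypothetical. It also assumes, based on my understanding of Kafka's broker behaviour, that a broker answers FENCED_LEADER_EPOCH when the client's current leader epoch is older than its own and UNKNOWN_LEADER_EPOCH when it is newer. The second case is how a zombie broker would be exposed. The epoch values in `main` are made up for illustration.

```java
import java.util.Optional;

// Hypothetical sketch of the first fix; not Kafka's internal API.
final class CurrentEpochValidation {
    enum Outcome { SKIPPED, PROCEED, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

    // Simplified broker-side check (assumed semantics): compare the client's
    // current leader epoch with the epoch the broker believes is current.
    static Outcome brokerCheck(int clientCurrentEpoch, int brokerLeaderEpoch) {
        if (clientCurrentEpoch < brokerLeaderEpoch) {
            return Outcome.FENCED_LEADER_EPOCH;   // the client's metadata is stale
        }
        if (clientCurrentEpoch > brokerLeaderEpoch) {
            return Outcome.UNKNOWN_LEADER_EPOCH;  // the broker is stale, e.g. a zombie
        }
        return Outcome.PROCEED;                   // epochs agree: do the end-offset lookup
    }

    // Client side: only validate when a reliable current leader epoch is known.
    static Outcome validate(Optional<Integer> currentLeaderEpoch, int brokerLeaderEpoch) {
        if (!currentLeaderEpoch.isPresent()) {
            return Outcome.SKIPPED;  // without it, a zombie broker cannot be detected
        }
        return brokerCheck(currentLeaderEpoch.get(), brokerLeaderEpoch);
    }

    public static void main(String[] args) {
        // Hypothetical epochs: the consumer knows epoch 46, the zombie is stuck at 45.
        System.out.println(validate(Optional.of(46), 45));  // UNKNOWN_LEADER_EPOCH
        System.out.println(validate(Optional.empty(), 45)); // SKIPPED
    }
}
```

Skipping outright, rather than sending a request without the current epoch, mirrors the reasoning above: a validation that cannot fence a zombie adds little.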
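The second fix in the description (treat an undefined end offset and epoch as OffsetOutOfRange) can be sketched as below. The names are hypothetical and this is not Kafka's actual SubscriptionState logic. It assumes, as the description states, that the broker signals "undefined" with -1. Treating either field being -1 as undefined is a conservative choice made by this sketch. The issue itself only describes the case where both are -1.

```java
// Hypothetical sketch of the second fix; names are illustrative, not Kafka's API.
final class EpochEndOffsetCheck {
    static final int UNDEFINED_EPOCH = -1;
    static final long UNDEFINED_OFFSET = -1L;

    enum Action { NO_TRUNCATION, TRUNCATE_TO_END_OFFSET, RESET_PER_POLICY }

    // Decide what to do with the end offset and epoch returned by
    // OffsetsForLeaderEpoch for the consumer's current fetch position.
    static Action onEpochEndOffset(long fetchOffset, int endEpoch, long endOffset) {
        // The broker knows nothing about the requested epoch: handle it like
        // OffsetOutOfRange. This check must come first, or -1 < fetchOffset
        // would be misread as truncation.
        if (endEpoch == UNDEFINED_EPOCH || endOffset == UNDEFINED_OFFSET) {
            return Action.RESET_PER_POLICY;
        }
        if (endOffset < fetchOffset) {
            return Action.TRUNCATE_TO_END_OFFSET;  // move to the first diverging offset
        }
        return Action.NO_TRUNCATION;
    }

    public static void main(String[] args) {
        // Values from the log above: position 11859, broker answered -1 / -1.
        System.out.println(onEpochEndOffset(11859L, -1, -1L)); // RESET_PER_POLICY
    }
}
```

With the log's values, a comparison of end offset against position alone (-1 < 11859) reports truncation and moves the position to offset -1. With the sentinel check in front, the same response goes to the reset policy instead.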
--
This message was sent by Atlassian Jira
(v8.3.4#803005)