[ https://issues.apache.org/jira/browse/KAFKA-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17491076#comment-17491076 ]

Guozhang Wang commented on KAFKA-12256:
---------------------------------------

Hello [~rleslie] yeah I think we can close this ticket along with KAFKA-13310. Regarding the backporting, I think it makes sense to backport to 3.0, I will see if it has any cherry-picking conflicts.

> auto commit causes delays due to retriable UNKNOWN_TOPIC_OR_PARTITION
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-12256
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12256
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 2.0.0
>            Reporter: Ryan Leslie
>            Priority: Minor
>              Labels: new-consumer-threading-should-fix
>
> In KAFKA-6829 a change was made to the consumer to internally retry commits
> upon receiving UNKNOWN_TOPIC_OR_PARTITION.
> Though this helped mitigate issues around stale broker metadata, there were
> some valid concerns around the negative effects for routine topic deletion:
> https://github.com/apache/kafka/pull/4948
> In particular, if a commit is issued for a deleted topic, retries can block
> the consumer for up to max.poll.interval.ms. This is tunable of course, but
> any amount of stalling in a consumer can lead to unnecessary lag.
> One of the assumptions while permitting the change was that in practice it
> should be rare for commits to occur for deleted topics, since that would
> imply messages were being read or published at the time of deletion. It's
> fair to expect users to not delete topics that are actively published to. But
> this assumption is false in cases where auto commit is enabled.
> With the current implementation of auto commit, the consumer will regularly
> issue commits for all topics being fetched from, regardless of whether or not
> messages were actually received. The fetch positions are simply flushed, even
> when they are 0. This is simple and generally efficient, though it does mean
> commits are often redundant. Besides the auto commit interval, commits are
> also issued at the time of rebalance, which is often precisely at the time
> topics are deleted.
> This means that in practice commits for deleted topics are not really rare.
> This is particularly an issue when the consumer is subscribed to a multitude
> of topics using a wildcard. For example, a consumer might subscribe to a
> particular "flavor" of topic with the aim of auditing all such data, and
> these topics might dynamically come and go. The consumer's metadata and
> rebalance mechanisms are meant to handle this gracefully, but the end result
> is that such groups are often blocked in a commit for several seconds or
> minutes (the default is 5 minutes) whenever a delete occurs. This can
> sometimes result in significant lag.
> Besides having users abandon auto commit in the face of topic deletes, there
> are probably multiple ways to deal with this, including reconsidering if
> commits still truly need to be retried here, or if this behavior should be
> more configurable; e.g. having a separate commit timeout or policy. In some
> cases the loss of a commit and subsequent message duplication is still
> preferred to processing delays. And having an artificially low
> max.poll.interval.ms or rebalance.timeout.ms comes with its own set of
> concerns.
> In the very least the current behavior and pitfalls around delete with active
> consumers should be documented.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)