[ https://issues.apache.org/jira/browse/KAFKA-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990303#comment-16990303 ]
Richard Yu commented on KAFKA-9285:
-----------------------------------

cc [~bchen225242] You might be interested in this.

> Implement failed message topic to account for processing lag during failure
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-9285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9285
>             Project: Kafka
>          Issue Type: New Feature
>          Components: consumer
>            Reporter: Richard Yu
>            Priority: Major
>
> In Kafka's current failure-handling scheme, when a consumer crashes, the
> user is typically responsible for both detecting and restarting the failed
> consumer. While the consumer is dead, no records are consumed, so lag
> builds up. There have been previous attempts to resolve this problem: when
> the broker detects a failure, a substitute consumer is started (the
> so-called [Rebalance Consumer|https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing])
> which continues processing records in the failed consumer's stead.
> However, this has complications: records are stored only locally, and if
> this consumer fails as well, that data is lost. Instead, we need to
> consider how we can still process these records and at the same time
> effectively _persist_ them. This is where I propose the concept of a
> _failed message topic_. At a high level, it works like this: when we find
> that a consumer has failed, messages that were originally meant to be sent
> to that consumer are redirected to this failed message topic. The user can
> choose to assign consumers to this topic, which would consume messages
> from failed consumers while other consumer threads are down.
> Naturally, records from different topics cannot go into the same failed
> message topic, since we cannot tell which records belong to which consumer.
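
To make the redirect described above concrete, here is a rough in-memory sketch. It is only an illustration of the idea, not Kafka code and not an implementation of this issue: the class `FailedMessageRouter`, its methods, and the `<topic>.failed` naming are all hypothetical, and liveness is set by hand rather than detected by a broker. It assumes one failed message topic per source topic, in line with the last point of the description.

```python
from collections import defaultdict


class FailedMessageRouter:
    """Toy in-memory model of the proposed redirect; not a Kafka API."""

    def __init__(self):
        self.alive = {}                          # consumer id -> liveness flag
        self.delivered = defaultdict(list)       # consumer id -> records received
        self.failed_topics = defaultdict(list)   # failed topic name -> records

    def register(self, consumer_id):
        self.alive[consumer_id] = True

    def mark_failed(self, consumer_id):
        # Stand-in for failure detection, which the broker would do in practice.
        self.alive[consumer_id] = False

    @staticmethod
    def failed_topic_for(topic):
        # One failed message topic per source topic (hypothetical naming).
        return topic + ".failed"

    def route(self, topic, consumer_id, record):
        """Deliver to the consumer if it is alive, else keep the record in the failed topic."""
        if self.alive.get(consumer_id, False):
            self.delivered[consumer_id].append(record)
            return consumer_id
        target = self.failed_topic_for(topic)
        self.failed_topics[target].append((consumer_id, record))
        return target

    def drain(self, topic):
        """A standby consumer assigned to the failed topic picks up the backlog."""
        target = self.failed_topic_for(topic)
        backlog, self.failed_topics[target] = self.failed_topics[target], []
        return backlog


router = FailedMessageRouter()
router.register("c1")
router.route("orders", "c1", "r1")      # c1 is alive: delivered directly
router.mark_failed("c1")
router.route("orders", "c1", "r2")      # c1 is down: goes to "orders.failed"
router.route("payments", "c1", "r3")    # separate failed topic per source topic
orders_backlog = router.drain("orders")
```

Each redirected record is stored together with the id of the consumer it was meant for, so whoever drains the failed topic can still tell the records apart.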

--
This message was sent by Atlassian Jira (v8.3.4#803005)