[ https://issues.apache.org/jira/browse/KAFKA-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238126#comment-16238126 ]
Drew Kutcharian commented on KAFKA-4682:
----------------------------------------

This just happened to us and I just stumbled upon this JIRA while trying to figure out the cause. A few questions:

1. Aren't consumer offset topics compacted? Shouldn't at least the last entry stay on disk after cleanup?

2. Considering that they are compacted, what is the real concern with workaround 2 in the description: "2. Turn the value of offsets.retention.minutes up really really high"?

3. As a workaround, would it make sense to set {{offsets.retention.ms}} to the same value as {{logs.retention.ms}} and {{auto.offset.reset}} to {{earliest}}? That way consumers and logs would "reset" at the same time?

4. Is there a timeline for the release of KIP-211?


> Committed offsets should not be deleted if a consumer is still active
> (KIP-211)
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-4682
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4682
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: James Cheng
>            Assignee: Vahid Hashemian
>            Priority: Major
>              Labels: kip
>
> Kafka will delete committed offsets that are older than
> offsets.retention.minutes.
> If there is an active consumer on a low traffic partition, it is possible
> that Kafka will delete the committed offset for that consumer. Once the
> offset is deleted, a restart or a rebalance of that consumer will cause the
> consumer to not find any committed offset and start consuming from
> earliest/latest (depending on auto.offset.reset). I'm not sure, but a broker
> failover might also cause you to start reading from auto.offset.reset (due to
> broker restart, or coordinator failover).
> I think that Kafka should only delete offsets for inactive consumers. The
> timer should only start after a consumer group goes inactive. For example, if
> a consumer group goes inactive, then after 1 week, delete the offsets for
> that consumer group.
> This is a solution that [~junrao] mentioned in
> https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
> The current workarounds are to:
> # Commit an offset on every partition you own on a regular basis, making sure
> that it is more frequent than offsets.retention.minutes (a broker-side
> setting that a consumer might not be aware of)
> or
> # Turn the value of offsets.retention.minutes up really really high. You have
> to make sure it is higher than any valid low-traffic rate that you want to
> support. For example, if you want to support a topic where someone produces
> once a month, you would have to set offsets.retention.minutes to 1 month.
> or
> # Turn on enable.auto.commit (this is essentially #1, but easier to
> implement).
> None of these are ideal.
> #1 can be spammy. It requires that your consumers know something about how the
> brokers are configured. Sometimes it is out of your control. MirrorMaker, for
> example, only commits offsets on partitions where it receives data. And it is
> duplication that you need to put into all of your consumers.
> #2 has disk-space impact on the broker (in __consumer_offsets) as well as
> memory-size on the broker (to answer OffsetFetch).
> #3 I think has the potential for message loss (the consumer might commit on
> messages that are not yet fully processed)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
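
To make the difference between the expiry rule described in the issue and the one it proposes concrete, here is a toy model in plain Java. It is only a sketch and not Kafka's code: the class name, both method names, and the one-day retention are hypothetical, chosen purely for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy model only; not Kafka's implementation. All names here are hypothetical. */
public class OffsetExpirySketch {

    /** Rule described in the issue: expiry is measured from the last commit. */
    static boolean expiredByLastCommit(Instant lastCommit, Instant now, Duration retention) {
        return Duration.between(lastCommit, now).compareTo(retention) > 0;
    }

    /**
     * Rule proposed in the issue: the timer only starts once the group goes inactive.
     * inactiveSince is null while the group still has an active consumer.
     */
    static boolean expiredByInactivity(Instant inactiveSince, Instant now, Duration retention) {
        return inactiveSince != null
                && Duration.between(inactiveSince, now).compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofDays(1);            // assumed value, for illustration
        Instant lastCommit = Instant.parse("2017-11-01T00:00:00Z");
        Instant now = lastCommit.plus(Duration.ofDays(3));  // quiet partition: no commit for 3 days

        // Active consumer, no new messages: the offset is dropped under the first rule.
        System.out.println(expiredByLastCommit(lastCommit, now, retention));  // true
        // Under the proposed rule it is kept, because the group never went inactive.
        System.out.println(expiredByInactivity(null, now, retention));        // false
    }
}
```

In this model, workaround #1 from the description amounts to moving lastCommit forward on a schedule shorter than the retention period, so that the first rule never fires for an active consumer.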