James Cheng created KAFKA-4682:
----------------------------------

             Summary: Committed offsets should not be deleted if a consumer is 
still active
                 Key: KAFKA-4682
                 URL: https://issues.apache.org/jira/browse/KAFKA-4682
             Project: Kafka
          Issue Type: Bug
            Reporter: James Cheng


Kafka deletes committed offsets that are older than 
offsets.retention.minutes.

If there is an active consumer on a low-traffic partition, Kafka can delete 
the committed offset for that consumer, because the last commit may be older 
than the retention period. Once the offset is deleted, a restart or a 
rebalance of that consumer will cause it to find no committed offset and start 
consuming from earliest/latest (depending on auto.offset.reset). I'm not sure, 
but a broker failover might also cause the consumer to fall back to 
auto.offset.reset (due to a broker restart or a coordinator failover).

I think that Kafka should only delete offsets for inactive consumers. The timer 
should only start after a consumer group goes inactive. For example, if a 
consumer group goes inactive, then after 1 week, delete the offsets for that 
consumer group. This is a solution that [~junrao] mentioned in 
https://issues.apache.org/jira/browse/KAFKA-3806?focusedCommentId=15323521&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15323521
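
The difference between the current rule and the proposed one can be sketched 
as below. This is a simplified model, not broker code: the class 
OffsetExpiryModel, its method names, and the groupInactiveSinceMs parameter are 
all hypothetical and exist only for illustration. The 7-day retention matches 
the "1 week" example above.

```java
import java.util.OptionalLong;

/** Simplified, hypothetical model of offset expiry; not the broker's implementation. */
final class OffsetExpiryModel {

    /** Current behavior as described in this issue: the clock starts at the last commit. */
    static boolean expiresUnderCurrentRule(long lastCommitMs, long retentionMs, long nowMs) {
        return nowMs - lastCommitMs > retentionMs;
    }

    /**
     * Proposed behavior: the clock starts only when the group goes inactive.
     * An empty groupInactiveSinceMs means the group still has active consumers.
     */
    static boolean expiresUnderProposedRule(OptionalLong groupInactiveSinceMs,
                                            long retentionMs, long nowMs) {
        if (!groupInactiveSinceMs.isPresent()) {
            return false; // active group: never expire its offsets
        }
        return nowMs - groupInactiveSinceMs.getAsLong() > retentionMs;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long retention = 7 * day; // illustrative value only
        long now = 10 * day;      // the last commit happened at time 0

        // Active consumer on a quiet partition: only the current rule expires it.
        System.out.println(expiresUnderCurrentRule(0L, retention, now));
        System.out.println(expiresUnderProposedRule(OptionalLong.empty(), retention, now));

        // Group that went inactive on day 2: the proposed rule expires it as well.
        System.out.println(expiresUnderProposedRule(OptionalLong.of(2 * day), retention, now));
    }
}
```

Under this model, a quiet but active consumer keeps its offset indefinitely, 
while an abandoned group is still cleaned up once the retention period has 
passed since it went inactive.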

The current workarounds are to:
# Commit an offset on every partition you own on a regular basis, making sure 
that it is more frequent than offsets.retention.minutes (a broker-side setting 
that a consumer might not be aware of)
or
# Turn the value of offsets.retention.minutes up very high. You have to make 
sure it is longer than the longest gap between messages on any low-traffic 
topic that you want to support. For example, if you want to support a topic 
where someone produces once a month, you would have to set 
offsets.retention.minutes to 1 month.
or
# Turn on enable.auto.commit (this is essentially #1, but easier to implement).

None of these are ideal. 

#1 can be spammy. It requires that your consumers know something about how the 
brokers are configured. Sometimes it is out of your control: MirrorMaker, for 
example, only commits offsets on partitions where it receives data. And it is 
logic that you need to duplicate in all of your consumers.
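
To make the cost of workaround #1 concrete, here is a minimal sketch of the 
bookkeeping each consumer would need. It uses only the Java standard library 
and does not call the Kafka client; RecommitPlanner, partitionsDue, and the 
"half of retention" safety margin are assumptions made for illustration. Note 
that the caller has to pass in the broker's retention value, which is exactly 
the coupling described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for workaround #1; not part of the Kafka client API. */
final class RecommitPlanner {

    /**
     * Returns the owned partitions whose last commit is at least half the
     * retention period old, so they can be re-committed before the broker
     * expires the offset. The half-retention margin is an assumption.
     */
    static List<String> partitionsDue(Map<String, Long> lastCommitMsByPartition,
                                      long retentionMs, long nowMs) {
        long threshold = retentionMs / 2;
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastCommitMsByPartition.entrySet()) {
            if (nowMs - entry.getValue() >= threshold) {
                due.add(entry.getKey());
            }
        }
        return due;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        long retention = 24 * hour; // must mirror the broker setting; illustrative value
        long now = 24 * hour;

        Map<String, Long> lastCommit = new TreeMap<>();
        lastCommit.put("busy-topic-0", 23 * hour); // last committed one hour ago
        lastCommit.put("quiet-topic-0", 2 * hour); // no traffic, no commit for 22 hours

        System.out.println(partitionsDue(lastCommit, retention, now));
    }
}
```

In a real consumer, each returned partition would then be re-committed at its 
current position, for example with KafkaConsumer.commitSync and an explicit 
offset map. That extra commit is the "spammy" part, and the retention 
parameter is the broker-side knowledge the consumer should not need.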

#2 costs disk space on the broker (in __consumer_offsets) as well as memory on 
the broker (to answer OffsetFetch requests).

#3, I think, has the potential for message loss (the consumer might commit 
offsets for messages that are not yet fully processed).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
