[ https://issues.apache.org/jira/browse/KAFKA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838918#comment-15838918 ]
Jeff Widman commented on KAFKA-3806:
------------------------------------

Too often people not super familiar with Kafka don't realize the default is so low...

So I would rather see this default to 4 days. If all processes in a consumer group randomly blow out late Friday afternoon and it's a low-priority consumer group, the ops/dev team may decide to wait until Monday to fix the problem, thinking that the consumer will just catch back up once it's fixed. They'll then be surprised to learn their offsets were hosed.

Companies big enough to take a performance hit from maintaining offsets that long will generally have the in-house resources to realize they should reduce this value.

> Adjust default values of log.retention.hours and offsets.retention.minutes
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-3806
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3806
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.9.0.1, 0.10.0.0
>            Reporter: Michal Turek
>            Priority: Minor
>
> Combination of default values of log.retention.hours (168 hours = 7 days) and
> offsets.retention.minutes (1440 minutes = 1 day) may be dangerous in special
> cases. Offset retention should always be greater than log retention.
> We have observed the following scenario and issue:
> - Producing of data to a topic was disabled two days ago by producer update,
> topic wasn't deleted.
> - Consumer consumed all data and properly committed offsets to Kafka.
> - Consumer made no more offset commits for that topic because there was no
> more incoming data and there was nothing to confirm. (We have auto-commit
> disabled, I'm not sure how enabled auto-commit behaves.)
> - After one day: Kafka cleared too old offsets according to
> offsets.retention.minutes.
> - After two days: Long-running consumer was restarted after update. It
> didn't find any committed offsets for that topic since they were deleted by
> offsets.retention.minutes, so it started consuming from the beginning.
> - The messages were still in Kafka due to larger log.retention.hours, about 5
> days of messages were read again.
> Known workaround to solve this issue:
> - Explicitly configure log.retention.hours and offsets.retention.minutes,
> don't use defaults.
> Proposals:
> - Prolong default value of offsets.retention.minutes to be at least twice
> as large as log.retention.hours.
> - Check these values during Kafka startup and log a warning if
> offsets.retention.minutes is smaller than log.retention.hours.
> - Add a note to migration guide about differences between storing of offsets
> in ZooKeeper and Kafka (http://kafka.apache.org/documentation.html#upgrade).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
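
The startup check proposed in the issue (warn when offsets.retention.minutes is smaller than log.retention.hours) could be approximated outside the broker with a small script. The following is only a sketch under stated assumptions: it reads broker settings from properties-style text, falls back to the defaults quoted in the issue, and ignores log.retention.minutes, log.retention.ms and per-topic retention overrides. The function names `parse_properties` and `offsets_outlive_log` are hypothetical and not part of Kafka.

```python
import logging

logger = logging.getLogger("retention_check")

# Defaults as quoted in the issue description (Kafka 0.9.0.1 / 0.10.0.0).
DEFAULT_LOG_RETENTION_HOURS = 168
DEFAULT_OFFSETS_RETENTION_MINUTES = 1440


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def offsets_outlive_log(props):
    """Return True if offset retention is at least as long as log retention.

    Both values are converted to minutes before comparing. A warning is
    logged when committed offsets would expire before the log data does.
    """
    log_minutes = 60 * int(
        props.get("log.retention.hours", DEFAULT_LOG_RETENTION_HOURS)
    )
    offsets_minutes = int(
        props.get("offsets.retention.minutes", DEFAULT_OFFSETS_RETENTION_MINUTES)
    )
    if offsets_minutes < log_minutes:
        logger.warning(
            "offsets.retention.minutes (%d min) is smaller than "
            "log.retention.hours (%d min); an idle consumer may lose its "
            "committed offsets and re-read retained messages",
            offsets_minutes,
            log_minutes,
        )
        return False
    return True
```

With the defaults from the issue, `offsets_outlive_log({})` returns False, since 1440 minutes is less than 168 * 60 = 10080 minutes. Setting offsets.retention.minutes=20160 (twice the log retention, as the first proposal suggests) makes it return True.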