[ https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918980#comment-16918980 ]
Richard Yu edited comment on KAFKA-8522 at 8/29/19 10:01 PM:
-------------------------------------------------------------

[~junrao] I think I've hit a caveat with your approach. The problem I've encountered is that the partitions "assigned" to a LogCleaner can fluctuate after the LogCleaner instance is constructed, since new TopicPartitions can be added to or removed from this "assignment". As a consequence, under certain conditions files would be created and removed far more often than is comfortable. Specifically, I noticed that in the LogCleanerManager constructor, the {{logs}} parameter (the equivalent of the "assignment") is essentially a ConcurrentMap whose contents can change after initialization. That means files would also have to be repeatedly created and destroyed. Your thoughts on this?

> Tombstones can survive forever
> ------------------------------
>
>                 Key: KAFKA-8522
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8522
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log cleaner
>            Reporter: Evelyn Bayes
>            Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly unintended behaviour.
>
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is all the data continuously gets cycled into the oldest segment. Old records get compacted away, but the new records continuously update the timestamp of the oldest segment, resetting the countdown for deleting tombstones.
> So tombstones build up in the oldest segment forever.
>
> While you could "fix" this by reducing the segment size, this can be undesirable, as a sudden change in throughput could cause a dangerous number of segments to be created.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
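The caveat raised in the comment above can be sketched with a minimal Java example. This is illustrative only, not Kafka's actual classes: the class and method names here are hypothetical stand-ins for LogCleanerManager and its {{logs}} pool. The point is that a holder constructed around a live ConcurrentMap keeps observing later mutations, so any per-partition files derived from that "assignment" would have to be created and destroyed as entries come and go.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: keeps a reference to a shared, mutable map, the way
// LogCleanerManager keeps its `logs` parameter. No defensive copy is taken,
// so the "assignment" seen here can change after construction.
class CleanerManagerSketch {
    private final ConcurrentMap<String, String> logs; // stands in for the partition->log map

    CleanerManagerSketch(ConcurrentMap<String, String> logs) {
        this.logs = logs; // live reference, not a snapshot
    }

    int assignedPartitionCount() {
        return logs.size(); // reflects concurrent additions and removals
    }
}
```

For example, constructing the sketch over a map with one entry and then adding a second entry from outside changes the count the sketch reports, with no call made on the sketch itself.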
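The mechanism the issue description reports can also be sketched in a few lines. This is a simplified model, not Kafka's actual cleaner code: it assumes tombstone deletion is gated on how long the segment's largest timestamp has been in the past relative to a retention window modeled on delete.retention.ms. When compaction keeps folding fresh records into the oldest segment, that timestamp keeps advancing and the countdown never elapses.

```java
// Hypothetical, simplified model of the tombstone-deletion countdown.
class TombstoneRetentionSketch {
    // A tombstone becomes deletable only once the segment's largest record
    // timestamp is older than the retention window (cf. delete.retention.ms).
    static boolean tombstoneEligibleForDeletion(long nowMs,
                                                long segmentLargestTimestampMs,
                                                long deleteRetentionMs) {
        return nowMs - segmentLargestTimestampMs > deleteRetentionMs;
    }
}
```

Under the reported conditions, each cleaning pass rewrites the oldest segment with new records, so segmentLargestTimestampMs tracks the current time and the eligibility check above never returns true, even for tombstones that have been present far longer than the retention window.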