[ https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899305#comment-16899305 ]
Jun Rao commented on KAFKA-4545:
--------------------------------

[~Yohan123], if you want to work on this jira and follow the approach I mentioned earlier, we probably need a KIP since it changes the on-disk format.

> tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4545
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4545
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>            Reporter: Jun Rao
>            Assignee: Jose Armando Garcia Sancio
>            Priority: Major
>
> The algorithm for removing a tombstone in a compacted log is supposed to be
> the following.
> 1. A tombstone is never removed while it is still in the dirty portion of
> the log.
> 2. Once the tombstone is in the cleaned portion of the log, we further delay
> its removal by delete.retention.ms, measured from the time the tombstone
> entered the cleaned portion.
> Once the tombstone is in the cleaned portion, we know there can't be any
> message with the same key before the tombstone. Therefore, any consumer that
> reads a non-tombstone message before the tombstone and then reads to the end
> of the log within delete.retention.ms is guaranteed to see the tombstone.
> However, the current implementation doesn't seem correct. We delay the
> removal of the tombstone by delete.retention.ms measured from the last
> modified time of the last cleaned segment. However, that last modified time
> is inherited from the original segment, which could be arbitrarily old. So,
> the tombstone may not be preserved as long as it needs to be.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
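For illustration only, the two removal rules described in the issue can be contrasted in a few lines of Python. This is a minimal sketch, not Kafka's log cleaner code: the `Tombstone` class, the `cleaned_at_ms` field, and both function names are hypothetical, and whether the comparison at the boundary is inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tombstone:
    # Hypothetical model of a tombstone's state; not a Kafka class.
    in_dirty_portion: bool
    # Assumed field: wall-clock time (ms) at which the tombstone first
    # entered the cleaned portion of the log; None while still dirty.
    cleaned_at_ms: Optional[int]


def may_remove_intended(t: Tombstone, now_ms: int, delete_retention_ms: int) -> bool:
    """Rule described in the issue: never remove from the dirty portion, and
    wait delete.retention.ms from the time the tombstone was cleaned."""
    if t.in_dirty_portion or t.cleaned_at_ms is None:
        return False
    return now_ms - t.cleaned_at_ms >= delete_retention_ms


def may_remove_current(segment_last_modified_ms: int, now_ms: int,
                       delete_retention_ms: int) -> bool:
    """Simplified model of the behaviour the issue reports: the clock starts
    at the segment's last modified time, which is inherited from the original
    segment and may be arbitrarily old."""
    return now_ms - segment_last_modified_ms >= delete_retention_ms


DAY_MS = 86_400_000
now = 100 * DAY_MS

# A tombstone cleaned just now, in a segment whose inherited last modified
# time is ten days old, with delete.retention.ms set to one day.
just_cleaned = Tombstone(in_dirty_portion=False, cleaned_at_ms=now)
intended = may_remove_intended(just_cleaned, now, DAY_MS)
current = may_remove_current(now - 10 * DAY_MS, now, DAY_MS)
```

Under these assumptions, `intended` is false while `current` is true: the tombstone becomes removable immediately after cleaning, which is the premature removal the issue describes.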