[ https://issues.apache.org/jira/browse/KAFKA-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368821#comment-17368821 ]
Nurlan Turdaliev edited comment on KAFKA-12378 at 6/24/21, 1:03 PM:
--------------------------------------------------------------------

Voting for this; at least a warning somewhere in the docs would be good. In our case, it wasn't even the leader node that shut down. It was a follower that became leader immediately after startup (which is also suspicious: why would it do that?)

was (Author: entea):
Voting for this; at least a warning somewhere in the docs would be good. In our case, it wasn't even the leader node that shut down.

> If a broker is down for more than `delete.retention.ms`, deleted records in a
> compacted topic can come back.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12378
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12378
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Shane
>            Priority: Major
>
> If the leader of a compacted topic goes offline, or a replica's replication
> lag exceeds the topic's `delete.retention.ms`, tombstoned records can come
> back once that broker catches up and becomes leader again.
>
> Example of this happening:
> Topic config:
>   name: compacted-topic
>   settings: delete.retention.ms=0
> Leader: broker 1
> ISR: broker 1, broker 2, broker 3
>
> Producer 1 writes a record `1:foo`
> Producer 1 writes a record `2:bar`
> broker 1 goes offline
> broker 2 takes over leadership
> Producer 1 writes a tombstone `1:NULL`
> broker 2 compacts the topic, which leaves `1:NULL` and `2:bar` in it.
> broker 2 removes the tombstone, leaving just `2:bar` in the topic.
> broker 1 comes back online, catches up with replication, and takes back
> leadership
> broker 1 now has `1:foo` and `2:bar`, because the tombstone was deleted
> before broker 1 could replicate it
> At this point the topic is in an inconsistent state: broker 1 has `1:foo` and
> `2:bar`, while broker 2 has only `2:bar`.
>
> Suggestion:
> I believe this is quite a hard problem to solve, so I'm not going to suggest
> any large changes to the codebase, but I think a warning in the docs about
> `delete.retention.ms` is warranted.
> It would help to call out that brokers are also consumers here (follower
> replicas fetch the log much as consumers do):
> [https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#topicconfigs_delete.retention.ms]
> Further documentation of what happens when a broker is offline for longer
> than `delete.retention.ms` would be even better.
> If it helps, I'm happy to write a first draft of the docs update as well.
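The example in the report can be reproduced with a toy model of two replicas. This is a minimal Python sketch, not Kafka's implementation: `Record`, `Replica`, `catch_up_from` and `run_scenario` are hypothetical names, `compact(drop_tombstones=True)` stands in for "`delete.retention.ms` has elapsed", and leader epochs, log truncation, ISR changes and segment-level cleaning are all ignored. The assumption it illustrates is that a returning follower fetches from its own log end offset, so a tombstone that was already compacted away on the leader is never seen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone (`key:NULL`)


@dataclass
class Replica:
    # Hypothetical toy replica; not a Kafka class.
    name: str
    log: List[Record] = field(default_factory=list)
    next_offset: int = 0  # log end offset; compaction never lowers it

    def append(self, key: str, value: Optional[str]) -> None:
        """Leader-side write: assign the next offset and append."""
        self.log.append(Record(self.next_offset, key, value))
        self.next_offset += 1

    def compact(self, drop_tombstones: bool) -> None:
        """Keep only the newest record per key; optionally drop tombstones.

        drop_tombstones=True is a stand-in for "delete.retention.ms has
        elapsed". Surviving records keep their offsets, as in a compacted log.
        """
        newest = {r.key: r.offset for r in self.log}
        kept = [r for r in self.log if newest[r.key] == r.offset]
        if drop_tombstones:
            kept = [r for r in kept if r.value is not None]
        self.log = kept

    def catch_up_from(self, leader: "Replica") -> None:
        """Follower-side fetch: copy whatever the leader still has at or
        after this replica's own log end offset, like a consumer would."""
        self.log.extend(r for r in leader.log if r.offset >= self.next_offset)
        self.next_offset = leader.next_offset

    def materialize(self) -> Dict[str, str]:
        """Key/value state a reader of this replica's full log would build."""
        state: Dict[str, str] = {}
        for r in self.log:
            if r.value is None:
                state.pop(r.key, None)
            else:
                state[r.key] = r.value
        return state


def run_scenario():
    broker1, broker2 = Replica("broker 1"), Replica("broker 2")

    # broker 1 leads; both replicas end up holding 1:foo and 2:bar
    broker1.append("1", "foo")
    broker1.append("2", "bar")
    broker2.catch_up_from(broker1)

    # broker 1 goes offline; broker 2 leads and receives the tombstone
    broker2.append("1", None)
    broker2.compact(drop_tombstones=False)  # leaves 1:NULL and 2:bar
    broker2.compact(drop_tombstones=True)   # delete.retention.ms=0: tombstone gone

    # broker 1 returns and fetches from its own log end offset (2)
    broker1.catch_up_from(broker2)
    return broker1.materialize(), broker2.materialize()


if __name__ == "__main__":
    b1, b2 = run_scenario()
    print("broker 1:", b1)
    print("broker 2:", b2)
```

Running it prints `broker 1: {'1': 'foo', '2': 'bar'}` and `broker 2: {'2': 'bar'}`, the same divergence described in the report: nothing at or after offset 2 is left on broker 2 for broker 1 to fetch, so `1:foo` is never deleted on broker 1.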