[ https://issues.apache.org/jira/browse/KAFKA-12378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370369#comment-17370369 ]

Guozhang Wang commented on KAFKA-12378:
---------------------------------------

Hello [~shanesaww], I agree this is a tricky problem to solve. When
discussing KRaft we encountered similar challenges.

Besides warning users via the docs, another idea we've discussed before is to
require a broker that resumes after being offline for longer than the
retention time to treat its local log as no longer "safe" and to re-bootstrap
it from the leaders from scratch. Admittedly that's not very efficient either,
but since retention should in practice be reasonably long, this scenario may
not be common, so this approach may be viable too. I'd like to invite
[~hachikuji] and [~jsancio] to chime in here.

> If a broker is down for more than `delete.retention.ms`, deleted records in a
> compacted topic can come back.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12378
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12378
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Shane
>            Priority: Major
>
> If the leader of a compacted topic goes offline, or falls behind in
> replication, for longer than the topic's `delete.retention.ms`, tombstoned
> records can come back once that broker catches up and becomes the leader again.
>  
> Example of this happening:
>  Topic config:
>     name: compacted-topic
>     settings: delete.retention.ms=0
>     Leader: broker 1
>     ISR: broker 1, broker 2, broker 3
>  
> Producer 1 writes a record `1:foo`
> Producer 1 writes a record `2:bar`
> broker 1 goes offline
> broker 2 takes over leadership
> Producer 1 writes a tombstone `1:NULL`
> broker 2 compacts the topic, which removes `1:foo` and leaves `1:NULL` and
> `2:bar` in it.
> broker 2 removes the tombstone, leaving just `2:bar` in the topic.
> broker 1 comes back online, catches up with replication, and takes back
> leadership
> broker 1 now has `1:foo` and `2:bar` as its data, because the tombstone was
> deleted before broker 1 could replicate it
> At this point the topic is in an inconsistent state: the brokers hold
> conflicting data.
>  
>  
> Suggestion:
> This is a hard problem to solve, so rather than suggest large changes to the
> codebase, I think a warning about `delete.retention.ms` in the docs is
> warranted.
> It would help to call out here that follower brokers are also consumers, so
> they too must fetch a tombstone before it is removed:
> [https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html#topicconfigs_delete.retention.ms]
> Further documentation about what happens when a broker is offline for longer
> than `delete.retention.ms` would be even better.
> If it helps, I'm happy to write a first draft of the docs update.
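The scenario above can be reproduced with a toy, in-memory model. This is a simplified sketch, not Kafka's log cleaner: `compact`, `fetch_from`, and `materialize` are hypothetical helpers, compaction is applied to the whole log (real Kafka never cleans the active segment), and the returning broker simply fetches from its log end offset without truncating to the high watermark.

```python
from typing import Optional

# (offset, key, value); a value of None represents a tombstone.
Record = tuple[int, str, Optional[str]]


def compact(log: list[Record], remove_tombstones: bool) -> list[Record]:
    """Keep only the latest record per key, preserving offsets. Optionally
    drop tombstones, as the cleaner does once delete.retention.ms elapses."""
    latest = {}
    for offset, key, _ in log:
        latest[key] = offset
    kept = [r for r in log if latest[r[1]] == r[0]]
    if remove_tombstones:
        kept = [r for r in kept if r[2] is not None]
    return kept


def fetch_from(leader_log: list[Record], start_offset: int) -> list[Record]:
    # Compacted logs have offset gaps; a fetch returns whatever still exists.
    return [r for r in leader_log if r[0] >= start_offset]


def materialize(log: list[Record]) -> dict[str, str]:
    """Replay the log into a key/value view, applying tombstones as deletes."""
    state: dict[str, str] = {}
    for _, key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Both replicas hold 1:foo and 2:bar before broker 1 goes offline.
broker1: list[Record] = [(0, "1", "foo"), (1, "2", "bar")]
broker2: list[Record] = list(broker1)

# Broker 2 leads and receives the tombstone 1:NULL at offset 2.
broker2.append((2, "1", None))
broker2 = compact(broker2, remove_tombstones=False)  # 1:NULL, 2:bar
broker2 = compact(broker2, remove_tombstones=True)   # delete.retention.ms=0: 2:bar

# Broker 1 returns and fetches from its log end offset (2), which is gone.
broker1 += fetch_from(broker2, broker1[-1][0] + 1)

print(materialize(broker1))  # {'1': 'foo', '2': 'bar'}
print(materialize(broker2))  # {'2': 'bar'}
```

Because broker 2's log no longer contains offset 2, the fetch returns nothing and broker 1 never learns that key `1` was deleted.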



--
This message was sent by Atlassian Jira
(v8.3.4#803005)