Jason Gustafson created KAFKA-5490:
--------------------------------------
Summary: Deletion of tombstones during cleaning should consider
idempotent message retention
Key: KAFKA-5490
URL: https://issues.apache.org/jira/browse/KAFKA-5490
Project: Kafka
Issue Type: Sub-task
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Critical
Fix For: 0.11.0.1

The LogCleaner always preserves the message containing the last sequence number
from a given ProducerId when doing a round of cleaning. This is necessary to
ensure that the producer is not prematurely evicted, which would cause an
OutOfOrderSequenceException. The problem with this approach is that the
preserved message won't be considered again for cleaning until a new message
with the same key is written to the topic. In general, this could result in an
accumulation of stale entries in the log, but the bigger problem is that the
newer entry with the same key could be a tombstone. If we end up deleting this
tombstone before a new record with the same key is written, then the old entry
will resurface. For example, consider the following sequence of writes:
1. ProducerId=1, Key=A, Value=1
2. ProducerId=2, Key=A, Value=null (tombstone)
We will preserve the first entry until a new record with Key=A is
written AND either ProducerId 1 has written a newer record with a larger
sequence number or ProducerId 1 has expired. As long as the tombstone is
preserved, there is no correctness violation: a consumer reading from the
beginning will ignore the first entry after reading the tombstone. But it is
possible that the tombstone entry will be removed from the log before a new
record with Key=A is written. If that happens, then a consumer reading from the
beginning would incorrectly observe the overwritten value.
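
The scenario above can be reproduced with a small toy model. This is only a
sketch under stated assumptions, not the actual LogCleaner code: the Record
class and the clean() and replay() helpers are hypothetical names, and
tombstone expiry is reduced to a boolean flag. It also assumes a third write
that is not part of the example above (ProducerId=2, Key=B, Value=2), so that
the tombstone is no longer the last sequence number for ProducerId 2 and is
therefore not itself protected by the retention rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    producer_id: int
    sequence: int
    key: str
    value: Optional[str]  # None marks a tombstone


def clean(log: List[Record], tombstones_expired: bool) -> List[Record]:
    """One simplified cleaning pass over an offset-ordered log (toy model)."""
    latest_for_key: Dict[str, int] = {}
    last_for_producer: Dict[int, int] = {}
    for r in log:
        latest_for_key[r.key] = r.offset
        last_for_producer[r.producer_id] = r.offset

    kept = []
    for r in log:
        is_latest_for_key = latest_for_key[r.key] == r.offset
        # Rule described in this issue: always keep the record that holds
        # the last sequence number of each producer.
        holds_last_sequence = last_for_producer[r.producer_id] == r.offset
        is_expired_tombstone = r.value is None and tombstones_expired
        if holds_last_sequence or (is_latest_for_key and not is_expired_tombstone):
            kept.append(r)
    return kept


def replay(log: List[Record]) -> Dict[str, str]:
    """What a consumer reading from the beginning would materialize."""
    state: Dict[str, str] = {}
    for r in log:
        if r.value is None:
            state.pop(r.key, None)
        else:
            state[r.key] = r.value
    return state


log = [
    Record(offset=0, producer_id=1, sequence=0, key="A", value="1"),
    Record(offset=1, producer_id=2, sequence=0, key="A", value=None),
    # Assumed extra write, so the tombstone is not producer 2's last record.
    Record(offset=2, producer_id=2, sequence=1, key="B", value="2"),
]

first_pass = clean(log, tombstones_expired=False)
second_pass = clean(first_pass, tombstones_expired=True)

print(replay(first_pass))   # Key A is absent while the tombstone is retained
print(replay(second_pass))  # Key A is back once the tombstone has been removed
```

In this model the first pass keeps all three records, so a replay still hides
Key=A. The second pass drops the tombstone but keeps the first entry because
it holds the last sequence number for ProducerId 1, and the overwritten value
becomes visible again.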