Justine Olshan created KAFKA-19446:
--------------------------------------

             Summary: TV2 late marker can violate EOS guarantees.
                 Key: KAFKA-19446
                 URL: https://issues.apache.org/jira/browse/KAFKA-19446
             Project: Kafka
          Issue Type: Task
    Affects Versions: 4.0.0, 4.1.0
            Reporter: Justine Olshan
            Assignee: Justine Olshan

One case we missed in KIP-890 is a late-arriving WriteTxnMarkerRequest that reaches a partition for a transaction using TV2. Because we write the marker with epoch + 1, we also send the request with epoch + 1. Due to the somewhat relaxed epoch check at the log layer (https://github.com/apache/kafka/blob/fd70290633191b6f53a9d4ddb24e3a8b619fcd3f/storage/src/main/java/org/apache/kafka/storage/internals/log/ProducerAppendInfo.java#L211), we can actually accept a late-arriving request for the previous transaction, since its epoch will be the same as the current producer state epoch.

We should tighten this check so that it does not allow the same epoch when using TV2. In other words, the marker epoch should always be >= the current producer state epoch + 1. (It can be greater than + 1 if we restart the producer and bump the epoch.) We just need a good way to tell whether a marker is meant for a TV2 transaction.

The + 1 works even if we didn't produce records, since the previous marker will update the epoch.
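
To make the proposed tightening concrete, here is a minimal sketch of the two checks side by side. It is not the actual ProducerAppendInfo code: the class name MarkerEpochCheck, the method acceptsMarker, and the isTv2 flag are all hypothetical, and how the log layer would learn that a marker belongs to a TV2 transaction is exactly the open question above. Epoch overflow is ignored.

```java
/** Hypothetical illustration only; not taken from the Kafka code base. */
final class MarkerEpochCheck {
    private MarkerEpochCheck() {}

    /**
     * Decides whether a transaction marker may be appended.
     *
     * @param currentEpoch producer epoch currently held in the partition's producer state
     * @param markerEpoch  producer epoch carried by the incoming marker
     * @param isTv2        assumed flag: true if the marker is meant for a TV2 transaction
     */
    static boolean acceptsMarker(short currentEpoch, short markerEpoch, boolean isTv2) {
        if (isTv2) {
            // A TV2 marker is written with epoch + 1, so a valid one is always
            // strictly newer than the current producer state epoch.
            return markerEpoch > currentEpoch;
        }
        // Relaxed check: only strictly older epochs are rejected, so a marker
        // carrying the same epoch as the current state is accepted.
        return markerEpoch >= currentEpoch;
    }
}
```

For example, suppose a transaction at epoch 5 is completed by a marker with epoch 6, and the next transaction then runs at epoch 6. A late duplicate of the first marker also carries epoch 6: the relaxed check accepts it, while the strict check rejects it and still accepts the second transaction's own marker at epoch 7.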