Hi all, As part of my work on formally verifying different parts of Apache Kafka and working on KIP-966 I have built up a lot of knowledge about how the replication protocol works. Currently it is mostly documented across various KIPs and in the code itself. I have written a complete protocol description (with KIP-966 changes applied) which is inspired by the precise but accessible style and language of the Raft paper. The idea is that it makes it easier for contributors and anyone else interested in the protocol to learn how it works, the fundamental properties it has and how those properties are supported by the various behaviors and conditions.
It currently resides next to the TLA+ specification itself in my kafka-tlaplus repository. I'd be interested to receive feedback from the community. https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md Thanks Jack