This is very useful Jack!

1. We are missing a replication protocol specification in our project.
Ideally it should be a living document and adding what you wrote to
the docs/design would be a great start towards that goal.
2. I have also been building on top of some existing TLA+ to add
changes to replication protocol brought by features such as Tiered
Storage, local retention etc. at
https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
3. Apart from verifying correctness, I believe a TLA+ model will also
help developers quickly iterate through their fundamental assumptions
while making changes. As an example, we recently had a discussion in
the community on whether to increase leader epoch with shrink/expand
ISR or not. There was another discussion on whether we can choose the
replica with the largest end offset as the new leader to reduce
truncation. In both these cases, we could have quickly modified the
existing TLA+ model, run through it and verify that the assumptions
still hold true. It would be great if we can take such discussions as
an example and demonstrate how TLA+ could have benefitted the
community. It would help make the case for adding the TLA+ spec as
part of the community owned project itself.

Right now, things are a bit busy on my end, but I am looking forward
to exploring what you shared above in the coming weeks (perhaps a
first review by end of sept).

Thank you again for starting this conversation.

--
Divij Vaidya

On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada <ocadar...@gmail.com> wrote:
>
> Hi Jack,
>
> Thank you for the great work, not only the spec but also for the
> comprehensive documentation about the replication.
> Actually I wrote some TLA+ spec to verify unclean leader election behavior
> before so I will double-check my understanding with your complete spec :)
>
>
> Thanks,
>
> 2023年9月10日(日) 21:42 David Jacot <da...@apache.org>:
>
> > Hi Jack,
> >
> > This is great! Thanks for doing it. I will look into it when I have a bit
> > of time, likely after Current.
> >
> > Would you be interested in contributing it to the main repository? That
> > would be a great contribution to the project. Having it there would allow
> > the community to maintain it while changes to the protocol are made. That
> > would also pave the way for having other specs in the future (e.g. new
> > consumer group protocol).
> >
> > Best,
> > David
> >
> > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly <vanligh...@apache.org> a
> > écrit :
> >
> > > Hi all,
> > >
> > > As part of my work on formally verifying different parts of Apache Kafka
> > > and working on KIP-966 I have built up a lot of knowledge about how the
> > > replication protocol works. Currently it is mostly documented across
> > > various KIPs and in the code itself. I have written a complete protocol
> > > description (with KIP-966 changes applied) which is inspired by the
> > precise
> > > but accessible style and language of the Raft paper. The idea is that it
> > > makes it easier for contributors and anyone else interested in the
> > protocol
> > > to learn how it works, the fundamental properties it has and how those
> > > properties are supported by the various behaviors and conditions.
> > >
> > > It currently resides next to the TLA+ specification itself in my
> > > kafka-tlaplus repository. I'd be interested to receive feedback from the
> > > community.
> > >
> > >
> > >
> > https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> > >
> > > Thanks
> > > Jack
> > >
> >
>
>
> --
> ========================
> Okada Haruki
> ocadar...@gmail.com
> ========================

Reply via email to