Hi all,

I agree that we should have the protocol description and specification in
the Kafka repository. I have two other specifications to contribute
including the KRaft implementation of Raft and the next gen consumer group
protocol (WIP). I also have a formal prose description of the Kafka
replication protocol for how it works today. That should appear under the
3.5 directory sometime this week.

Regarding moving this kind of stuff into the Kafka repo, there are a number
of additional topics such as:

   - Where to place formal prose descriptions and specifications of
   protocols in the Kafka repo.
   - Given we may add more specifications and potentially other formal
   prose descriptions, how to structure things?
   - Do we use markdown like I have done here with svg files (created in
   Excalidraw with the Excalidraw bits embedded in the svg)., or do we use
   LaTex? Some people may wish to read this directly on GitHub and perhaps
   others like a PDF. Markdown is more accessible and perhaps more likely to
   fulfil the living document approach.

Thanks
Jack


On Mon, Sep 11, 2023 at 1:45 PM Divij Vaidya <divijvaidy...@gmail.com>
wrote:

> This is very useful Jack!
>
> 1. We are missing a replication protocol specification in our project.
> Ideally it should be a living document and adding what you wrote to
> the docs/design would be a great start towards that goal.
> 2. I have also been building on top of some existing TLA+ to add
> changes to replication protocol brought by features such as Tiered
> Storage, local retention etc. at
>
> https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
> 3. Apart from verifying correctness, I believe a TLA+ model will also
> help developers quickly iterate through their fundamental assumptions
> while making changes. As an example, we recently had a discussion in
> the community on whether to increase leader epoch with shrink/expand
> ISR or not. There was another discussion on whether we can choose the
> replica with the largest end offset as the new leader to reduce
> truncation. In both these cases, we could have quickly modified the
> existing TLA+ model, run through it and verify that the assumptions
> still hold true. It would be great if we can take such discussions as
> an example and demonstrate how TLA+ could have benefitted the
> community. It would help make the case for adding the TLA+ spec as
> part of the community owned project itself.
>
> Right now, things are a bit busy on my end, but I am looking forward
> to exploring what you shared above in the coming weeks (perhaps a
> first review by end of sept).
>
> Thank you again for starting this conversation.
>
> --
> Divij Vaidya
>
> On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada <ocadar...@gmail.com> wrote:
> >
> > Hi Jack,
> >
> > Thank you for the great work, not only the spec but also for the
> > comprehensive documentation about the replication.
> > Actually I wrote some TLA+ spec to verify unclean leader election
> behavior
> > before so I will double-check my understanding with your complete spec :)
> >
> >
> > Thanks,
> >
> > 2023年9月10日(日) 21:42 David Jacot <da...@apache.org>:
> >
> > > Hi Jack,
> > >
> > > This is great! Thanks for doing it. I will look into it when I have a
> bit
> > > of time, likely after Current.
> > >
> > > Would you be interested in contributing it to the main repository? That
> > > would be a great contribution to the project. Having it there would
> allow
> > > the community to maintain it while changes to the protocol are made.
> That
> > > would also pave the way for having other specs in the future (e.g. new
> > > consumer group protocol).
> > >
> > > Best,
> > > David
> > >
> > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly <vanligh...@apache.org>
> a
> > > écrit :
> > >
> > > > Hi all,
> > > >
> > > > As part of my work on formally verifying different parts of Apache
> Kafka
> > > > and working on KIP-966 I have built up a lot of knowledge about how
> the
> > > > replication protocol works. Currently it is mostly documented across
> > > > various KIPs and in the code itself. I have written a complete
> protocol
> > > > description (with KIP-966 changes applied) which is inspired by the
> > > precise
> > > > but accessible style and language of the Raft paper. The idea is
> that it
> > > > makes it easier for contributors and anyone else interested in the
> > > protocol
> > > > to learn how it works, the fundamental properties it has and how
> those
> > > > properties are supported by the various behaviors and conditions.
> > > >
> > > > It currently resides next to the TLA+ specification itself in my
> > > > kafka-tlaplus repository. I'd be interested to receive feedback from
> the
> > > > community.
> > > >
> > > >
> > > >
> > >
> https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> > > >
> > > > Thanks
> > > > Jack
> > > >
> > >
> >
> >
> > --
> > ========================
> > Okada Haruki
> > ocadar...@gmail.com
> > ========================
>

Reply via email to