Re: Complete Kafka replication protocol description

2023-09-12 Thread Federico Valeri
Hi Jack, thanks for providing this material. I'll spend some time
digging into it.

On Mon, Sep 11, 2023 at 6:56 PM Jack Vanlightly  wrote:
>
> Hi all,
>
> I agree that we should have the protocol description and specification in
> the Kafka repository. I have two other specifications to contribute
> including the KRaft implementation of Raft and the next gen consumer group
> protocol (WIP). I also have a formal prose description of the Kafka
> replication protocol for how it works today. That should appear under the
> 3.5 directory sometime this week.
>
> Regarding moving this kind of stuff into the Kafka repo, there are a number
> of additional topics such as:
>
>- Where to place formal prose descriptions and specifications of
>protocols in the Kafka repo.

I'm thinking about a dedicated folder (i.e. tlaplus) with a main
README.md where we briefly explain what's contained.

>- Given we may add more specifications and potentially other formal
>prose descriptions, how to structure things?

Each spec would live in its own sub folder under specs, along with a
README.md containing the prose description. I think we should also
provide the configuration files, and a bash script to easily run the
model checker on them. Wdyt?

>- Do we use markdown like I have done here with svg files (created in
>Excalidraw with the Excalidraw bits embedded in the svg)., or do we use
>LaTex? Some people may wish to read this directly on GitHub and perhaps
>others like a PDF. Markdown is more accessible and perhaps more likely to
>fulfil the living document approach.

I agree that Markdown is more convenient and will increase the chance
that these specifications will receive updates.

>
> Thanks
> Jack
>
>
> On Mon, Sep 11, 2023 at 1:45 PM Divij Vaidya 
> wrote:
>
> > This is very useful Jack!
> >
> > 1. We are missing a replication protocol specification in our project.
> > Ideally it should be a living document and adding what you wrote to
> > the docs/design would be a great start towards that goal.
> > 2. I have also been building on top of some existing TLA+ to add
> > changes to replication protocol brought by features such as Tiered
> > Storage, local retention etc. at
> >
> > https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
> > 3. Apart from verifying correctness, I believe a TLA+ model will also
> > help developers quickly iterate through their fundamental assumptions
> > while making changes. As an example, we recently had a discussion in
> > the community on whether to increase leader epoch with shrink/expand
> > ISR or not. There was another discussion on whether we can choose the
> > replica with the largest end offset as the new leader to reduce
> > truncation. In both these cases, we could have quickly modified the
> > existing TLA+ model, run through it and verify that the assumptions
> > still hold true. It would be great if we can take such discussions as
> > an example and demonstrate how TLA+ could have benefitted the
> > community. It would help make the case for adding the TLA+ spec as
> > part of the community owned project itself.
> >
> > Right now, things are a bit busy on my end, but I am looking forward
> > to exploring what you shared above in the coming weeks (perhaps a
> > first review by end of sept).
> >
> > Thank you again for starting this conversation.
> >
> > --
> > Divij Vaidya
> >
> > On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada  wrote:
> > >
> > > Hi Jack,
> > >
> > > Thank you for the great work, not only the spec but also for the
> > > comprehensive documentation about the replication.
> > > Actually I wrote some TLA+ spec to verify unclean leader election
> > behavior
> > > before so I will double-check my understanding with your complete spec :)
> > >
> > >
> > > Thanks,
> > >
> > > 2023年9月10日(日) 21:42 David Jacot :
> > >
> > > > Hi Jack,
> > > >
> > > > This is great! Thanks for doing it. I will look into it when I have a
> > bit
> > > > of time, likely after Current.
> > > >
> > > > Would you be interested in contributing it to the main repository? That
> > > > would be a great contribution to the project. Having it there would
> > allow
> > > > the community to maintain it while changes to the protocol are made.
> > That
> > > > would also pave the way for having other specs in the future (e.g. new
> > > > consumer group protocol).
> > > >
> > > > Best,
> > > > David
> > > >
> > > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly 
> > a
> > > > écrit :
> > > >
> > > > > Hi all,
> > > > >
> > > > > As part of my work on formally verifying different parts of Apache
> > Kafka
> > > > > and working on KIP-966 I have built up a lot of knowledge about how
> > the
> > > > > replication protocol works. Currently it is mostly documented across
> > > > > various KIPs and in the code itself. I have written a complete
> > protocol
> > > > > description (with KIP-966 changes applied) which is inspired by the
> > > > precise
> > 

Re: Complete Kafka replication protocol description

2023-09-11 Thread Jack Vanlightly
Hi all,

I agree that we should have the protocol description and specification in
the Kafka repository. I have two other specifications to contribute
including the KRaft implementation of Raft and the next gen consumer group
protocol (WIP). I also have a formal prose description of the Kafka
replication protocol for how it works today. That should appear under the
3.5 directory sometime this week.

Regarding moving this kind of stuff into the Kafka repo, there are a number
of additional topics such as:

   - Where to place formal prose descriptions and specifications of
   protocols in the Kafka repo.
   - Given we may add more specifications and potentially other formal
   prose descriptions, how to structure things?
   - Do we use markdown like I have done here with svg files (created in
   Excalidraw with the Excalidraw bits embedded in the svg)., or do we use
   LaTex? Some people may wish to read this directly on GitHub and perhaps
   others like a PDF. Markdown is more accessible and perhaps more likely to
   fulfil the living document approach.

Thanks
Jack


On Mon, Sep 11, 2023 at 1:45 PM Divij Vaidya 
wrote:

> This is very useful Jack!
>
> 1. We are missing a replication protocol specification in our project.
> Ideally it should be a living document and adding what you wrote to
> the docs/design would be a great start towards that goal.
> 2. I have also been building on top of some existing TLA+ to add
> changes to replication protocol brought by features such as Tiered
> Storage, local retention etc. at
>
> https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
> 3. Apart from verifying correctness, I believe a TLA+ model will also
> help developers quickly iterate through their fundamental assumptions
> while making changes. As an example, we recently had a discussion in
> the community on whether to increase leader epoch with shrink/expand
> ISR or not. There was another discussion on whether we can choose the
> replica with the largest end offset as the new leader to reduce
> truncation. In both these cases, we could have quickly modified the
> existing TLA+ model, run through it and verify that the assumptions
> still hold true. It would be great if we can take such discussions as
> an example and demonstrate how TLA+ could have benefitted the
> community. It would help make the case for adding the TLA+ spec as
> part of the community owned project itself.
>
> Right now, things are a bit busy on my end, but I am looking forward
> to exploring what you shared above in the coming weeks (perhaps a
> first review by end of sept).
>
> Thank you again for starting this conversation.
>
> --
> Divij Vaidya
>
> On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada  wrote:
> >
> > Hi Jack,
> >
> > Thank you for the great work, not only the spec but also for the
> > comprehensive documentation about the replication.
> > Actually I wrote some TLA+ spec to verify unclean leader election
> behavior
> > before so I will double-check my understanding with your complete spec :)
> >
> >
> > Thanks,
> >
> > 2023年9月10日(日) 21:42 David Jacot :
> >
> > > Hi Jack,
> > >
> > > This is great! Thanks for doing it. I will look into it when I have a
> bit
> > > of time, likely after Current.
> > >
> > > Would you be interested in contributing it to the main repository? That
> > > would be a great contribution to the project. Having it there would
> allow
> > > the community to maintain it while changes to the protocol are made.
> That
> > > would also pave the way for having other specs in the future (e.g. new
> > > consumer group protocol).
> > >
> > > Best,
> > > David
> > >
> > > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly 
> a
> > > écrit :
> > >
> > > > Hi all,
> > > >
> > > > As part of my work on formally verifying different parts of Apache
> Kafka
> > > > and working on KIP-966 I have built up a lot of knowledge about how
> the
> > > > replication protocol works. Currently it is mostly documented across
> > > > various KIPs and in the code itself. I have written a complete
> protocol
> > > > description (with KIP-966 changes applied) which is inspired by the
> > > precise
> > > > but accessible style and language of the Raft paper. The idea is
> that it
> > > > makes it easier for contributors and anyone else interested in the
> > > protocol
> > > > to learn how it works, the fundamental properties it has and how
> those
> > > > properties are supported by the various behaviors and conditions.
> > > >
> > > > It currently resides next to the TLA+ specification itself in my
> > > > kafka-tlaplus repository. I'd be interested to receive feedback from
> the
> > > > community.
> > > >
> > > >
> > > >
> > >
> https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> > > >
> > > > Thanks
> > > > Jack
> > > >
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 

Re: Complete Kafka replication protocol description

2023-09-11 Thread Divij Vaidya
This is very useful Jack!

1. We are missing a replication protocol specification in our project.
Ideally it should be a living document and adding what you wrote to
the docs/design would be a great start towards that goal.
2. I have also been building on top of some existing TLA+ to add
changes to replication protocol brought by features such as Tiered
Storage, local retention etc. at
https://github.com/divijvaidya/kafka-specification/blob/master/KafkaReplication.tla
3. Apart from verifying correctness, I believe a TLA+ model will also
help developers quickly iterate through their fundamental assumptions
while making changes. As an example, we recently had a discussion in
the community on whether to increase leader epoch with shrink/expand
ISR or not. There was another discussion on whether we can choose the
replica with the largest end offset as the new leader to reduce
truncation. In both these cases, we could have quickly modified the
existing TLA+ model, run through it and verify that the assumptions
still hold true. It would be great if we can take such discussions as
an example and demonstrate how TLA+ could have benefitted the
community. It would help make the case for adding the TLA+ spec as
part of the community owned project itself.

Right now, things are a bit busy on my end, but I am looking forward
to exploring what you shared above in the coming weeks (perhaps a
first review by end of sept).

Thank you again for starting this conversation.

--
Divij Vaidya

On Mon, Sep 11, 2023 at 4:49 AM Haruki Okada  wrote:
>
> Hi Jack,
>
> Thank you for the great work, not only the spec but also for the
> comprehensive documentation about the replication.
> Actually I wrote some TLA+ spec to verify unclean leader election behavior
> before so I will double-check my understanding with your complete spec :)
>
>
> Thanks,
>
> 2023年9月10日(日) 21:42 David Jacot :
>
> > Hi Jack,
> >
> > This is great! Thanks for doing it. I will look into it when I have a bit
> > of time, likely after Current.
> >
> > Would you be interested in contributing it to the main repository? That
> > would be a great contribution to the project. Having it there would allow
> > the community to maintain it while changes to the protocol are made. That
> > would also pave the way for having other specs in the future (e.g. new
> > consumer group protocol).
> >
> > Best,
> > David
> >
> > Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly  a
> > écrit :
> >
> > > Hi all,
> > >
> > > As part of my work on formally verifying different parts of Apache Kafka
> > > and working on KIP-966 I have built up a lot of knowledge about how the
> > > replication protocol works. Currently it is mostly documented across
> > > various KIPs and in the code itself. I have written a complete protocol
> > > description (with KIP-966 changes applied) which is inspired by the
> > precise
> > > but accessible style and language of the Raft paper. The idea is that it
> > > makes it easier for contributors and anyone else interested in the
> > protocol
> > > to learn how it works, the fundamental properties it has and how those
> > > properties are supported by the various behaviors and conditions.
> > >
> > > It currently resides next to the TLA+ specification itself in my
> > > kafka-tlaplus repository. I'd be interested to receive feedback from the
> > > community.
> > >
> > >
> > >
> > https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> > >
> > > Thanks
> > > Jack
> > >
> >
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 


Re: Complete Kafka replication protocol description

2023-09-10 Thread Haruki Okada
Hi Jack,

Thank you for the great work, not only the spec but also for the
comprehensive documentation about the replication.
Actually I wrote some TLA+ spec to verify unclean leader election behavior
before so I will double-check my understanding with your complete spec :)


Thanks,

2023年9月10日(日) 21:42 David Jacot :

> Hi Jack,
>
> This is great! Thanks for doing it. I will look into it when I have a bit
> of time, likely after Current.
>
> Would you be interested in contributing it to the main repository? That
> would be a great contribution to the project. Having it there would allow
> the community to maintain it while changes to the protocol are made. That
> would also pave the way for having other specs in the future (e.g. new
> consumer group protocol).
>
> Best,
> David
>
> Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly  a
> écrit :
>
> > Hi all,
> >
> > As part of my work on formally verifying different parts of Apache Kafka
> > and working on KIP-966 I have built up a lot of knowledge about how the
> > replication protocol works. Currently it is mostly documented across
> > various KIPs and in the code itself. I have written a complete protocol
> > description (with KIP-966 changes applied) which is inspired by the
> precise
> > but accessible style and language of the Raft paper. The idea is that it
> > makes it easier for contributors and anyone else interested in the
> protocol
> > to learn how it works, the fundamental properties it has and how those
> > properties are supported by the various behaviors and conditions.
> >
> > It currently resides next to the TLA+ specification itself in my
> > kafka-tlaplus repository. I'd be interested to receive feedback from the
> > community.
> >
> >
> >
> https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> >
> > Thanks
> > Jack
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: Complete Kafka replication protocol description

2023-09-10 Thread David Jacot
Hi Jack,

This is great! Thanks for doing it. I will look into it when I have a bit
of time, likely after Current.

Would you be interested in contributing it to the main repository? That
would be a great contribution to the project. Having it there would allow
the community to maintain it while changes to the protocol are made. That
would also pave the way for having other specs in the future (e.g. new
consumer group protocol).

Best,
David

Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly  a
écrit :

> Hi all,
>
> As part of my work on formally verifying different parts of Apache Kafka
> and working on KIP-966 I have built up a lot of knowledge about how the
> replication protocol works. Currently it is mostly documented across
> various KIPs and in the code itself. I have written a complete protocol
> description (with KIP-966 changes applied) which is inspired by the precise
> but accessible style and language of the Raft paper. The idea is that it
> makes it easier for contributors and anyone else interested in the protocol
> to learn how it works, the fundamental properties it has and how those
> properties are supported by the various behaviors and conditions.
>
> It currently resides next to the TLA+ specification itself in my
> kafka-tlaplus repository. I'd be interested to receive feedback from the
> community.
>
>
> https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
>
> Thanks
> Jack
>


Complete Kafka replication protocol description

2023-09-10 Thread Jack Vanlightly
Hi all,

As part of my work on formally verifying different parts of Apache Kafka
and working on KIP-966 I have built up a lot of knowledge about how the
replication protocol works. Currently it is mostly documented across
various KIPs and in the code itself. I have written a complete protocol
description (with KIP-966 changes applied) which is inspired by the precise
but accessible style and language of the Raft paper. The idea is that it
makes it easier for contributors and anyone else interested in the protocol
to learn how it works, the fundamental properties it has and how those
properties are supported by the various behaviors and conditions.

It currently resides next to the TLA+ specification itself in my
kafka-tlaplus repository. I'd be interested to receive feedback from the
community.

https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md

Thanks
Jack