If you want to understand how the replication protocol works and how it can be
configured for consistency or for availability, I have written up a more formal
description of the protocol along with TLA+ specifications. These should answer
most of your questions; if not, please do come back and ask.

How it works today:
- Formal description:
https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/v3.5/description/0_kafka_replication_protocol.md
- TLA+ specification:
https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/v3.5/kafka_replication_v3_5.tla

How it will work when KIP-966 is implemented:
- Formal description:
https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
- TLA+ specification:
https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/kafka_replication_kip_966.tla
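
To connect that to day-to-day configuration (just a sketch of the usual knobs,
not a recommendation - the write-ups above cover the semantics properly), the
consistency/availability dial is mostly turned with a handful of broker/topic
and producer settings:

    # broker/topic side
    min.insync.replicas=2                  # with acks=all, a write needs 2 in-sync replicas
    unclean.leader.election.enable=false   # never elect an out-of-sync replica as leader

    # producer side
    acks=all                               # acks=1 trades durability for latency/availability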

Hope that helps
Jack


On Wed, Nov 22, 2023 at 6:27 AM De Gao <d...@live.co.uk> wrote:

> Looks like the core of the problem is still the juggling game of
> consistency, availability and partition tolerance. If we want the cluster
> to keep working when brokers have inconsistent information due to a network
> partition, we have to choose between consistency and availability.
> My proposal is not about fixing message loss. Will share when ready.
> Thanks Andrew.
> ________________________________
> From: Andrew Grant <andrewgrant...@gmail.com>
> Sent: 21 November 2023 12:35
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: How Kafka handle partition leader change?
>
> Hey De Gao,
>
> Message loss or duplication can actually happen even without a leadership
> change for a partition. For example, if there are network issues and the
> producer never gets the ack from the server, it'll retry and cause
> duplicates. Message loss usually occurs when you use the acks=1 config -
> mostly you'd lose messages after a leadership change, but in theory, if the
> leader was restarted, its page cache was lost and it became leader again,
> we could lose a message that wasn't replicated soon enough.
>
> You might be right that it's more likely to occur during a leadership change
> though - not 100% sure myself on that.
>
> Point being, the idempotent producer really is the way to write once and
> only once as far as I’m aware.
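>
> In client terms that's roughly (a sketch of the relevant Java producer
> configs, nothing more):
>
>     props.put(ProducerConfig.ACKS_CONFIG, "all");                   // acks=1 leaves the loss window above
>     props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");    // retried batches are deduped broker-side
>     props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000"); // how long internal retries may continue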
>
> If you have any suggestions for improvements I’m sure the community would
> love to hear them! It’s possible there are ways to make leadership changes
> more seamless and at least reduce the probability of duplicates or loss.
> Not sure myself. I've wondered before whether the old leader could reroute
> messages for a small period of time until the client knows the new leader,
> for example.
>
> Andrew
>
> Sent from my iPhone
>
> > On Nov 21, 2023, at 1:42 AM, De Gao <d...@live.co.uk> wrote:
> >
> > I am asking this because I want to propose a change to Kafka. But it looks
> like in certain scenarios it is very hard not to lose or duplicate messages.
> I wonder in what scenarios we can accept that, and where to draw the line?
> >
> > ________________________________
> > From: De Gao <d...@live.co.uk>
> > Sent: 21 November 2023 6:25
> > To: dev@kafka.apache.org <dev@kafka.apache.org>
> > Subject: Re: How Kafka handle partition leader change?
> >
> > Thanks Andrew. Sounds like the leadership change on the Kafka side is a
> 'best effort' to avoid message duplication or loss. Can we say that message
> loss is very likely during a leadership change unless the producer uses
> idempotence? Is this a general situation where there is no intent to provide
> a data integrity guarantee upon metadata change?
> > ________________________________
> > From: Andrew Grant <agr...@confluent.io.INVALID>
> > Sent: 20 November 2023 12:26
> > To: dev@kafka.apache.org <dev@kafka.apache.org>
> > Subject: Re: How Kafka handle partition leader change?
> >
> > Hey De Gao,
> >
> > The controller is the one that always elects a new leader. When that
> happens, the metadata is changed on the controller and, once committed, it's
> broadcast to all brokers in the cluster. In KRaft this would be via a
> PartitionChange record that each broker will fetch from the controller. In
> ZK it'd be via an RPC from the controller to the broker.
> >
> > In either case each broker might get the notification at a different
> time - there is no ordering guarantee among the brokers. But eventually
> they'll all know the new leader, which means that eventually a produce
> request to the old leader will fail with NotLeader, and the client will
> refresh its metadata and find out the new leader.
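>
> From the application's side that mostly shows up as internal retries; a rough
> sketch with the Java client (assuming a producer and record already built):
>
>     producer.send(record, (metadata, exception) -> {
>         // NotLeaderOrFollowerException is retriable: the client refreshes its
>         // metadata and retries internally, so the callback normally only sees
>         // an error once delivery.timeout.ms has been exhausted.
>         if (exception != null) {
>             System.err.println("delivery ultimately failed: " + exception);
>         }
>     });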
> >
> > In between all that leadership movement, there are various ways messages
> can get duplicated or lost. However, if you use the idempotent producer, I
> believe you actually won't see dupes or missing messages, so if that's an
> important requirement you could look into that. The producer is designed to
> retry in general, and when you use the idempotent producer some extra
> metadata is sent around to dedupe, server-side, any messages that were sent
> multiple times by the client.
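>
> Concretely, the client-side part is just configuration (a sketch - the dedupe
> state itself lives on the brokers):
>
>     props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
>     // The producer is assigned a producer id/epoch and stamps each batch with
>     // a per-partition sequence number; the leader recognises a sequence it has
>     // already appended and doesn't append it again, so a retry after a lost
>     // ack isn't written twice.
>     props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // must stay <= 5 for idempotence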
> >
> > If you're interested in learning more about Kafka internals, I highly
> recommend this blog series:
> https://www.confluent.io/blog/apache-kafka-architecture-and-internals-by-jun-rao/
> >
> > Hope that helped a bit.
> >
> > Andy
> >
> > Sent from my iPhone
> >
> >> On Nov 20, 2023, at 2:07 AM, De Gao <d...@live.co.uk> wrote:
> >>
> >> Hi all, I have an interesting question here.
> >>
> >> Let's say we have 2 brokers B1 and B2, a controller C, and producers P1,
> P2...Pn. Currently B1 holds the partition leadership and Px is constantly
> producing messages to B1. We want to move the partition leadership to B2.
> How is the leadership change synced between B1, B2, C, and Px such that all
> parties are guaranteed to acknowledge the leadership change in the right
> order? Is there a break in the produce flow in between? Any chance of
> message loss?
> >>
> >> Thanks
> >>
> >> De Gao
>
