If you want to understand how the replication protocol works and how it can be configured for consistency or for availability, I have written up a more formal description of the protocol along with TLA+ specifications. These should answer most of your questions; if not, please do come back and ask further questions.
How it works today:

- Formal description: https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/v3.5/description/0_kafka_replication_protocol.md
- TLA+ specification: https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/v3.5/kafka_replication_v3_5.tla

How it will work when KIP-966 is implemented:

- Formal description: https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
- TLA+ specification: https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/kafka_replication_kip_966.tla

Hope that helps

Jack

On Wed, Nov 22, 2023 at 6:27 AM De Gao <d...@live.co.uk> wrote:

> Looks like the core of the problem is still the juggling game of
> consistency, availability and partition tolerance. If we want the cluster
> to keep working when brokers have inconsistent information due to a
> network partition, we have to choose between consistency and availability.
> My proposal is not about fixing the message loss. I will share it when
> ready.
> Thanks Andrew.
> ________________________________
> From: Andrew Grant <andrewgrant...@gmail.com>
> Sent: 21 November 2023 12:35
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: How Kafka handle partition leader change?
>
> Hey De Gao,
>
> Message loss or duplication can actually happen even without a leadership
> change for a partition. For example, if there are network issues and the
> producer never gets the ack from the server, it'll retry and cause
> duplicates. Message loss can occur when you use the acks=1 config -
> mostly you'd lose messages after a leadership change, but in theory, if
> the leader was restarted, the page cache was lost, and it became leader
> again, we could lose a message that wasn't replicated soon enough.
>
> You might be right that it's more likely to occur during a leadership
> change though - not 100% sure myself on that.
>
> Point being, the idempotent producer really is the way to write once and
> only once, as far as I'm aware.
>
> If you have any suggestions for improvements, I'm sure the community
> would love to hear them! It's possible there are ways to make leadership
> changes more seamless and at least reduce the probability of duplicates
> or loss. Not sure myself. I've wondered before if the old leader could
> reroute messages for a small period of time until the client knew the new
> leader, for example.
>
> Andrew
>
> > On Nov 21, 2023, at 1:42 AM, De Gao <d...@live.co.uk> wrote:
> >
> > I am asking this because I want to propose a change to Kafka. But it
> > looks like in certain scenarios it is very hard not to lose or
> > duplicate messages. I wonder in what scenarios we can accept that, and
> > where to draw the line?
> >
> > ________________________________
> > From: De Gao <d...@live.co.uk>
> > Sent: 21 November 2023 6:25
> > To: dev@kafka.apache.org <dev@kafka.apache.org>
> > Subject: Re: How Kafka handle partition leader change?
> >
> > Thanks Andrew. Sounds like the leadership change on the Kafka side is
> > a 'best effort' to avoid message duplication or loss. Can we say that
> > message loss is very likely during a leadership change unless the
> > producer uses idempotency? Is this a general situation where there is
> > no intent to provide a data integrity guarantee upon metadata change?
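To make the idempotency point in the quoted thread concrete, here is a minimal sketch of an idempotent producer using the Java client. The broker address and topic name are placeholders, not anything from this thread; adjust for your cluster.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class IdempotentProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker address - replace with your cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());

            // Idempotence lets brokers discard retried batches server-side,
            // so a retry after a lost ack does not create a duplicate record.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            // Idempotence requires acks=all: the leader waits for the full
            // ISR, closing the acks=1 window where an unreplicated message
            // can be lost on leader failover.
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // "my-topic" is a placeholder topic name.
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            }
        }
    }

With enable.idempotence=true the client attaches a producer id and per-partition sequence numbers to each batch, which is the extra metadata brokers use to dedupe resent batches; acks=all additionally closes the acks=1 loss window described above.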
> > ________________________________
> > From: Andrew Grant <agr...@confluent.io.INVALID>
> > Sent: 20 November 2023 12:26
> > To: dev@kafka.apache.org <dev@kafka.apache.org>
> > Subject: Re: How Kafka handle partition leader change?
> >
> > Hey De Gao,
> >
> > The controller is the one that always elects a new leader. When that
> > happens, the metadata is changed on the controller, and once committed
> > it's broadcast to all brokers in the cluster. In KRaft this would be
> > via a PartitionChange record that each broker fetches from the
> > controller. In ZK it'd be via an RPC from the controller to the broker.
> >
> > In either case, each broker might get the notification at a different
> > time. There is no ordering guarantee among the brokers. But eventually
> > they'll all know the new leader, which means eventually a Produce
> > request will fail with NotLeader, and the client will refresh its
> > metadata and find out the new leader.
> >
> > In between all that leadership movement, there are various ways
> > messages can get duplicated or lost. However, if you use the idempotent
> > producer, I believe you won't actually see dupes or missing messages,
> > so if that's an important requirement you could look into that. The
> > producer is designed to retry in general, and when you use the
> > idempotent producer, some extra metadata is sent around to dedupe any
> > messages server-side that were sent multiple times by the client.
> >
> > If you're interested in learning more Kafka internals, I highly
> > recommend this blog series:
> > https://www.confluent.io/blog/apache-kafka-architecture-and-internals-by-jun-rao/
> >
> > Hope that helped a bit.
> >
> > Andy
> >
> >> On Nov 20, 2023, at 2:07 AM, De Gao <d...@live.co.uk> wrote:
> >>
> >> Hi all, I have an interesting question here.
> >>
> >> Let's say we have two brokers B1 and B2, controller C, and producers
> >> P1, P2...Pn. Currently B1 holds the partition leadership and Px is
> >> constantly producing messages to B1. We want to move the partition
> >> leadership to B2. How is the leadership change synced between B1, B2,
> >> C, and Px such that it is guaranteed that all the parties acknowledge
> >> the leadership change in the right order? Is there a break in the
> >> produce flow in between? Any chance of message loss?
> >>
> >> Thanks
> >>
> >> De Gao
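To illustrate the client-side behaviour Andrew describes: when leadership moves, an in-flight Produce to the old leader fails with a retriable NotLeader error, the producer refreshes its metadata and resends to the new leader internally, and an application callback only observes the final outcome. Here is a minimal sketch under those assumptions; the broker address and topic name are again placeholders.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class LeaderChangeAwareSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
            // Bounds the total time spent on internal retries (metadata
            // refresh + resend) before the send is reported as failed.
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // Retriable errors such as
                            // NotLeaderOrFollowerException are retried
                            // internally; the callback fires only once the
                            // send succeeds or retries are exhausted.
                            System.err.println("Send failed: " + exception);
                        } else {
                            System.out.printf("Written to %s-%d at offset %d%n",
                                metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
            }
        }
    }

Combined with the idempotent settings from the earlier sketch, the internal resend after a leader change is deduplicated server-side, which is why the idempotent producer avoids the duplicate/loss window during leadership movement.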