Hi Ismael,

I can volunteer to write the KIP. Unless somebody else has any objections,
I'll write it by the end of this week.

Best,

Josep Prat
Open Source Engineering Director, Aiven
josep.p...@aiven.io | +491715557497 | aiven.io
Aiven Deutschland GmbH
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B

On Thu, Dec 21, 2023, 17:58 Ismael Juma <m...@ismaeljuma.com> wrote:

> Hi all,
>
> After understanding the use case Josep and Anton described in more detail,
> I think it's fair to say that quorum reconfiguration is necessary for
> migration of Apache Kafka users who follow this pattern. Given that, I
> think we should have a 3.8 release before the 4.0 release.
>
> The next question is whether we should do something special when it comes
> to timeline, parallel releases, etc. After careful consideration, I think
> we should simply follow our usual approach: regular 3.8 release around
> early May 2024 and regular 4.0 release around early September 2024. The
> community will be able to start working on items specific to 4.0 after 3.8
> is branched in late March/early April - I don't think we need to deal with
> the overhead of maintaining multiple long-lived branches for
> feature development.
>
> If the proposal above sounds reasonable, I suggest we write a KIP and vote
> on it. Any volunteers?
>
> Ismael
>
> On Tue, Nov 21, 2023 at 8:18 PM Ismael Juma <m...@ismaeljuma.com> wrote:
>
> > Hi Luke,
> >
> > I think we're conflating different things here. There are 3 separate
> > points in your email, but only 1 of them requires 3.8:
> >
> > 1. JBOD may have some bugs in 3.7.0. Whatever bugs exist can be fixed in
> > 3.7.x. We have already said that we will backport critical fixes to 3.7.x
> > for some time.
> > 2. Quorum reconfiguration is important to include in 4.0, the release
> > where ZK won't be supported. This doesn't need a 3.8 release either.
> > 3. Quorum reconfiguration is necessary for migration use cases and hence
> > needs to be in a 3.x release. This one would require a 3.8 release if
> true.
> > But we should have a debate on whether it is indeed true. It's not clear
> to
> > me yet.
> >
> > Ismael
> >
> > On Tue, Nov 21, 2023 at 7:30 PM Luke Chen <show...@gmail.com> wrote:
> >
> >> Hi Colin and Jose,
> >>
> >> I revisited the discussion of KIP-833 here
> >> <https://lists.apache.org/thread/90zkqvmmw3y8j6tkgbg3md78m7hs4yn6>, and you
> >> can see I was the first to reply to that thread, expressing my excitement at
> >> the time. I still think having KRaft in Kafka is the right direction to move
> >> in. But to reach that destination, we need to make our users comfortable with
> >> the decision. The worst scenario is that we say 4.0 is ready and ZK is
> >> removed, then some users move to 4.0 and ask: wait a minute, why does it not
> >> support xxx feature? And then they start searching for alternatives to
> >> Apache Kafka. None of us wants to see that. That's why some community users,
> >> including me, have started to express concern about moving to 4.0 too
> >> quickly.
> >>
> >>
> >> Quoting Colin:
> >> > While dynamic quorum reconfiguration is a nice feature, it doesn't
> block
> >> anything: not migration, not deployment.
> >>
> >> Clearly the Confluent team may have deployed ZooKeeper in a particular way and
> >> didn’t depend on its ability to support reconfiguration. So KRaft is
> ready
> >> from your point of view. But users of Apache Kafka might have come to
> >> depend on some ZooKeeper functionality, such as the ability to
> reconfigure
> >> ZooKeeper quorums, which is not yet available in KRaft. I don’t think
> the
> >> Apache Kafka documentation has ever said “do not depend on this ability
> of
> >> Apache Kafka or Zookeeper”, so it doesn’t seem unreasonable for users to
> >> have deployed ZooKeeper in this way. In KIP-833
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-MissingFeatures
> >> >,
> >> we said: “Modifying certain dynamic configurations on the standalone
> KRaft
> >> controller” was an important missing feature. Unfortunately it wasn’t as
> >> explicit as it could have been. While no one expects KRaft to support
> all
> >> the features of ZooKeeper, it looks to me that users might depend on
> this
> >> particular feature and it’s only recently that it’s become apparent that
> >> you don’t consider it a blocker.
> >>
> >> Quoting José:
> >> > If we do a 3.8 release before 4.0 and we implement KIP-853 in 3.8, the
> >> user will be able to migrate to a KRaft cluster that supports
> dynamically
> >> changing the set of voters and has better support for disk failures.
> >>
> >> Yes, KIP-853 and disk failure support are both very important missing
> >> features. Disk failure support is not just a "good-to-have" feature; it
> >> should be a "must-have" IMO. We can't announce the 4.0 release without a
> >> good solution for disk failure in KRaft.
> >>
> >> It’s also worth thinking about how Apache Kafka users who depend on JBOD
> >> might look at the risks of not having a 3.8 release. JBOD support on KRaft
> >> is planned for 3.7 and is still in progress, so it's hard to say whether it
> >> is a blocker or not. But in practice, even if the feature makes it into 3.7
> >> in time, that much new code is unlikely to be entirely bug free. We need to
> >> maintain the confidence of those users,
> and
> >> forcing them to migrate through 3.7 where this new code is hardly
> >> battle-tested doesn’t appear to do that.
> >>
> >> Our goal for 4.0 should be that all the “main” features in KRaft are in a
> >> production-ready state. To reach that goal, I think having one more release
> >> makes sense. We can have different opinions about what the “main features”
> >> in KRaft are, but we should all agree that JBOD is one of them.
> >>
> >> Alternatively, like Josep proposed, we could run the 4.0 and 3.7.x (or 3.8)
> >> releases in parallel and maintain both for a defined period. But that is not
> >> a small effort, especially since much of the ZK code will be removed in 4.0,
> >> so the diff between the codebases will be large. In other words, the
> >> additional cost of the backporting required by this alternative is likely to
> >> be higher than doing a 3.8, in my opinion.
> >>
> >> Quoting José again:
> >> > What are the disadvantages of adding the 3.8 release before 4.0? This
> >> would push the 4.0 release by 3-4 months. From what we can tell, it
> would
> >> also delay when KIP-896 can be implemented and extend how long the
> >> community needs to maintain the code used by ZK mode. Is there anything
> >> else?
> >>
> >> If we agree on the previous points, I think these disadvantages just
> >> disappear. The 3-4 month delay, the maintenance effort, KIP-896, and maybe
> >> also the Scala 2.12 and Java 8 removal are not that critical compared with
> >> what I mentioned earlier: the worst case is that users lose their confidence
> >> in Apache Kafka.
> >>
> >>
> >> Quoting Colin:
> >> > I would not want to delay that because we want an additional feature.
> >> And
> >> we will always want additional features. So I am concerned we will end
> up
> >> in an infinite loop of people asking for "just one more feature" before
> >> they migrate.
> >>
> >> I totally agree with you. We could keep delaying the 4.0 release forever,
> >> and I'd also like to draw a line somewhere. So, in my opinion, the 3.8
> >> release is the line: no 3.9 or 3.10 releases after that. If that is the
> >> decision, will your concern about this infinite loop disappear?
> >>
> >> Final note: speaking of the missing features, I'm always happy to work with
> >> you and the other community contributors to make them happen, as we have
> >> discussed earlier. Just let me know.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Wed, Nov 22, 2023 at 2:54 AM Colin McCabe <cmcc...@apache.org>
> wrote:
> >>
> >> > On Tue, Nov 21, 2023, at 03:47, Josep Prat wrote:
> >> > > Hi Colin,
> >> > >
> >> > > I think it's great that Confluent runs KRaft clusters in production,
> >> > > and it means that it is production ready for Confluent and its users.
> >> > > But luckily for Kafka, the community is bigger than this (self-managed
> >> > > in the cloud or on-prem, or customers of other SaaS companies).
> >> >
> >> > Hi Josep,
> >> >
> >> > Confluent is not the only company using or developing KRaft. Most of
> the
> >> > big organizations developing Kafka are involved. I mentioned
> Confluent's
> >> > deployments because I wanted to be clear that KRaft mode is not
> >> > experimental or new. Talking about software in production is a good
> way
> >> to
> >> > clear up these misconceptions.
> >> >
> >> > Indeed, KRaft mode is many years old. It started around 2020, and
> became
> >> > production-ready in AK 3.3 in 2022. ZK mode was deprecated in AK 3.5,
> >> which
> >> > was released June 2023. If we release AK 4.0 around April (or maybe a
> >> month
> >> > or two later) then that will be almost a full year between deprecation
> >> and
> >> > removal of ZK mode. We've talked about this a lot, in KIPs, in Apache
> >> blog
> >> > posts, at conferences, and so forth.
> >> >
> >> > > We've heard at least from 1 SaaS company, Aiven (disclaimer, it is
> my
> >> > > employer) where the current feature set makes it not trivial to
> >> > > migrate. This same issue might happen not only at Aiven but with any
> >> > > user of Kafka who uses immutable infrastructure.
> >> >
> >> > Can you discuss why you feel it is "not trivial to migrate"? From the
> >> > discussion above, the main gap is that we should improve the
> >> documentation
> >> > for handling failed disks.
> >> >
> >> > > Another case is users that have hundreds (or more) of clusters and
> >> > > more than 100k nodes, who experience node failures multiple times
> >> > > during a single day. In this situation, not having KIP-853 makes these
> >> > > power users unable to join the game, as introducing a new error-prone
> >> > > manual operation (or one that needs to be automated) is usually a huge
> >> > > no-go.
> >> >
> >> > We have thousands of KRaft clusters in production and haven't seen
> these
> >> > problems, as I described above.
> >> >
> >> > best,
> >> > Colin
> >> >
> >> > >
> >> > > But I hear the concerns of delaying 4.0 for another 3 to 4 months.
> >> > > Would it help if we aimed at shortening the timeline for 3.8.0 and
> >> > > started with 4.0.0 a bit earlier?
> >> > > Maybe we could work on 3.8.0 almost in parallel with 4.0.0:
> >> > > - Start with 3.8.0 release process
> >> > > - After a small time (let's say a week) create the release branch
> >> > > - Start with 4.0.0 release process as usual
> >> > > - Cherry-pick KRaft-related issues to 3.8.0
> >> > > - Release 3.8.0
> >> > > I suspect 4.0.0 will need a bit more time than usual to ensure the
> >> code
> >> > > is cleaned up of deprecated classes and methods on top of the usual
> >> > > work we have. For this reason I think there would be enough time
> >> > > between releasing 3.8.0 and 4.0.0.
> >> > >
> >> > > What do you all think?
> >> > >
> >> > > Best,
> >> > > Josep Prat
> >> > >
> >> > > On 2023/11/20 20:03:18 Colin McCabe wrote:
> >> > >> Hi Josep,
> >> > >>
> >> > >> I think there is some confusion here. Quorum reconfiguration is not
> >> > needed for KRaft to become production ready. Confluent runs thousands
> of
> >> > KRaft clusters without quorum reconfiguration, and has for years.
> While
> >> > dynamic quorum reconfiguration is a nice feature, it doesn't block
> >> > anything: not migration, not deployment. As best as I understand it,
> the
> >> > use-case Aiven has isn't even reconfiguration per se, just wiping a
> >> disk.
> >> > There are ways to handle this -- I discussed some earlier in the
> >> thread. I
> >> > think it would be productive to continue that discussion -- especially
> >> the
> >> > part around documentation and testing of these cases.
> >> > >>
> >> > >> A lot of people have done a lot of work to get Kafka 4.0 ready. I
> >> would
> >> > not want to delay that because we want an additional feature. And we
> >> will
> >> > always want additional features. So I am concerned we will end up in
> an
> >> > infinite loop of people asking for "just one more feature" before they
> >> > migrate.
> >> > >>
> >> > >> best,
> >> > >> Colin
> >> > >>
> >> > >>
> >> > >> On Mon, Nov 20, 2023, at 04:15, Josep Prat wrote:
> >> > >> > Hi all,
> >> > >> >
> >> > >> > I wanted to share my opinion regarding this topic. I know some
> >> > >> > discussions happened some time ago (over a year) but I believe
> it's
> >> > >> > wise to reflect and re-evaluate if those decisions are still
> valid.
> >> > >> > KRaft, as of Kafka 3.6.x and 3.7.x, does not yet have feature parity
> >> > >> > with Zookeeper. By dropping Zookeeper altogether before achieving such
> >> > >> > parity, we are opening the door to leaving a chunk of Apache
> Kafka
> >> > >> > users without an easy way to upgrade to 4.0.
> >> > >> > In favor of making upgrades as smooth as possible, I propose to have a
> >> > >> > Kafka version where KIP-853 is merged and Zookeeper is still supported.
> >> > >> > This will enable community members who can't yet migrate to KRaft to do
> >> > >> > so in a safe way (rolling back if something goes wrong).
> >> > Additionally,
> >> > >> > this will give us more confidence that KRaft can successfully replace
> >> > >> > Zookeeper without any big problems, by discovering and fixing bugs or by
> >> > >> > confirming that KRaft works as expected.
> >> > >> > For this reason, I strongly believe we should have a 3.8.x version before
> >> > 4.0.x.
> >> > >> >
> >> > >> > What do others think in this regard?
> >> > >> >
> >> > >> > Best,
> >> > >> >
> >> > >> > On 2023/11/14 20:47:10 Colin McCabe wrote:
> >> > >> >> On Tue, Nov 14, 2023, at 04:37, Anton Agestam wrote:
> >> > >> >> > Hi Colin,
> >> > >> >> >
> >> > >> >> > Thank you for your thoughtful and comprehensive response.
> >> > >> >> >
> >> > >> >> >> KIP-853 is not a blocker for either 3.7 or 4.0. We discussed
> >> this
> >> > in
> >> > >> >> >> several KIPs that happened this year and last year. The most
> >> > notable was
> >> > >> >> >> probably KIP-866, which was approved in May 2022.
> >> > >> >> >
> >> > >> >> > I understand this is the case, I'm raising my concern because
> I
> >> was
> >> > >> >> > foreseeing some major pain points as a consequence of this
> >> > decision. Just
> >> > >> >> > to make it clear though: I am not asking for anyone to do work
> >> for
> >> > me, and
> >> > >> >> > I understand the limitations of resources available to
> implement
> >> > features.
> >> > >> >> > What I was asking is rather to consider the implications of
> >> > _removing_
> >> > >> >> > features before there exists a replacement for them.
> >> > >> >> >
> >> > >> >> > I understand that the timeframe for 3.7 isn't feasible, and
> >> > because of that
> >> > >> >> > I think what I was asking is rather: can we make sure that
> there
> >> > are more
> >> > >> >> > 3.x releases until controller quorum online resizing is
> >> > implemented?
> >> > >> >> >
> >> > >> >> > From your response, I gather that your stance is that it's
> >> > important to
> >> > >> >> > drop ZK support sooner rather than later and that the
> necessary
> >> > pieces for
> >> > >> >> > doing so are already in place.
> >> > >> >>
> >> > >> >> Hi Anton,
> >> > >> >>
> >> > >> >> Yes. I'm basically just repeating what we agreed upon in 2022 as
> >> > part of KIP-833.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > ---
> >> > >> >> >
> >> > >> >> > I want to make sure I've understood your suggested sequence
> for
> >> > controller
> >> > >> >> > node replacement. I hope the mentions of Kubernetes are rather
> >> for
> >> > examples
> >> > >> >> > of how to carry things out, rather than saying "this is only
> >> > supported on
> >> > >> >> > Kubernetes"?
> >> > >> >>
> >> > >> >> Apache Kafka is supported in lots of environments, including
> >> non-k8s
> >> > ones. I was just pointing out that using k8s means that you control
> your
> >> > own DNS resolution, which simplifies matters. If you don't control DNS
> >> > there are some extra steps for changing the quorum voters.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > Given we have three existing nodes as such:
> >> > >> >> >
> >> > >> >> > - a.local -> 192.168.0.100
> >> > >> >> > - b.local -> 192.168.0.101
> >> > >> >> > - c.local -> 192.168.0.102
> >> > >> >> >
> >> > >> >> > As well as a candidate node 192.168.0.103 that we want to
> >> replace
> >> > for the
> >> > >> >> > role of c.local.
> >> > >> >> >
> >> > >> >> > 1. Shut down controller process on node .102 (to make sure we
> >> > don't "go
> >> > >> >> > back in time").
> >> > >> >> > 2. rsync state from leader to .103.
> >> > >> >> > 3. Start controller process on .103.
> >> > >> >> > 4. Point the c.local entry at .103.
> >> > >> >> >
> >> > >> >> > I have a few questions about this sequence:
> >> > >> >> >
> >> > >> >> > 1. Would this sequence be safe against leadership changes?
> >> > >> >> >
> >> > >> >>
> >> > >> >> If the leader changes, the new leader should have all of the
> >> > committed entries that the old leader had.
> >> > >> >>
> >> > >> >> > 2. Does it work
> >> > >> >>
> >> > >> >> Probably the biggest issue is dealing with "torn writes" that
> >> happen
> >> > because you're copying the current log segment while it's being
> written
> >> to.
> >> > The system should be robust against this. However, we don't regularly
> do
> >> > this, so there hasn't been a lot of testing.
> >> > >> >>
> >> > >> >> I think Jose had a PR for improving the handling of this which
> we
> >> > might want to dig up. We'd want the system to auto-truncate the
> partial
> >> > record at the end of the log, if there is one.
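> >> > >> >>
> >> > >> >> Roughly, the check involved looks something like the sketch below (just
> >> > >> >> an illustration in Python, not the actual recovery code; it assumes the
> >> > >> >> standard v2 batch framing of an 8-byte base offset followed by a 4-byte
> >> > >> >> batch length, and that truncating at the last complete batch is the
> >> > >> >> right recovery action):
> >> > >> >>
> >> > >> >> import os
> >> > >> >> import struct
> >> > >> >>
> >> > >> >> def truncate_torn_tail(segment_path: str) -> None:
> >> > >> >>     # Walk the record batches in a copied log segment. Each batch is an
> >> > >> >>     # 8-byte base offset and a 4-byte length, then `length` more bytes.
> >> > >> >>     with open(segment_path, "rb") as f:
> >> > >> >>         data = f.read()
> >> > >> >>     pos = 0
> >> > >> >>     while pos + 12 <= len(data):
> >> > >> >>         (batch_len,) = struct.unpack_from(">i", data, pos + 8)
> >> > >> >>         if batch_len <= 0 or pos + 12 + batch_len > len(data):
> >> > >> >>             break  # torn or partial batch at the tail; cut here
> >> > >> >>         pos += 12 + batch_len
> >> > >> >>     if pos < len(data):
> >> > >> >>         os.truncate(segment_path, pos)  # drop the incomplete tail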
> >> > >> >>
> >> > >> >> > 3. By "state", do we mean `metadata.log.dir`? Something else?
> >> > >> >>
> >> > >> >> Yes, the state of the metadata.log.dir. Keep in mind you will
> need
> >> > to change the node ID in meta.properties after copying, of course.
> >> > >> >>
> >> > >> >> > 4. What are the effects on cluster availability? (I think this
> >> is
> >> > the same
> >> > >> >> > as asking what happens if a or b crashes during the process,
> or
> >> if
> >> > network
> >> > >> >> > partitions occur).
> >> > >> >>
> >> > >> >> Cluster metadata state tends to be pretty small, typically a
> >> hundred
> >> > megabytes or so. Therefore, I do not think it will take more than a
> >> second
> >> > or two to copy from one node to another. However, if you do
> experience a
> >> > crash when one node out of three is down, then you will be unavailable
> >> > until you can bring up a second node to regain a majority.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > ---
> >> > >> >> >
> >> > >> >> > If this is considered the official way of handling controller
> >> node
> >> > >> >> > replacements, does it make sense to improve documentation in
> >> this
> >> > area? Is
> >> > there already a plan for this documentation laid out in some
> >> > KIPs? This is
> >> > >> >> > something I'd be happy to contribute to.
> >> > >> >> >
> >> > >> >>
> >> > >> >> Yes, I think we should have official documentation about this.
> >> We'd
> >> > be happy to review anything in that area.
> >> > >> >>
> >> > >> >> >> To circle back to KIP-853, I think it stands a good chance of
> >> > making it
> >> > >> >> >> into AK 4.0.
> >> > >> >> >
> >> > >> >> > This sounds good, but the point I was making was if we could
> >> have
> >> > a release
> >> > >> >> > with both KRaft and ZK supporting this feature to ease the
> >> > migration out of
> >> > >> >> > ZK.
> >> > >> >> >
> >> > >> >>
> >> > >> >> The problem is, supporting multiple controller implementations
> is
> >> a
> >> > huge burden. So we don't want to extend the 3.x release past the point
> >> > that's needed to complete all the must-dos (SCRAM, delegation tokens,
> >> JBOD).
> >> > >> >>
> >> > >> >> best,
> >> > >> >> Colin
> >> > >> >>
> >> > >> >>
> >> > >> >> > BR,
> >> > >> >> > Anton
> >> > >> >> >
> >> > >> >> > Den tors 9 nov. 2023 kl 23:04 skrev Colin McCabe <
> >> > cmcc...@apache.org>:
> >> > >> >> >
> >> > >> >> >> Hi Anton,
> >> > >> >> >>
> >> > >> >> >> It rarely makes sense to scale up and down the number of
> >> > controller nodes
> >> > >> >> >> in the cluster. Only one controller node will be active at
> any
> >> > given time.
> >> > >> >> >> The main reason to use 5 nodes would be to be able to
> tolerate
> >> 2
> >> > failures
> >> > >> >> >> instead of 1.
> >> > >> >> >>
> >> > >> >> >> At Confluent, we generally run KRaft with 3 controllers. We
> >> have
> >> > not seen
> >> > >> >> >> problems with this setup, even with thousands of clusters. We
> >> have
> >> > >> >> >> discussed using 5 node controller clusters on certain very
> big
> >> > clusters,
> >> > >> >> >> but we haven't done that yet. This is all very similar to ZK,
> >> > where most
> >> > >> >> >> deployments were 3 nodes as well.
> >> > >> >> >>
> >> > >> >> >> KIP-853 is not a blocker for either 3.7 or 4.0. We discussed
> >> this
> >> > in
> >> > >> >> >> several KIPs that happened this year and last year. The most
> >> > notable was
> >> > >> >> >> probably KIP-866, which was approved in May 2022.
> >> > >> >> >>
> >> > >> >> >> Many users these days run in a Kubernetes environment where
> >> > Kubernetes
> >> > >> >> >> actually controls the DNS. This makes changing the set of
> >> voters
> >> > less
> >> > >> >> >> important than it was historically.
> >> > >> >> >>
> >> > >> >> >> For example, in a world with static DNS, you might have to
> >> change
> >> > the
> >> > >> >> >> controller.quorum.voters setting from:
> >> > >> >> >>
> >> > >> >> >> 100@a.local:9073,101@b.local:9073,102@c.local:9073
> >> > >> >> >>
> >> > >> >> >> to:
> >> > >> >> >>
> >> > >> >> >> 100@a.local:9073,101@b.local:9073,102@d.local:9073
> >> > >> >> >>
> >> > >> >> >> In a world with k8s controlling the DNS, you simply remap
> >> c.local
> >> > to point
> >> > >> >> >> to the IP address of your new pod for controller 102, and
> >> you're
> >> > done. No
> >> > >> >> >> need to update controller.quorum.voters.
> >> > >> >> >>
> >> > >> >> >> Another question is whether you re-create the pod data from
> >> > scratch every
> >> > >> >> >> time you add a new node. If you store the controller data on
> an
> >> > EBS volume
> >> > >> >> >> (or cloud-specific equivalent), you really only have to
> detach
> >> it
> >> > from the
> >> > >> >> >> previous pod and re-attach it to the new pod. k8s also
> handles
> >> > this
> >> > >> >> >> automatically, of course.
> >> > >> >> >>
> >> > >> >> >> If you want to reconstruct the full controller pod state each
> >> > time you
> >> > >> >> >> create a new pod (for example, so that you can use only
> >> instance
> >> > storage),
> >> > >> >> >> you should be able to rsync that state from the leader. In
> >> > general, the
> >> > >> >> >> invariant that we want to maintain is that the state should
> not
> >> > "go back in
> >> > >> >> >> time" -- if controller 102 promised to hold all log data up
> to
> >> > offset X, it
> >> > should come back with committed data up to at least that offset.
> >> > >> >> >>
> >> > >> >> >> There are lots of new features we'd like to implement for
> >> KRaft,
> >> > and Kafka
> >> > >> >> >> in general. If you have some you really would like to see, I
> >> > think everyone
> >> > >> >> >> in the community would be happy to work with you. The flip
> >> side,
> >> > of course,
> >> > >> >> >> is that since there are an unlimited number of features we
> >> could
> >> > do, we
> >> > >> >> >> can't really block the release for any one feature.
> >> > >> >> >>
> >> > >> >> >> To circle back to KIP-853, I think it stands a good chance of
> >> > making it
> >> > >> >> >> into AK 4.0. Jose, Alyssa, and some other people have worked
> on
> >> > it. It
> >> > >> >> >> definitely won't make it into 3.7, since we have only a few
> >> weeks
> >> > left
> >> > >> >> >> before that release happens.
> >> > >> >> >>
> >> > >> >> >> best,
> >> > >> >> >> Colin
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> On Thu, Nov 9, 2023, at 00:20, Anton Agestam wrote:
> >> > >> >> >> > Hi Luke,
> >> > >> >> >> >
> >> > >> >> >> > We have been looking into what switching from ZK to KRaft
> >> will
> >> > mean for
> >> > >> >> >> > Aiven.
> >> > >> >> >> >
> >> > >> >> >> > We heavily depend on an “immutable infrastructure” model
> for
> >> > deployments.
> >> > >> >> >> > This means that, when we perform upgrades, we introduce new
> >> > nodes to our
> >> > >> >> >> > clusters, scale the cluster up to incorporate the new
> nodes,
> >> > and then
> >> > >> >> >> phase
> >> > >> >> >> > the old ones out once all partitions are moved to the new
> >> > generation.
> >> > >> >> >> This
> >> > >> >> >> > allows us, and anyone else using a similar model, to do
> >> > upgrades as well
> >> > >> >> >> as
> >> > >> >> >> > cluster resizing with zero downtime.
> >> > >> >> >> >
> >> > >> >> >> > Reading up on KRaft and the ZK-to-KRaft migration path,
> this
> >> is
> >> > somewhat
> >> > >> >> >> > worrying for us. It seems like, if KIP-853 is not included
> >> > prior to
> >> > >> >> >> > dropping support for ZK, we will essentially have no
> >> satisfying
> >> > upgrade
> >> > >> >> >> > path. Even if KIP-853 is included in 4.0, I’m unsure if
> that
> >> > would allow
> >> > >> >> >> a
> >> > >> >> >> > migration path for us, since a new cluster generation would
> >> not
> >> > be able
> >> > >> >> >> to
> >> > >> >> >> > use ZK during the migration step.
> >> > >> >> >> > On the other hand, if KIP-853 was released in a version
> prior
> >> > to dropping
> >> > >> >> >> > ZK support, because it allows online resizing of KRaft
> >> > clusters, this
> >> > >> >> >> would
> >> > >> >> >> > allow us and others that use an immutable infrastructure
> >> > deployment
> >> > >> >> >> model,
> >> > >> >> >> > to provide a zero downtime migration path.
> >> > >> >> >> >
> >> > >> >> >> > For that reason, we’d like to raise awareness around this
> >> issue
> >> > and
> >> > >> >> >> > encourage considering the implementation of KIP-853 or
> >> > equivalent a
> >> > >> >> >> blocker
> >> > >> >> >> > not only for 4.0, but for the last version prior to 4.0.
> >> > >> >> >> >
> >> > >> >> >> > BR,
> >> > >> >> >> > Anton
> >> > >> >> >> >
> >> > >> >> >> > On 2023/10/11 12:17:23 Luke Chen wrote:
> >> > >> >> >> >> Hi all,
> >> > >> >> >> >>
> >> > >> >> >> Now that Kafka 3.6.0 is released, I'd like to start the
> >> > discussion for the
> >> > >> >> >> >> “road to Kafka 4.0”. Based on the plan in KIP-833
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.7
> >> > >> >> >> >>,
> >> > >> >> >> >> the next release 3.7 will be the final release before
> moving
> >> > to Kafka
> >> > >> >> >> 4.0
> >> > >> >> >> >> to remove the Zookeeper from Kafka. Before making this
> major
> >> > change, I'd
> >> > >> >> >> >> like to get consensus on the "must-have features/fixes for
> >> > Kafka 4.0",
> >> > >> >> >> to
> >> > >> >> >> >> avoid some users being surprised when upgrading to Kafka
> >> 4.0.
> >> > The intent
> >> > >> >> >> > is
> >> > >> >> >> >> to have a clear communication about what to expect in the
> >> > following
> >> > >> >> >> > months.
> >> > >> >> >> >> In particular we should be signaling what features and
> >> > configurations
> >> > >> >> >> are
> >> > >> >> >> >> not supported, or at risk (if no one is able to add
> support
> >> or
> >> > fix known
> >> > >> >> >> >> bugs).
> >> > >> >> >> >>
> >> > >> >> >> Here is the list of JIRA tickets
> >> > >> >> >> >> <
> >> > https://issues.apache.org/jira/issues/?jql=labels%20%3D%204.0-blocker
> >
> >> > >> >> >> that I labeled as "4.0-blocker". The criteria I used for the
> >> > >> >> >> “4.0-blocker” label are:
> >> > >> >> >> >> 1. The feature is supported in Zookeeper Mode, but not
> >> > supported in
> >> > >> >> >> KRaft
> >> > >> >> >> >> mode, yet (ex: KIP-858: JBOD in KRaft)
> >> > >> >> >> 2. Critical bugs in KRaft (ex: KAFKA-15489: split brain
> in
> >> > KRaft
> >> > >> >> >> >> controller quorum)
> >> > >> >> >> >>
> >> > >> >> >> If you disagree with my current list, you're welcome to discuss it
> >> > >> >> >> in the specific JIRA ticket. Or, if you think there are tickets I
> >> > >> >> >> missed, feel free to start a discussion in the JIRA ticket and ping
> >> > >> >> >> me or other people. After we reach consensus, we can label/unlabel
> >> > >> >> >> tickets accordingly. Again, the goal is to have open communication
> >> > >> >> >> with the community about what will be coming in 4.0.
> >> > >> >> >> >>
> >> > >> >> >> Below are the high-level categories of the list content:
> >> > >> >> >> >>
> >> > >> >> >> >> 1. Recovery from disk failure
> >> > >> >> >> >> KIP-856
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-856:+KRaft+Disk+Failure+Recovery
> >> > >> >> >> >>:
> >> > >> >> >> >> KRaft Disk Failure Recovery
> >> > >> >> >> >>
> >> > >> >> >> 2. Prevote, to support more than 3 controllers
> >> > >> >> >> >> KIP-650
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics
> >> > >> >> >> >>:
> >> > >> >> >> >> Enhance Kafkaesque Raft semantics
> >> > >> >> >> >>
> >> > >> >> >> >> 3. JBOD support
> >> > >> >> >> >> KIP-858
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft
> >> > >> >> >> >>:
> >> > >> >> >> >> Handle
> >> > >> >> >> >> JBOD broker disk failure in KRaft
> >> > >> >> >> >>
> >> > >> >> >> >> 4. Scale up/down Controllers
> >> > >> >> >> >> KIP-853
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes
> >> > >> >> >> >>:
> >> > >> >> >> >> KRaft Controller Membership Changes
> >> > >> >> >> >>
> >> > >> >> >> >> 5. Modifying dynamic configurations on the KRaft
> controller
> >> > >> >> >> >>
> >> > >> >> >> >> 6. Critical bugs in KRaft
> >> > >> >> >> >>
> >> > >> >> >> >> Does this make sense?
> >> > >> >> >> Any feedback is welcome.
> >> > >> >> >> >>
> >> > >> >> >> >> Thank you.
> >> > >> >> >> >> Luke
> >> > >> >> >> >>
> >> > >> >> >>
> >> > >> >>
> >> > >>
> >> >
> >>
> >
>
