Thanks, Ismael. The proposal makes sense. +1

David

On Thu, Dec 21, 2023 at 5:59 PM Ismael Juma <m...@ismaeljuma.com> wrote:

> Hi all,
>
> After understanding the use case Josep and Anton described in more detail,
> I think it's fair to say that quorum reconfiguration is necessary for
> migration of Apache Kafka users who follow this pattern. Given that, I
> think we should have a 3.8 release before the 4.0 release.
>
> The next question is whether we should do something special when it comes
> to timeline, parallel releases, etc. After careful consideration, I think
> we should simply follow our usual approach: regular 3.8 release around
> early May 2024 and regular 4.0 release around early September 2024. The
> community will be able to start working on items specific to 4.0 after 3.8
> is branched in late March/early April - I don't think we need to deal with
> the overhead of maintaining multiple long-lived branches for
> feature development.
>
> If the proposal above sounds reasonable, I suggest we write a KIP and vote
> on it. Any volunteers?
>
> Ismael
>
> On Tue, Nov 21, 2023 at 8:18 PM Ismael Juma <m...@ismaeljuma.com> wrote:
>
> > Hi Luke,
> >
> > I think we're conflating different things here. There are 3 separate
> > points in your email, but only 1 of them requires 3.8:
> >
> > 1. JBOD may have some bugs in 3.7.0. Whatever bugs exist can be fixed in
> > 3.7.x. We have already said that we will backport critical fixes to 3.7.x
> > for some time.
> > 2. Quorum reconfiguration is important to include in 4.0, the release
> > where ZK won't be supported. This doesn't need a 3.8 release either.
> > 3. Quorum reconfiguration is necessary for migration use cases and hence
> > needs to be in a 3.x release. This one would require a 3.8 release if true.
> > But we should have a debate on whether it is indeed true. It's not clear to
> > me yet.
> >
> > Ismael
> >
> > On Tue, Nov 21, 2023 at 7:30 PM Luke Chen <show...@gmail.com> wrote:
> >
> >> Hi Colin and Jose,
> >>
> >> I revisited the discussion of KIP-833 here
> >> <https://lists.apache.org/thread/90zkqvmmw3y8j6tkgbg3md78m7hs4yn6>, and you
> >> can see I was the first one to reply to the discussion thread to express my
> >> excitement at the time. Even now, I personally still think having KRaft in
> >> Kafka is the right direction to move forward. But to reach that destination,
> >> we need to make our users comfortable with the decision. The worst scenario
> >> is: we say 4.0 is ready and ZK is removed, then some users move to 4.0 and
> >> ask, "wait a minute, why doesn't it support feature xxx?", and start
> >> searching for alternatives to replace Apache Kafka. None of us wants to see
> >> that, right? That's why some community users, including me, have started to
> >> express concern about moving to 4.0 too quickly.
> >>
> >>
> >> Quoting Colin:
> >> > While dynamic quorum reconfiguration is a nice feature, it doesn't
> >> > block anything: not migration, not deployment.
> >>
> >> Clearly the Confluent team may deploy ZooKeeper in a particular way and not
> >> depend on its ability to support reconfiguration, so KRaft is ready from
> >> your point of view. But users of Apache Kafka might have come to depend on
> >> some ZooKeeper functionality, such as the ability to reconfigure ZooKeeper
> >> quorums, that is not yet available in KRaft. I don't think the Apache Kafka
> >> documentation has ever said "do not depend on this ability of Apache Kafka
> >> or ZooKeeper", so it doesn't seem unreasonable for users to have deployed
> >> ZooKeeper in this way. In KIP-833
> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-MissingFeatures>,
> >> we said that "Modifying certain dynamic configurations on the standalone
> >> KRaft controller" was an important missing feature. Unfortunately it wasn't
> >> as explicit as it could have been. While no one expects KRaft to support all
> >> the features of ZooKeeper, it looks to me like users might depend on this
> >> particular feature, and it has only recently become apparent that you don't
> >> consider it a blocker.
> >>
> >> Quoting José:
> >> > If we do a 3.8 release before 4.0 and we implement KIP-853 in 3.8, the
> >> > user will be able to migrate to a KRaft cluster that supports dynamically
> >> > changing the set of voters and has better support for disk failures.
> >>
> >> Yes, KIP-853 and disk failure support are both very important missing
> >> features. I don't think disk failure support is a "good-to-have" feature;
> >> it should be a "must-have", IMO. We can't announce the 4.0 release without
> >> a good solution for disk failures in KRaft.
> >>
> >> It's also worth thinking about how Apache Kafka users who depend on JBOD
> >> might look at the risks of not having a 3.8 release. JBOD support in KRaft
> >> is planned for 3.7 and is still in progress, so it's hard to say whether
> >> it's a blocker or not. But in practice, even if the feature makes it into
> >> 3.7 in time, the large amount of new code behind it is unlikely to be
> >> entirely bug-free. We need to maintain the confidence of those users, and
> >> forcing them to migrate through 3.7, where this new code is hardly
> >> battle-tested, doesn't appear to do that.
> >>
> >> Our goal for 4.0 should be that all the "main" features in KRaft are in a
> >> production-ready state. To reach that goal, I think having one more release
> >> makes sense. We can have different opinions about what the "main" features
> >> in KRaft are, but we should all agree that JBOD is one of them.
> >>
> >> Alternatively, as Josep proposed, we could run the 4.0 and 3.7.x (or 3.8)
> >> releases in parallel and maintain both for a defined period. But that is
> >> not a small effort, especially since much of the ZK code will be removed in
> >> v4.0, so the diff between the codebases will be large. In other words, the
> >> additional backporting cost of this alternative is likely to be higher than
> >> doing a 3.8, in my opinion.
> >>
> >> Quoting José again:
> >> > What are the disadvantages of adding the 3.8 release before 4.0? This
> >> > would push the 4.0 release by 3-4 months. From what we can tell, it would
> >> > also delay when KIP-896 can be implemented and extend how long the
> >> > community needs to maintain the code used by ZK mode. Is there anything
> >> > else?
> >>
> >> If we agree with the previous points, I think those disadvantages largely
> >> go away. The 3-4 month delay, the maintenance effort, KIP-896, and maybe
> >> the Scala 2.12 and Java 8 removals you could also raise, are not that
> >> critical compared with what I mentioned earlier: the worst case is that
> >> users lose their confidence in Apache Kafka.
> >>
> >>
> >> Quoting Colin:
> >> > I would not want to delay that because we want an additional feature.
> >> > And we will always want additional features. So I am concerned we will
> >> > end up in an infinite loop of people asking for "just one more feature"
> >> > before they migrate.
> >>
> >> I totally agree with you; we could keep delaying the 4.0 release forever,
> >> and I'd also like to draw a line. In my opinion, the 3.8 release is that
> >> line: no 3.9 or 3.10 releases after it. If that is the decision, does your
> >> concern about an infinite loop go away?
> >>
> >> Final note: speaking of the missing features, I'm always happy to work
> >> with you and all the other community contributors to make them happen, as
> >> we discussed earlier. Just let me know.
> >>
> >> Thank you.
> >> Luke
> >>
> >> On Wed, Nov 22, 2023 at 2:54 AM Colin McCabe <cmcc...@apache.org> wrote:
> >>
> >> > On Tue, Nov 21, 2023, at 03:47, Josep Prat wrote:
> >> > > Hi Colin,
> >> > >
> >> > > I think it's great that Confluent runs KRaft clusters in production,
> >> > > and it means that it is production ready for Confluent and its users.
> >> > > But luckily for Kafka, the community is bigger than this (self-managed
> >> > > in the cloud or on-prem, or customers of other SaaS companies).
> >> >
> >> > Hi Josep,
> >> >
> >> > Confluent is not the only company using or developing KRaft. Most of the
> >> > big organizations developing Kafka are involved. I mentioned Confluent's
> >> > deployments because I wanted to be clear that KRaft mode is not
> >> > experimental or new. Talking about software in production is a good way
> >> > to clear up these misconceptions.
> >> >
> >> > Indeed, KRaft mode is many years old. It started around 2020, and became
> >> > production-ready in AK 3.3 in 2022. ZK mode was deprecated in AK 3.5,
> >> > which was released June 2023. If we release AK 4.0 around April (or maybe
> >> > a month or two later), that will be almost a full year between
> >> > deprecation and removal of ZK mode. We've talked about this a lot, in
> >> > KIPs, in Apache blog posts, at conferences, and so forth.
> >> >
> >> > > We've heard from at least one SaaS company, Aiven (disclaimer, it is
> >> > > my employer), where the current feature set makes it not trivial to
> >> > > migrate. This same issue might happen not only at Aiven but with any
> >> > > user of Kafka who uses immutable infrastructure.
> >> >
> >> > Can you discuss why you feel it is "not trivial to migrate"? From the
> >> > discussion above, the main gap is that we should improve the
> >> > documentation for handling failed disks.
> >> >
> >> > > Another case is users that have hundreds (or more) of clusters and
> >> > > more than 100k nodes, and experience node failures multiple times
> >> > > during a single day. In this situation, not having KIP-853 makes these
> >> > > power users unable to join the game, as introducing a new error-prone
> >> > > manual (or needing to be automated) operation is usually a huge no-go.
> >> >
> >> > We have thousands of KRaft clusters in production and haven't seen these
> >> > problems, as I described above.
> >> >
> >> > best,
> >> > Colin
> >> >
> >> > >
> >> > > But I hear the concerns about delaying 4.0 for another 3 to 4 months.
> >> > > Would it help if we aimed at shortening the timeline for 3.8.0 and
> >> > > started with 4.0.0 a bit earlier?
> >> > > Maybe we could work on 3.8.0 almost in parallel with 4.0.0:
> >> > > - Start with 3.8.0 release process
> >> > > - After a short time (let's say a week), create the release branch
> >> > > - Start with 4.0.0 release process as usual
> >> > > - Cherry pick KRaft related issues to 3.8.0
> >> > > - Release 3.8.0
> >> > > I suspect 4.0.0 will need a bit more time than usual to ensure the
> >> > > code is cleaned up of deprecated classes and methods on top of the
> >> > > usual work we have. For this reason I think there would be enough time
> >> > > between releasing 3.8.0 and 4.0.0.
> >> > >
> >> > > What do you all think?
> >> > >
> >> > > Best,
> >> > > Josep Prat
> >> > >
> >> > > On 2023/11/20 20:03:18 Colin McCabe wrote:
> >> > >> Hi Josep,
> >> > >>
> >> > >> I think there is some confusion here. Quorum reconfiguration is not
> >> > needed for KRaft to become production ready. Confluent runs thousands
> of
> >> > KRaft clusters without quorum reconfiguration, and has for years.
> While
> >> > dynamic quorum reconfiguration is a nice feature, it doesn't block
> >> > anything: not migration, not deployment. As best as I understand it,
> the
> >> > use-case Aiven has isn't even reconfiguration per se, just wiping a
> >> disk.
> >> > There are ways to handle this -- I discussed some earlier in the
> >> thread. I
> >> > think it would be productive to continue that discussion -- especially
> >> the
> >> > part around documentation and testing of these cases.
> >> > >>
> >> > >> A lot of people have done a lot of work to get Kafka 4.0 ready. I
> >> would
> >> > not want to delay that because we want an additional feature. And we
> >> will
> >> > always want additional features. So I am concerned we will end up in
> an
> >> > infinite loop of people asking for "just one more feature" before they
> >> > migrate.
> >> > >>
> >> > >> best,
> >> > >> Colin
> >> > >>
> >> > >>
> >> > >> On Mon, Nov 20, 2023, at 04:15, Josep Prat wrote:
> >> > >> > Hi all,
> >> > >> >
> >> > >> > I wanted to share my opinion regarding this topic. I know some
> >> > >> > discussions happened some time ago (over a year) but I believe
> it's
> >> > >> > wise to reflect and re-evaluate if those decisions are still
> valid.
> >> > >> > KRaft, as of Kafka 3.6.x and 3.7.x, has not yet feature parity
> with
> >> > >> > Zookeeper. By dropping Zookeeper altogether before achieving such
> >> > >> > parity, we are opening the door to leaving a chunk of Apache
> Kafka
> >> > >> > users without an easy way to upgrade to 4.0.
> >> > >> > In pro of making upgrades as smooth as possible, I propose to
> have
> >> a
> >> > >> > Kafka version where KIP-853 is merged and Zookeeper still is
> >> > supported.
> >> > >> > This will enable community members who can't migrate yet to KRaft
> >> to
> >> > do
> >> > >> > so in a safe way (rolling back if something goes wrong).
> >> > Additionally,
> >> > >> > this will give us more confidence on having KRaft replacing
> >> > >> > successfully Zookeeper without any big problems by discovering
> and
> >> > >> > fixing bugs or by confirming that KRaft works as expected.
> >> > >> > For this I strongly believe we should have a 3.8.x version before
> >> > 4.0.x.
> >> > >> >
> >> > >> > What do other think in this regard?
> >> > >> >
> >> > >> > Best,
> >> > >> >
> >> > >> > On 2023/11/14 20:47:10 Colin McCabe wrote:
> >> > >> >> On Tue, Nov 14, 2023, at 04:37, Anton Agestam wrote:
> >> > >> >> > Hi Colin,
> >> > >> >> >
> >> > >> >> > Thank you for your thoughtful and comprehensive response.
> >> > >> >> >
> >> > >> >> >> KIP-853 is not a blocker for either 3.7 or 4.0. We discussed
> >> this
> >> > in
> >> > >> >> >> several KIPs that happened this year and last year. The most
> >> > notable was
> >> > >> >> >> probably KIP-866, which was approved in May 2022.
> >> > >> >> >
> >> > >> >> > I understand this is the case, I'm raising my concern because
> I
> >> was
> >> > >> >> > foreseeing some major pain points as a consequence of this
> >> > decision. Just
> >> > >> >> > to make it clear though: I am not asking for anyone to do work
> >> for
> >> > me, and
> >> > >> >> > I understand the limitations of resources available to
> implement
> >> > features.
> >> > >> >> > What I was asking is rather to consider the implications of
> >> > _removing_
> >> > >> >> > features before there exists a replacement for them.
> >> > >> >> >
> >> > >> >> > I understand that the timeframe for 3.7 isn't feasible, and
> >> > because of that
> >> > >> >> > I think what I was asking is rather: can we make sure that
> there
> >> > are more
> >> > >> >> > 3.x releases until controller quorum online resizing is
> >> > implemented?
> >> > >> >> >
> >> > >> >> > From your response, I gather that your stance is that it's
> >> > important to
> >> > >> >> > drop ZK support sooner rather than later and that the
> necessary
> >> > pieces for
> >> > >> >> > doing so are already in place.
> >> > >> >>
> >> > >> >> Hi Anton,
> >> > >> >>
> >> > >> >> Yes. I'm basically just repeating what we agreed upon in 2022 as
> >> > part of KIP-833.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > ---
> >> > >> >> >
> >> > >> >> > I want to make sure I've understood your suggested sequence
> for
> >> > controller
> >> > >> >> > node replacement. I hope the mentions of Kubernetes are rather
> >> for
> >> > examples
> >> > >> >> > of how to carry things out, rather than saying "this is only
> >> > supported on
> >> > >> >> > Kubernetes"?
> >> > >> >>
> >> > >> >> Apache Kafka is supported in lots of environments, including
> >> non-k8s
> >> > ones. I was just pointing out that using k8s means that you control
> your
> >> > own DNS resolution, which simplifies matters. If you don't control DNS
> >> > there are some extra steps for changing the quorum voters.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > Given we have three existing nodes as such:
> >> > >> >> >
> >> > >> >> > - a.local -> 192.168.0.100
> >> > >> >> > - b.local -> 192.168.0.101
> >> > >> >> > - c.local -> 192.168.0.102
> >> > >> >> >
> >> > >> >> > As well as a candidate node 192.168.0.103 that we want to
> >> replace
> >> > for the
> >> > >> >> > role of c.local.
> >> > >> >> >
> >> > >> >> > 1. Shut down controller process on node .102 (to make sure we
> >> > don't "go
> >> > >> >> > back in time").
> >> > >> >> > 2. rsync state from leader to .103.
> >> > >> >> > 3. Start controller process on .103.
> >> > >> >> > 4. Point the c.local entry at .103.
> >> > >> >> >
> >> > >> >> > I have a few questions about this sequence:
> >> > >> >> >
> >> > >> >> > 1. Would this sequence be safe against leadership changes?
> >> > >> >> >
> >> > >> >>
> >> > >> >> If the leader changes, the new leader should have all of the
> >> > committed entries that the old leader had.
> >> > >> >>
> >> > >> >> > 2. Does it work
> >> > >> >>
> >> > >> >> Probably the biggest issue is dealing with "torn writes" that
> >> happen
> >> > because you're copying the current log segment while it's being
> written
> >> to.
> >> > The system should be robust against this. However, we don't regularly
> do
> >> > this, so there hasn't been a lot of testing.
> >> > >> >>
> >> > >> >> I think Jose had a PR for improving the handling of this which
> we
> >> > might want to dig up. We'd want the system to auto-truncate the
> partial
> >> > record at the end of the log, if there is one.
> >> > >> >>
> >> > >> >> > 3. By "state", do we mean `metadata.log.dir`? Something else?
> >> > >> >>
> >> > >> >> Yes, the state of the metadata.log.dir. Keep in mind you will
> need
> >> > to change the node ID in meta.properties after copying, of course.
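> >> > >> >>
> >> > >> >> To make that concrete, a rough sketch of the copy step for the scenario
> >> > >> >> above (assuming a.local is the current leader and metadata.log.dir is
> >> > >> >> /var/kafka/metadata; paths are illustrative, not an official procedure):
> >> > >> >>
> >> > >> >>   # on the new host (.103), after the controller on .102 is stopped:
> >> > >> >>   rsync -a a.local:/var/kafka/metadata/ /var/kafka/metadata/
> >> > >> >>   # the copy brings the leader's meta.properties along, so set node.id
> >> > >> >>   # back to the voter id being replaced (102 in the example above)
> >> > >> >>   sed -i 's/^node.id=.*/node.id=102/' /var/kafka/metadata/meta.properties
> >> > >> >>   # start the controller on .103, then point c.local at 192.168.0.103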
> >> > >> >>
> >> > >> >> > 4. What are the effects on cluster availability? (I think this
> >> is
> >> > the same
> >> > >> >> > as asking what happens if a or b crashes during the process,
> or
> >> if
> >> > network
> >> > >> >> > partitions occur).
> >> > >> >>
> >> > >> >> Cluster metadata state tends to be pretty small. typically a
> >> hundred
> >> > megabytes or so. Therefore, I do not think it will take more than a
> >> second
> >> > or two to copy from one node to another. However, if you do
> experience a
> >> > crash when one node out of three is down, then you will be unavailable
> >> > until you can bring up a second node to regain a majority.
> >> > >> >>
> >> > >> >> >
> >> > >> >> > ---
> >> > >> >> >
> >> > >> >> > If this is considered the official way of handling controller
> >> node
> >> > >> >> > replacements, does it make sense to improve documentation in
> >> this
> >> > area? Is
> >> > >> >> > there already a plan for this documentation layed out in some
> >> > KIPs? This is
> >> > >> >> > something I'd be happy to contribute to.
> >> > >> >> >
> >> > >> >>
> >> > >> >> Yes, I think we should have official documentation about this.
> >> We'd
> >> > be happy to review anything in that area.
> >> > >> >>
> >> > >> >> >> To circle back to KIP-853, I think it stands a good chance of
> >> > making it
> >> > >> >> >> into AK 4.0.
> >> > >> >> >
> >> > >> >> > This sounds good, but the point I was making was if we could
> >> have
> >> > a release
> >> > >> >> > with both KRaft and ZK supporting this feature to ease the
> >> > migration out of
> >> > >> >> > ZK.
> >> > >> >> >
> >> > >> >>
> >> > >> >> The problem is, supporting multiple controller implementations
> is
> >> a
> >> > huge burden. So we don't want to extend the 3.x release past the point
> >> > that's needed to complete all the must-dos (SCRAM, delegation tokens,
> >> JBOD)
> >> > >> >>
> >> > >> >> best,
> >> > >> >> Colin
> >> > >> >>
> >> > >> >>
> >> > >> >> > BR,
> >> > >> >> > Anton
> >> > >> >> >
> >> > >> >> > On Thu, Nov 9, 2023 at 11:04 PM Colin McCabe <cmcc...@apache.org> wrote:
> >> > >> >> >
> >> > >> >> >> Hi Anton,
> >> > >> >> >>
> >> > >> >> >> It rarely makes sense to scale up and down the number of
> >> > controller nodes
> >> > >> >> >> in the cluster. Only one controller node will be active at
> any
> >> > given time.
> >> > >> >> >> The main reason to use 5 nodes would be to be able to
> tolerate
> >> 2
> >> > failures
> >> > >> >> >> instead of 1.
> >> > >> >> >>
> >> > >> >> >> At Confluent, we generally run KRaft with 3 controllers. We
> >> have
> >> > not seen
> >> > >> >> >> problems with this setup, even with thousands of clusters. We
> >> have
> >> > >> >> >> discussed using 5 node controller clusters on certain very
> big
> >> > clusters,
> >> > >> >> >> but we haven't done that yet. This is all very similar to ZK,
> >> > where most
> >> > >> >> >> deployments were 3 nodes as well.
> >> > >> >> >>
> >> > >> >> >> KIP-853 is not a blocker for either 3.7 or 4.0. We discussed
> >> this
> >> > in
> >> > >> >> >> several KIPs that happened this year and last year. The most
> >> > notable was
> >> > >> >> >> probably KIP-866, which was approved in May 2022.
> >> > >> >> >>
> >> > >> >> >> Many users these days run in a Kubernetes environment where
> >> > Kubernetes
> >> > >> >> >> actually controls the DNS. This makes changing the set of
> >> voters
> >> > less
> >> > >> >> >> important than it was historically.
> >> > >> >> >>
> >> > >> >> >> For example, in a world with static DNS, you might have to
> >> change
> >> > the
> >> > >> >> >> controller.quorum.voters setting from:
> >> > >> >> >>
> >> > >> >> >> 100@a.local:9073,101@b.local:9073,102@c.local:9073
> >> > >> >> >>
> >> > >> >> >> to:
> >> > >> >> >>
> >> > >> >> >> 100@a.local:9073,101@b.local:9073,102@d.local:9073
> >> > >> >> >>
> >> > >> >> >> In a world with k8s controlling the DNS, you simply remap
> >> c.local
> >> > to point
> >> > >> >> >> to the IP address of your new pod for controller 102, and
> >> you're
> >> > done. No
> >> > >> >> >> need to update controller.quorum.voters.
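> >> > >> >> >>
> >> > >> >> >> To make that concrete, each controller can keep a completely static
> >> > >> >> >> config along these lines (a sketch only -- the listener name and port
> >> > >> >> >> are assumptions for illustration):
> >> > >> >> >>
> >> > >> >> >>   process.roles=controller
> >> > >> >> >>   node.id=102
> >> > >> >> >>   listeners=CONTROLLER://0.0.0.0:9073
> >> > >> >> >>   controller.listener.names=CONTROLLER
> >> > >> >> >>   controller.quorum.voters=100@a.local:9073,101@b.local:9073,102@c.local:9073
> >> > >> >> >>
> >> > >> >> >> Replacing the node behind c.local is then purely a DNS (or k8s Service)
> >> > >> >> >> change; nothing in server.properties has to be updated.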
> >> > >> >> >>
> >> > >> >> >> Another question is whether you re-create the pod data from
> >> > scratch every
> >> > >> >> >> time you add a new node. If you store the controller data on
> an
> >> > EBS volume
> >> > >> >> >> (or cloud-specific equivalent), you really only have to
> detach
> >> it
> >> > from the
> >> > >> >> >> previous pod and re-attach it to the new pod. k8s also
> handles
> >> > this
> >> > >> >> >> automatically, of course.
> >> > >> >> >>
> >> > >> >> >> If you want to reconstruct the full controller pod state each
> >> > time you
> >> > >> >> >> create a new pod (for example, so that you can use only
> >> instance
> >> > storage),
> >> > >> >> >> you should be able to rsync that state from the leader. In
> >> > general, the
> >> > >> >> >> invariant that we want to maintain is that the state should
> not
> >> > "go back in
> >> > >> >> >> time" -- if controller 102 promised to hold all log data up
> to
> >> > offset X, it
> >> > >> >> >> should come back with committed data at at least that offset.
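> >> > >> >> >>
> >> > >> >> >> One way to sanity-check that invariant (a sketch -- flags may vary a
> >> > >> >> >> bit between releases, so double-check against your version) is:
> >> > >> >> >>
> >> > >> >> >>   bin/kafka-metadata-quorum.sh --bootstrap-server broker1:9092 describe --replication
> >> > >> >> >>
> >> > >> >> >> and compare the restored controller's LogEndOffset with the offsets it
> >> > >> >> >> had acknowledged before it was replaced.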
> >> > >> >> >>
> >> > >> >> >> There are lots of new features we'd like to implement for
> >> KRaft,
> >> > and Kafka
> >> > >> >> >> in general. If you have some you really would like to see, I
> >> > think everyone
> >> > >> >> >> in the community would be happy to work with you. The flip
> >> side,
> >> > of course,
> >> > >> >> >> is that since there are an unlimited number of features we
> >> could
> >> > do, we
> >> > >> >> >> can't really block the release for any one feature.
> >> > >> >> >>
> >> > >> >> >> To circle back to KIP-853, I think it stands a good chance of
> >> > making it
> >> > >> >> >> into AK 4.0. Jose, Alyssa, and some other people have worked
> on
> >> > it. It
> >> > >> >> >> definitely won't make it into 3.7, since we have only a few
> >> weeks
> >> > left
> >> > >> >> >> before that release happens.
> >> > >> >> >>
> >> > >> >> >> best,
> >> > >> >> >> Colin
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> On Thu, Nov 9, 2023, at 00:20, Anton Agestam wrote:
> >> > >> >> >> > Hi Luke,
> >> > >> >> >> >
> >> > >> >> >> > We have been looking into what switching from ZK to KRaft
> >> will
> >> > mean for
> >> > >> >> >> > Aiven.
> >> > >> >> >> >
> >> > >> >> >> > We heavily depend on an “immutable infrastructure” model
> for
> >> > deployments.
> >> > >> >> >> > This means that, when we perform upgrades, we introduce new
> >> > nodes to our
> >> > >> >> >> > clusters, scale the cluster up to incorporate the new
> nodes,
> >> > and then
> >> > >> >> >> phase
> >> > >> >> >> > the old ones out once all partitions are moved to the new
> >> > generation.
> >> > >> >> >> This
> >> > >> >> >> > allows us, and anyone else using a similar model, to do
> >> > upgrades as well
> >> > >> >> >> as
> >> > >> >> >> > cluster resizing with zero downtime.
> >> > >> >> >> >
> >> > >> >> >> > Reading up on KRaft and the ZK-to-KRaft migration path,
> this
> >> is
> >> > somewhat
> >> > >> >> >> > worrying for us. It seems like, if KIP-853 is not included
> >> > prior to
> >> > >> >> >> > dropping support for ZK, we will essentially have no
> >> satisfying
> >> > upgrade
> >> > >> >> >> > path. Even if KIP-853 is included in 4.0, I’m unsure if
> that
> >> > would allow
> >> > >> >> >> a
> >> > >> >> >> > migration path for us, since a new cluster generation would
> >> not
> >> > be able
> >> > >> >> >> to
> >> > >> >> >> > use ZK during the migration step.
> >> > >> >> >> > On the other hand, if KIP-853 was released in a version
> prior
> >> > to dropping
> >> > >> >> >> > ZK support, because it allows online resizing of KRaft
> >> > clusters, this
> >> > >> >> >> would
> >> > >> >> >> > allow us and others that use an immutable infrastructure
> >> > deployment
> >> > >> >> >> model,
> >> > >> >> >> > to provide a zero downtime migration path.
> >> > >> >> >> >
> >> > >> >> >> > For that reason, we’d like to raise awareness around this
> >> issue
> >> > and
> >> > >> >> >> > encourage considering the implementation of KIP-853 or
> >> > equivalent a
> >> > >> >> >> blocker
> >> > >> >> >> > not only for 4.0, but for the last version prior to 4.0.
> >> > >> >> >> >
> >> > >> >> >> > BR,
> >> > >> >> >> > Anton
> >> > >> >> >> >
> >> > >> >> >> > On 2023/10/11 12:17:23 Luke Chen wrote:
> >> > >> >> >> >> Hi all,
> >> > >> >> >> >>
> >> > >> >> >> >> While Kafka 3.6.0 is released, I’d like to start the
> >> > discussion for the
> >> > >> >> >> >> “road to Kafka 4.0”. Based on the plan in KIP-833
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-Kafka3.7
> >> > >> >> >> >>,
> >> > >> >> >> >> the next release 3.7 will be the final release before
> moving
> >> > to Kafka
> >> > >> >> >> 4.0
> >> > >> >> >> >> to remove the Zookeeper from Kafka. Before making this
> major
> >> > change, I'd
> >> > >> >> >> >> like to get consensus on the "must-have features/fixes for
> >> > Kafka 4.0",
> >> > >> >> >> to
> >> > >> >> >> >> avoid some users being surprised when upgrading to Kafka
> >> 4.0.
> >> > The intent
> >> > >> >> >> > is
> >> > >> >> >> >> to have a clear communication about what to expect in the
> >> > following
> >> > >> >> >> > months.
> >> > >> >> >> >> In particular we should be signaling what features and
> >> > configurations
> >> > >> >> >> are
> >> > >> >> >> >> not supported, or at risk (if no one is able to add
> support
> >> or
> >> > fix known
> >> > >> >> >> >> bugs).
> >> > >> >> >> >>
> >> > >> >> >> >> Here is the JIRA tickets list
> >> > >> >> >> >> <
> >> > https://issues.apache.org/jira/issues/?jql=labels%20%3D%204.0-blocker
> >
> >> > >> >> >> I
> >> > >> >> >> >> labeled for "4.0-blocker". The criteria I labeled as
> >> > “4.0-blocker” are:
> >> > >> >> >> >> 1. The feature is supported in Zookeeper Mode, but not
> >> > supported in
> >> > >> >> >> KRaft
> >> > >> >> >> >> mode, yet (ex: KIP-858: JBOD in KRaft)
> >> > >> >> >> >> 2. Critical bugs in KRaft, (ex: KAFKA-15489 : split brain
> in
> >> > KRaft
> >> > >> >> >> >> controller quorum)
> >> > >> >> >> >>
> >> > >> >> >> >> If you disagree with my current list, welcome to have
> >> > discussion in the
> >> > >> >> >> >> specific JIRA ticket. Or, if you think there are some
> >> tickets
> >> > I missed,
> >> > >> >> >> >> welcome to start a discussion in the JIRA ticket and ping
> me
> >> > or other
> >> > >> >> >> >> people. After we get the consensus, we can label/unlabel
> it
> >> > afterwards.
> >> > >> >> >> >> Again, the goal is to have an open communication with the
> >> > community
> >> > >> >> >> about
> >> > >> >> >> >> what will be coming in 4.0.
> >> > >> >> >> >>
> >> > >> >> >> >> Below is the high level category of the list content:
> >> > >> >> >> >>
> >> > >> >> >> >> 1. Recovery from disk failure
> >> > >> >> >> >> KIP-856
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-856:+KRaft+Disk+Failure+Recovery
> >> > >> >> >> >>:
> >> > >> >> >> >> KRaft Disk Failure Recovery
> >> > >> >> >> >>
> >> > >> >> >> >> 2. Prevote to support controllers more than 3
> >> > >> >> >> >> KIP-650
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-650%3A+Enhance+Kafkaesque+Raft+semantics
> >> > >> >> >> >>:
> >> > >> >> >> >> Enhance Kafkaesque Raft semantics
> >> > >> >> >> >>
> >> > >> >> >> >> 3. JBOD support
> >> > >> >> >> >> KIP-858
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft
> >> > >> >> >> >>:
> >> > >> >> >> >> Handle
> >> > >> >> >> >> JBOD broker disk failure in KRaft
> >> > >> >> >> >>
> >> > >> >> >> >> 4. Scale up/down Controllers
> >> > >> >> >> >> KIP-853
> >> > >> >> >> >> <
> >> > >> >> >> >
> >> > >> >> >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes
> >> > >> >> >> >>:
> >> > >> >> >> >> KRaft Controller Membership Changes
> >> > >> >> >> >>
> >> > >> >> >> >> 5. Modifying dynamic configurations on the KRaft
> controller
> >> > >> >> >> >>
> >> > >> >> >> >> 6. Critical bugs in KRaft
> >> > >> >> >> >>
> >> > >> >> >> >> Does this make sense?
> >> > >> >> >> >> Any feedback is welcomed.
> >> > >> >> >> >>
> >> > >> >> >> >> Thank you.
> >> > >> >> >> >> Luke
> >> > >> >> >> >>
> >> > >> >> >>
> >> > >> >>
> >> > >>
> >> >
> >>
> >
>
