Hi, David,

Thanks for the reply.

20/21. Sounds good.

Could you update the doc with all the changes being discussed?

Thanks,

Jun

On Fri, Oct 28, 2022 at 10:11 AM David Arthur
<david.art...@confluent.io.invalid> wrote:

> Jun,
>
> 20/21. I was also wondering about a "migration" record. In addition to the
> scenario you mentioned, we also need a way to prevent the cluster from
> re-entering the dual-write mode after the migration has been finalized. I
> could see this happening inadvertently via a change in some configuration
> management system. How about we add a record that marks the beginning and
> end of the dual-write mode? The first occurrence of the record could be
> included in the metadata transaction when we migrate data from ZK.
>
> With this, the active controller would decide whether to enter dual-write
> mode, finalize the migration, or fail, based on:
>
> * Metadata log state
> * Its own configuration ("kafka.metadata.migration.enable",
> "zookeeper.connect", etc.)
> * The other controllers' configurations (via ApiVersionsResponse)
>
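> Roughly, the decision might look like this (a hedged sketch; the method
> and flag names are hypothetical, not from the KIP):
>
>     // Hypothetical sketch of the active controller's startup decision.
>     enum MigrationAction { ENTER_DUAL_WRITE, FINALIZE, FAIL }
>
>     MigrationAction decide(boolean logShowsMigrationFinalized,
>                            boolean migrationConfigEnabled,
>                            boolean wholeQuorumHasMigrationConfig) {
>         if (logShowsMigrationFinalized && migrationConfigEnabled) {
>             return MigrationAction.FAIL; // never re-enter dual-write mode
>         }
>         if (migrationConfigEnabled && wholeQuorumHasMigrationConfig) {
>             return MigrationAction.ENTER_DUAL_WRITE;
>         }
>         if (!migrationConfigEnabled && !wholeQuorumHasMigrationConfig) {
>             return MigrationAction.FINALIZE;
>         }
>         return MigrationAction.FAIL; // mixed configs across the quorum
>     }
>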
> WDYT?
>
> 22. Since we will need the fencing anyway as a safeguard, I agree we could
> skip the registration of KRaft brokers in ZK to simplify things a bit.
>
> Thanks,
> David
>
>
>
> On Thu, Oct 27, 2022 at 5:11 PM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, David,
> >
> > Thanks for the reply.
> >
> > 20/21. Relying upon the presence of ZK configs to determine whether the
> > KRaft controller is in dual-write mode seems a bit error-prone. If
> > someone accidentally adds a ZK configuration to a brand new KRaft
> > cluster, ideally it shouldn't cause the controller to get into a weird
> > state. Have we considered storing the migration state in a metadata
> > record?
> >
> > 22. If we have the broker fencing logic, do we need to write the broker
> > registration path in ZK for KRaft brokers at all?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> > <david.art...@confluent.io.invalid> wrote:
> >
> > > Jun,
> > >
> > > 20/21. A KRaft controller will recover the migration state by reading
> > > the "/migration" ZNode. If the migration enable config is set, and the
> > > ZK migration is complete, it will enter the dual-write mode. Before an
> > > operator can decommission ZK, they will need to finalize the migration,
> > > which involves removing the migration config and the ZK config. I'll
> > > clarify this in the KIP.
> > >
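> > > As a rough sketch (the parsing and state-transition helpers here are
> > > hypothetical), that recovery read could look like:
> > >
> > >     import org.apache.zookeeper.KeeperException;
> > >     import org.apache.zookeeper.data.Stat;
> > >
> > >     try {
> > >         byte[] data = zk.getData("/migration", false, new Stat());
> > >         if (migrationEnableConfigSet && isMigrationComplete(data)) {
> > >             enterDualWriteMode(); // hypothetical helper
> > >         }
> > >     } catch (KeeperException.NoNodeException e) {
> > >         // No "/migration" znode yet: the migration hasn't started.
> > >     }
> > >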
> > > 22. Yeah, we could see an incorrect broker ID during that window. If we
> > > ended up with a state where we saw a ZK broker ID that conflicted with a
> > > KRaft broker ID, we would need to fence one of them. I would probably
> > > opt to fence the KRaft broker in that case, since broker registration
> > > and fencing are more robust in KRaft. Hopefully this is a rare case.
> > >
> > > 26. Sounds good.
> > >
> > > Thanks!
> > > David
> > >
> > >
> > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao <j...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi, David,
> > > >
> > > > Thanks for the reply. A few more comments.
> > > >
> > > > 20/21. Using a tagged field in ApiVersionsRequest could work. Related
> > > > to this, how does a KRaft controller know that it's in the dual-write
> > > > mode? Does it need to read the /controller path from ZK? After the
> > > > migration, people may have the ZK cluster decommissioned, but still
> > > > have the ZK configs left in the KRaft controller. Will this cause the
> > > > KRaft controller to be stuck because it doesn't know which mode it is
> > > > in?
> > > >
> > > > 22. Using the ephemeral node matches the current ZK-based broker
> > > > behavior better. However, it leaves a window for incorrect broker
> > > > registration to sneak in during KRaft controller failover.
> > > >
> > > > 26. Then, we could just remove Broker Registration in that section.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > > > <david.art...@confluent.io.invalid> wrote:
> > > >
> > > > > Jun
> > > > >
> > > > > 20/21. It could definitely cause problems if we fail over to a
> > > > > controller without "kafka.metadata.migration.enable". The only
> > > > > mechanism I know of for controllers to learn things about one
> > > > > another is ApiVersions. We currently use this for checking support
> > > > > for "metadata.version" (in KRaft mode). We could add a
> > > > > "zk.migration" feature flag that's enabled on a controller only if
> > > > > the config is set. Another possibility would be a tagged field on
> > > > > ApiVersionsResponse that indicates if the config is set. Both seem
> > > > > somewhat inelegant. I think a tagged field would be a bit simpler
> > > > > (and arguably less hacky).
> > > > >
> > > > > For 20, we could avoid entering the migration state unless the
> > > > > whole quorum had the field present in their NodeApiVersions. For 21,
> > > > > we could avoid leaving the migration state unless the whole quorum
> > > > > did not have the field in their NodeApiVersions. Do you think this
> > > > > would be sufficient?
> > > > >
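> > > > > In other words (a hedged sketch; the accessor for the tagged field
> > > > > is hypothetical):
> > > > >
> > > > >     import java.util.Collection;
> > > > >     import org.apache.kafka.clients.NodeApiVersions;
> > > > >
> > > > >     // Enter the migration state only if every quorum member reports
> > > > >     // the field; leave it only if none of them do.
> > > > >     boolean allHaveField(Collection<NodeApiVersions> quorum) {
> > > > >         return quorum.stream().allMatch(n -> hasZkMigrationField(n));
> > > > >     }
> > > > >     boolean noneHaveField(Collection<NodeApiVersions> quorum) {
> > > > >         return quorum.stream().noneMatch(n -> hasZkMigrationField(n));
> > > > >     }
> > > > >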
> > > > > 22. Right, we need to write the broker info back to ZK just as a
> > > > > safeguard against incorrect broker IDs getting registered into ZK. I
> > > > > was thinking these would be persistent nodes, but it's probably fine
> > > > > to make them ephemeral and have the active KRaft controller keep
> > > > > them up to date.
> > > > >
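> > > > > A minimal sketch of what the active KRaft controller would write
> > > > > (assuming it mirrors each migrated broker's registration):
> > > > >
> > > > >     import org.apache.zookeeper.CreateMode;
> > > > >     import org.apache.zookeeper.ZooDefs;
> > > > >
> > > > >     // Ephemeral, so the registration disappears if the controller's
> > > > >     // ZK session dies rather than going permanently stale.
> > > > >     zk.create("/brokers/ids/" + brokerId, registrationJsonBytes,
> > > > >               ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
> > > > >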
> > > > > 23. Right. When the broker comes up as a KRaft broker, its old
> > > > > /brokers/ids ZNode will be gone and it will register itself with the
> > > > > KRaft controller. The controller will know that it is now in KRaft
> > > > > mode and will stop sending it the old RPCs.
> > > > >
> > > > > 24. Ok, I'll add these.
> > > > >
> > > > > 25. I realize now the "/controller_epoch" node is already
> > > > > persistent. It should be sufficient to remove the "/controller" node
> > > > > to trigger an election.
> > > > >
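> > > > > A one-line sketch of that trigger:
> > > > >
> > > > >     // Deleting the ephemeral "/controller" znode (-1 = any version)
> > > > >     // causes the ZK brokers to run a new controller election.
> > > > >     zk.delete("/controller", -1);
> > > > >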
> > > > > 26. Hmm, not sure, but I don't think it uses watches. KafkaServer
> > > > > registers the broker info and later loads all brokers in the cluster
> > > > > to check that the endpoints don't conflict. Things that do use
> > > > > watches are dynamic configs, ACLs, and some others (listed in the
> > > > > KIP).
> > > > >
> > > > >
> > > > > On Wed, Oct 26, 2022 at 4:26 PM David Arthur <david.art...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Luke and Andrew, thanks for taking a look!
> > > > > >
> > > > > > I think the state names in the KIP's state machine aren't the
> > > > > > best. I'll try to improve that section in general.
> > > > > >
> > > > > > 1. "MigrationReady" is probably better named "MigratingFromZk" or
> > > > > > something. It's meant to be the state when the KRaft controller
> is
> > > > > actively
> > > > > > migrating the data out of ZK. This happens after we have detected
> > > that
> > > > > the
> > > > > > cluster is eligible and before we enter the dual write mode.
> > > > > >
> > > > > > 2. "MigrationActive" should probably be called "BrokerMigration".
> > > > > > Transitioning to "MigrationFinished" can happen automatically
> when
> > > all
> > > > > the
> > > > > > known ZK brokers have been migrated. Since we don't have
> permanent
> > > > > > registrations of ZK brokers, we can use the partition assignments
> > as
> > > a
> > > > > > proxy for this.
> > > > > >
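> > > > > > A hedged sketch of that check (method placement is hypothetical):
> > > > > >
> > > > > >     import java.util.Set;
> > > > > >
> > > > > >     // All broker IDs referenced by partition assignments must have
> > > > > >     // re-registered as KRaft brokers before we consider the broker
> > > > > >     // migration finished.
> > > > > >     boolean allKnownBrokersMigrated(Set<Integer> assignedBrokerIds,
> > > > > >                                     Set<Integer> kraftBrokerIds) {
> > > > > >         return kraftBrokerIds.containsAll(assignedBrokerIds);
> > > > > >     }
> > > > > >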
> > > > > > 3. A metric for unready brokers makes sense. We can also log a
> > > > > > message on the controller when it tries to start the migration.
> > > > > >
> > > > > > 4. The MetadataType metric is meant to help see this. E.g., some
> > > > > > brokers would have MetadataType=ZK, some would have
> > > > > > MetadataType=Dual.
> > > > > >
> > > > > > 5. On ZK brokers, not having this config set would prevent the new
> > > > > > broker registration data from being written to ZK. This means the
> > > > > > KRaft controller won't send RPCs to it. If a broker has been
> > > > > > migrated to KRaft already, I'm not sure there is any harm in
> > > > > > removing this config. If we decide that we need to guarantee that
> > > > > > KRaft brokers have this config set during the migration, we can
> > > > > > include it in the broker registration that's sent to KRaft. This
> > > > > > would let the controller keep that broker fenced.
> > > > > >
> > > > > > 6. Once the KRaft controller takes leadership, the ZK controller
> > > > > > won't be active anymore and will stop reporting the MigrationState
> > > > > > metric.
> > > > > >
> > > > > > 7. The "Dual" MetadataType is reported by brokers running in
> KRaft
> > > mode
> > > > > > when the migration enable config is set. I think the controller
> > > should
> > > > > also
> > > > > > report this, not just the brokers. I'll clarify in the KIP.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Andrew:
> > > > > >
> > > > > > > How will the code get all the ZooKeeper config? Will it do some
> > > > > > > sort of scan of the ZooKeeper data store? ... Do we need to do
> > > > > > > anything special to make sure we get all data from ZooKeeper?
> > > > > >
> > > > > > With a few exceptions, all data written to ZK by Kafka happens on
> > > > > > the controller (single-writer principle). In our migration code,
> > > > > > we can unconditionally update the "/controller" and
> > > > > > "/controller_epoch" ZNodes and effectively force the migration
> > > > > > component to gain leadership of the ZK controller. Once this
> > > > > > happens, we don't expect any data to be written to ZK, so we can
> > > > > > read it iteratively without worrying about any concurrent writes.
> > > > > >
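> > > > > > A rough sketch of that takeover (assumes a current controller
> > > > > > exists; real code would also handle the no-controller case):
> > > > > >
> > > > > >     import java.util.Arrays;
> > > > > >     import org.apache.zookeeper.*;
> > > > > >
> > > > > >     // All three ops commit atomically, or none do.
> > > > > >     zk.multi(Arrays.asList(
> > > > > >         Op.delete("/controller", -1),       // evict the ZK controller
> > > > > >         Op.create("/controller", ourControllerInfoBytes,
> > > > > >                   ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL),
> > > > > >         Op.setData("/controller_epoch", bumpedEpochBytes,
> > > > > >                    expectedEpochZkVersion)));
> > > > > >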
> > > > > > As for the linearizable thing, I believe this just means that
> > > > > > reads may be served by a quorum follower which has stale data.
> > > > > > We'll be reading from ZK the same way the ZK controller does, so I
> > > > > > think we will be fine regarding consistency. If we wanted to be
> > > > > > extra careful, we could add a delay prior to iterating through the
> > > > > > znodes to give partitioned ZK followers a chance to get kicked out
> > > > > > of the quorum. I don't think we'll need that, though.
> > > > > >
> > > > > > > What happens if we commit the transaction then fail right after
> > > > > > > that and restart?
> > > > > >
> > > > > > If we commit the "migration" transaction to the metadata log, it
> > > > > > won't be a problem if we fail over or restart. We can recover our
> > > > > > state based on the metadata log and what exists in ZK. If we see a
> > > > > > migration transaction in the KRaft log, but no "/migration" ZNode,
> > > > > > we'll know we failed before writing to ZK. If we see a partial
> > > > > > transaction in the log, then we can abort it and restart the
> > > > > > migration.
> > > > > >
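> > > > > > Put as a sketch (the log-inspection helpers are hypothetical):
> > > > > >
> > > > > >     if (metadataLog.hasCommittedMigrationTransaction()) {
> > > > > >         if (zk.exists("/migration", false) == null) {
> > > > > >             // Committed, but we died before the ZK write: redo it.
> > > > > >             writeMigrationZNode();
> > > > > >         }
> > > > > >     } else if (metadataLog.hasPartialMigrationTransaction()) {
> > > > > >         metadataLog.abortTransaction(); // discard the partial write
> > > > > >         restartMigration();
> > > > > >     }
> > > > > >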
> > > > > > > This sort of leads to me wondering if we will/should check the
> > > > > > > metadata log state before doing the migration? That could also
> > > > > > > catch mistakes such as if a KRaft quorum is used that already had
> > > > > > > some metadata, which I assume we want to prevent.
> > > > > >
> > > > > > Yes, we should check that the metadata log is empty before
> > > > > > attempting a migration (perhaps excepting the bootstrap metadata --
> > > > > > need to think on that one still).
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 24, 2022 at 8:57 AM Andrew Grant
> > > > > > <agr...@confluent.io.invalid> wrote:
> > > > > >
> > > > > >> Hey David,
> > > > > >>
> > > > > >>
> > > > > >> Thanks for the KIP. I had a few small questions.
> > > > > >>
> > > > > >>
> > > > > >> “The ZK data migration will copy the existing ZK data into the
> > > > > >> KRaft metadata log and establish the new KRaft active controller
> > > > > >> as the active controller from a ZK perspective.”
> > > > > >>
> > > > > >> How will the code get all the ZooKeeper config? Will it do some
> > > > > >> sort of scan of the ZooKeeper data store? Also I’m not a ZooKeeper
> > > > > >> expert, but per
> > > > > >> https://zookeeper.apache.org/doc/current/zookeeperInternals.html
> > > > > >> “Read operations in ZooKeeper are *not linearizable* since they
> > > > > >> can return potentially stale data.” Do we need to do anything
> > > > > >> special to make sure we get all data from ZooKeeper?
> > > > > >>
> > > > > >>
> > > > > >> “For the initial migration, the controller will utilize KIP-868
> > > > > >> Metadata Transactions
> > > > > >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions>
> > > > > >> to write all of the ZK metadata in a single transaction. If the
> > > > > >> controller fails before this transaction is finalized, the next
> > > > > >> active controller will abort the transaction and restart the
> > > > > >> migration process.” What happens if we commit the transaction then
> > > > > >> fail right after that and restart? This sort of leads to me
> > > > > >> wondering if we will/should check the metadata log state before
> > > > > >> doing the migration? That could also catch mistakes such as if a
> > > > > >> KRaft quorum is used that already had some metadata, which I
> > > > > >> assume we want to prevent.
> > > > > >>
> > > > > >>
> > > > > >> For the Test Plan section, do you think it’s worth calling out
> > > > > >> performance testing migrations of ZooKeeper deployments that have
> > > > > >> “large” amounts of metadata?
> > > > > >>
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Andrew
> > > > > >>
> > > > > >> On Mon, Oct 24, 2022 at 3:20 AM Luke Chen <show...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi David,
> > > > > >> >
> > > > > >> > Thanks for the good and complicated proposal! :)
> > > > > >> > Some questions:
> > > > > >> >
> > > > > >> > 1. "MigrationReady" state: The KIP said:
> > > > > >> > The KRaft quorum has been started
> > > > > >> > I don't think we'll automatically enter this state after KRaft
> > > > quorum
> > > > > >> > started, do we?
> > > > > >> > I think KRaft active controller should take over leadership by
> > > > > writing a
> > > > > >> > znode in /controller path before we entering this sate, is
> that
> > > > > correct?
> > > > > >> >
> > > > > >> > 2. "MigrationActive" state: the KIP said:
> > > > > >> > ZK state has been migrated, controller is in dual-write mode,
> > > > brokers
> > > > > >> are
> > > > > >> > being restarted in KRaft mode
> > > > > >> > What confuses me is the last part: "brokers are being
> restarted
> > in
> > > > > KRaft
> > > > > >> > mode".
> > > > > >> > How could we detect brokers are being restarted in KRaft mode?
> > Old
> > > > ZK
> > > > > >> > broker is removed and new KRaft broker is up within N minutes?
> > > > > >> > I think we don't have to rely on the condition "brokers are
> > being
> > > > > >> restarted
> > > > > >> > in KRaft mode" to enter this state.
> > > > > >> > "brokers are being restarted in KRaft mode" should be a
> process
> > > > > happened
> > > > > >> > between "MigrationActive" and "MigrationFinished". Does that
> > make
> > > > > sense?
> > > > > >> >
> > > > > >> > 3. When "Zookeeper" mode trying to enter "MigrationEligible",
> if
> > > > there
> > > > > >> is a
> > > > > >> > cluster with tens of brokers, how could users know why this
> > > cluster
> > > > > >> cannot
> > > > > >> > be in "MigrationEligible" state, yet? Check that znode
> manually
> > > one
> > > > by
> > > > > >> one?
> > > > > >> > Or do we plan to create a tool to help them? Or maybe expose
> the
> > > > > >> "unready
> > > > > >> > ZK brokers" metrics?
> > > > > >> >
> > > > > >> > 4. Same for "MigrationActive" entering the "MigrationFinished"
> > > > > >> > state. Since there will be some process for restarting ZK
> > > > > >> > brokers as KRaft brokers one by one, could we expose the
> > > > > >> > "remaining ZK brokers" as a metric?
> > > > > >> >
> > > > > >> > 5. When users are in the
> > > > > >> > "MigrationReady"/"MigrationActive"/"MigrationFinished" states,
> > > > > >> > and they accidentally change the KRaft controller config
> > > > > >> > "kafka.metadata.migration.enable" to false, what will happen to
> > > > > >> > this cluster? No dual-write for it? Will we have any protection
> > > > > >> > for it?
> > > > > >> >
> > > > > >> > 6. About the "MigrationState" metric:
> > > > > >> > The "ZooKeeper" and "MigrationEligible" states are reported by
> > > > > >> > the ZK controller, and the rest of the states are reported by
> > > > > >> > the KRaft controller --> makes sense to me.
> > > > > >> > One question from it is, when the KRaft controller takes over
> > > > > >> > leadership from the ZK controller, what will the
> > > > > >> > "MigrationState" value be in the old ZK controller? Keeping it
> > > > > >> > at "MigrationEligible" doesn't make sense. Will there be an
> > > > > >> > empty or null state?
> > > > > >> >
> > > > > >> > 7. About the "MetadataType" metric:
> > > > > >> > An enumeration of: ZooKeeper (1), Dual (2), KRaft (3). Each
> > > > > >> > broker reports this.
> > > > > >> > I don't know how we could map the migration state to these 3
> > > > > >> > types.
> > > > > >> > What is the MetadataType when the cluster is in the
> > > > > >> > "MigrationReady" state? Still ZooKeeper?
> > > > > >> > When will brokers enter the Dual type?
> > > > > >> > This is unclear in the KIP.
> > > > > >> >
> > > > > >> > Thank you.
> > > > > >> > Luke
> > > > > >> >
> > > > > >> > On Thu, Oct 20, 2022 at 11:33 PM David Arthur
> > > > > >> > <david.art...@confluent.io.invalid> wrote:
> > > > > >> >
> > > > > >> > > Igor, thanks for taking a look! Since JBOD in KRaft is still
> > > > > >> > > under discussion and not likely to land before the ZK
> > > > > >> > > migration, I think we'll need to defer it. For migrating JBOD
> > > > > >> > > clusters from ZK to KRaft, we'll also need to address the log
> > > > > >> > > dir failure mechanism, which currently uses a special ZNode
> > > > > >> > > written to by the brokers. There is an old KIP
> > > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > >> > > which proposes a new RPC to move that ZK write to the
> > > > > >> > > controller, but I'm not sure if that's the approach we'll want
> > > > > >> > > to take. I read through the discussion over in KIP-858 and it
> > > > > >> > > sounds like there are some good ideas there.
> > > > > >> > >
> > > > > >> > > To answer your question more directly, migrating ZK clusters
> > > > > >> > > using JBOD is out of scope for this KIP. It might be possible
> > > > > >> > > to stop using JBOD via reassignments, but I'm not sure about
> > > > > >> > > that.
> > > > > >> > >
> > > > > >> > > -David
> > > > > >> > >
> > > > > >> > > On Tue, Oct 18, 2022 at 12:17 PM Igor Soarez <i...@soarez.me>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi David,
> > > > > >> > > >
> > > > > >> > > > Thanks for the KIP, this is very exciting!
> > > > > >> > > >
> > > > > >> > > > How does JBOD relate to this work? KRaft mode doesn't yet
> > > > > >> > > > support configuring brokers with multiple log directories.
> > > > > >> > > > If the brokers in the existing cluster are configured with
> > > > > >> > > > multiple log dirs, does the migration imply that the
> > > > > >> > > > existing brokers need to drop use of that feature? Or is
> > > > > >> > > > there some way to upgrade them later?
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > Igor
> > > > > >> > > >
> > > > > >> > > > On Mon, Oct 17, 2022, at 10:07 PM, David Arthur wrote:
> > > > > >> > > > > I've updated the KIP with the following changes (the
> > > > > >> > > > > Confluence diff is not helpful here since I rearranged
> > > > > >> > > > > some things):
> > > > > >> > > > >
> > > > > >> > > > > * Added ZooKeeperBlockingKRaftMillis metric
> > > > > >> > > > > * Added section on new broker registration JSON
> > > > > >> > > > > * Removed section on MigrationCheck RPC
> > > > > >> > > > > * Added change to UpdateMetadataRequest
> > > > > >> > > > > * Added section "Additional ZK Broker Configs" (includes
> > > > > >> > > > > configs to connect to KRaft quorum)
> > > > > >> > > > > * Added section on "Incompatible Brokers" under Failure
> > > > > >> > > > > Modes
> > > > > >> > > > > * Clarified many things per this discussion thread
> > > > > >> > > > >
> > > > > >> > > > > I realized we need the KRaft controller to pick the
> > > > > >> > > > > correct "metadata.version" when initializing the
> > > > > >> > > > > migration. I included the IBP of a broker in its
> > > > > >> > > > > registration data so the KRaft controller can verify the
> > > > > >> > > > > IBP and pick the correct "metadata.version" when starting
> > > > > >> > > > > the migration. Otherwise, we could inadvertently downgrade
> > > > > >> > > > > the IBP/metadata.version as part of the migration.
> > > > > >> > > > >
> > > > > >> > > > > I also added "clusterId" to the broker registration data
> > > > > >> > > > > so the KRaft controller can verify it.
> > > > > >> > > > >
> > > > > >> > > > > -David
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Oct 17, 2022 at 12:14 PM Colin McCabe
> > > > > >> > > > > <cmcc...@apache.org> wrote:
> > > > > >> > > > >
> > > > > >> > > > >> On Fri, Oct 14, 2022, at 11:39, Jun Rao wrote:
> > > > > >> > > > >> > Hi, Colin,
> > > > > >> > > > >> >
> > > > > >> > > > >> > 10. "That all goes away in the new mode, and we just
> > have
> > > > > some
> > > > > >> > code
> > > > > >> > > > which
> > > > > >> > > > >> > analyzes __cluster_metadata and reflects it in 1)
> > updates
> > > > to
> > > > > ZK
> > > > > >> > and
> > > > > >> > > 2)
> > > > > >> > > > >> > messages sent out to brokers."
> > > > > >> > > > >> > Hmm, I am not sure it's that simple. Some of the
> > > complexity
> > > > > of
> > > > > >> the
> > > > > >> > > > >> ZK-based
> > > > > >> > > > >> > controller are (1) using the pipelining approach to
> > write
> > > > to
> > > > > ZK
> > > > > >> > for
> > > > > >> > > > >> better
> > > > > >> > > > >> > throughput and using conditional writes for
> > correctness;
> > > > (2)
> > > > > >> > sending
> > > > > >> > > > the
> > > > > >> > > > >> > proper LeaderAndIsr and UpdateMetadata requests. For
> > > > example,
> > > > > >> > during
> > > > > >> > > > >> > controller failover, the full metadata needs to be
> sent
> > > > while
> > > > > >> > during
> > > > > >> > > > >> > individual broker failure, only some of the metadata
> > > needs
> > > > to
> > > > > >> be
> > > > > >> > > > updated.
> > > > > >> > > > >> > The controlled shutdown handling sometimes uses
> > > > > >> StopReplicaRequest
> > > > > >> > > > and
> > > > > >> > > > >> > some other times uses LeaderAndIsrRequest. (3)
> > triggering
> > > > new
> > > > > >> > events
> > > > > >> > > > >> based
> > > > > >> > > > >> > on the responses of LeaderAndIsr (e.g. for topic
> > > deletion).
> > > > > >> Some
> > > > > >> > of
> > > > > >> > > > those
> > > > > >> > > > >> > complexity could be re-implemented in a more
> efficient
> > > way,
> > > > > >> but we
> > > > > >> > > > need
> > > > > >> > > > >> to
> > > > > >> > > > >> > be really careful not to generate regression. Some of
> > the
> > > > > other
> > > > > >> > > > >> complexity
> > > > > >> > > > >> > just won't go away. Reimplementing all those logic
> for
> > > the
> > > > 30
> > > > > >> or
> > > > > >> > so
> > > > > >> > > > >> events
> > > > > >> > > > >> > in the ZK-based controller is possible, but seems a
> bit
> > > > > >> daunting
> > > > > >> > and
> > > > > >> > > > >> risky.
> > > > > >> > > > >>
> > > > > >> > > > >> Hi Jun,
> > > > > >> > > > >>
> > > > > >> > > > >> Thanks for the response. I agree that there is some work
> > > > > >> > > > >> here, but I don't think it's as bad as it might seem.
> > > > > >> > > > >> Most of the code for writing to ZK can be reused from the
> > > > > >> > > > >> old controller. We should at least evaluate using the old
> > > > > >> > > > >> ControllerChannelManager as well. I don't know if we'll
> > > > > >> > > > >> end up doing it or not, but it's a possibility. (The main
> > > > > >> > > > >> reason not to do it is that the response handling will be
> > > > > >> > > > >> a bit different.)
> > > > > >> > > > >>
> > > > > >> > > > >> The question of what to do during a controller failover
> > > > > >> > > > >> is an interesting one. Technically, we should be able to
> > > > > >> > > > >> continue sending incremental updates to the legacy nodes,
> > > > > >> > > > >> for the same reason we can in KRaft mode. However,
> > > > > >> > > > >> probably we should just copy what ZK mode does and send
> > > > > >> > > > >> them a full metadata update. This will allow us to have
> > > > > >> > > > >> an easy way to handle divergences caused by lost RPCs by
> > > > > >> > > > >> changing the controller (just as we do in ZK mode,
> > > > > >> > > > >> unfortunately). I suppose we should document this in the
> > > > > >> > > > >> KIP...
> > > > > >> > > > >>
> > > > > >> > > > >> I agree controlled shutdown is tricky to get just right.
> > > > > >> > > > >> I suppose this is a case where the RPCs we send out are
> > > > > >> > > > >> not purely "fire and forget"; we have to listen for the
> > > > > >> > > > >> response. But that can be done in an event-based way.
> > > > > >> > > > >> Controlled shutdown is also probably the last thing we'll
> > > > > >> > > > >> implement once we have the basic lift and shift.
> > > > > >> > > > >>
> > > > > >> > > > >> Topic deletion is another annoying thing. I wonder if we
> > > > > >> > > > >> could just delete immediately here and skip the
> > > > > >> > > > >> complexity of implementing "deleting state." Topic IDs
> > > > > >> > > > >> will exist in IBP 3.4, in both ZK and KRaft mode, so it
> > > > > >> > > > >> should be OK, right?
> > > > > >> > > > >>
> > > > > >> > > > >> best,
> > > > > >> > > > >> Colin
> > > > > >> > > > >>
> > > > > >> > > > >>
> > > > > >> > > > >> >
> > > > > >> > > > >> > Thanks,
> > > > > >> > > > >> >
> > > > > >> > > > >> > Jun
> > > > > >> > > > >> >
> > > > > >> > > > >> >> On Fri, Oct 14, 2022 at 9:29 AM Colin McCabe
> > > > > >> > > > >> >> <cmcc...@apache.org> wrote:
> > > > > >> > > > >> >
> > > > > >> > > > >> >> On Thu, Oct 13, 2022, at 11:44, Jun Rao wrote:
> > > > > >> > > > >> >> > Hi, Colin,
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > Thanks for the reply.
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > 10. This is a bit on the implementation side. If you
> > > > > >> > > > >> >> > look at the existing ZK-based controller, most of
> > > > > >> > > > >> >> > the logic is around maintaining an in-memory state
> > > > > >> > > > >> >> > of all the resources (broker, topic, partition,
> > > > > >> > > > >> >> > etc.), reading/writing to ZK, sending LeaderAndIsr
> > > > > >> > > > >> >> > and UpdateMetadata requests, and handling the
> > > > > >> > > > >> >> > responses to brokers. So we need all that logic in
> > > > > >> > > > >> >> > the dual-write mode. One option is to duplicate all
> > > > > >> > > > >> >> > that logic in some new code. This can be a bit
> > > > > >> > > > >> >> > error-prone and makes the code a bit harder to
> > > > > >> > > > >> >> > maintain if we need to fix some critical issues in
> > > > > >> > > > >> >> > ZK-based controllers. Another option is to try
> > > > > >> > > > >> >> > reusing the existing code in the ZK-based
> > > > > >> > > > >> >> > controller. For example, we could start the
> > > > > >> > > > >> >> > EventManager in the ZK-based controller, but let the
> > > > > >> > > > >> >> > KRaft controller ingest new events. This has its own
> > > > > >> > > > >> >> > challenges: (1) the existing logic only logs ZK
> > > > > >> > > > >> >> > failures and doesn't expose them to the caller; (2)
> > > > > >> > > > >> >> > the existing logic may add new events to the queue
> > > > > >> > > > >> >> > itself, and we probably need to think through how
> > > > > >> > > > >> >> > this is coordinated with the KRaft controller; (3)
> > > > > >> > > > >> >> > it registers some ZK listeners unnecessarily (may
> > > > > >> > > > >> >> > not be a big concern). So we need to get around
> > > > > >> > > > >> >> > those issues somehow. I am wondering if we have
> > > > > >> > > > >> >> > considered both options and which approach we are
> > > > > >> > > > >> >> > leaning towards for the implementation.
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >>
> > > > > >> > > > >> >> Yes, this is a good question. My take is that a big
> > > > > >> > > > >> >> part of the complexity in the old controller code
> > > > > >> > > > >> >> results from the fact that we use ZK as a multi-writer
> > > > > >> > > > >> >> database for propagating information between different
> > > > > >> > > > >> >> components. So in the old controller, every write to
> > > > > >> > > > >> >> ZK needs to be structured as a compare-and-swap to be
> > > > > >> > > > >> >> fully correct. Every time we get notified about
> > > > > >> > > > >> >> something, it's usually in the form of "this znode
> > > > > >> > > > >> >> changed", which prompts a full reload of part of the
> > > > > >> > > > >> >> data in ZK (which itself has multiple parts: loading,
> > > > > >> > > > >> >> deserializing, reconciling, etc.) That all goes away
> > > > > >> > > > >> >> in the new mode, and we just have some code which
> > > > > >> > > > >> >> analyzes __cluster_metadata and reflects it in 1)
> > > > > >> > > > >> >> updates to ZK and 2) messages sent out to brokers.
> > > > > >> > > > >> >>
> > > > > >> > > > >> >> This is pretty decoupled from the other logic in
> > > > > >> > > > >> >> QuorumController and should be easy to unit test,
> > > > > >> > > > >> >> since the same inputs from the log always produce the
> > > > > >> > > > >> >> same output in ZK. Basically, ZK is write-only for us;
> > > > > >> > > > >> >> we do not read it (with the big exception of broker
> > > > > >> > > > >> >> registration znodes), and I think that will greatly
> > > > > >> > > > >> >> simplify things.
> > > > > >> > > > >> >>
> > > > > >> > > > >> >> So I think dual-write mode as described here will be
> > > > > >> > > > >> >> substantially simpler than trying to run part or all
> > > > > >> > > > >> >> of the old controller in parallel. I do think we will
> > > > > >> > > > >> >> reuse a bunch of the serialization/deserialization
> > > > > >> > > > >> >> code for znodes and possibly the code for
> > > > > >> > > > >> >> communicating with ZK.
> > > > > >> > > > >> >>
> > > > > >> > > > >> >> best,
> > > > > >> > > > >> >> Colin
> > > > > >> > > > >> >>
> > > > > >> > > > >> >>
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > 14. Good point, makes sense.
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > Thanks,
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > Jun
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> > On Wed, Oct 12, 2022 at 3:27 PM Colin McCabe
> > > > > >> > > > >> >> > <cmcc...@apache.org> wrote:
> > > > > >> > > > >> >> >
> > > > > >> > > > >> >> >> Hi Jun,
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> Thanks for taking a look. I can answer some
> > > > > >> > > > >> >> >> questions here because I collaborated on this a
> > > > > >> > > > >> >> >> bit, and David is on vacation for a few days.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> On Wed, Oct 12, 2022, at 14:41, Jun Rao wrote:
> > > > > >> > > > >> >> >> > Hi, David,
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > Thanks for the KIP. A few comments below.
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > 10. It's still not very clear to me how the KRaft
> > > > > >> > > > >> >> >> > controller works in the dual-write mode to the
> > > > > >> > > > >> >> >> > KRaft log and ZK when the brokers still run in ZK
> > > > > >> > > > >> >> >> > mode. Does the KRaft controller run a ZK-based
> > > > > >> > > > >> >> >> > controller in parallel, or do we derive what
> > > > > >> > > > >> >> >> > needs to be written to ZK based on KRaft
> > > > > >> > > > >> >> >> > controller logic?
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> We derive what needs to be written to ZK based on
> > > > > >> > > > >> >> >> KRaft controller logic.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> > I am also not sure how the KRaft controller
> > > > > >> > > > >> >> >> > handles broker registration/deregistration, since
> > > > > >> > > > >> >> >> > brokers are still running in ZK mode and are not
> > > > > >> > > > >> >> >> > heartbeating to the KRaft controller.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> The new controller will listen for broker
> > > > > >> > > > >> >> >> registrations under /brokers. This is the only
> > > > > >> > > > >> >> >> znode watch that the new controller will do.
> > > > > >> > > > >> >> >>
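> > > > > >> > > > >> >> >> Something like this (a hedged sketch; the handler
> > > > > >> > > > >> >> >> is hypothetical):
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     import org.apache.zookeeper.Watcher;
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     // One-shot child watch on the registration
> > > > > >> > > > >> >> >>     // path; real code must re-register the watch
> > > > > >> > > > >> >> >>     // after each notification.
> > > > > >> > > > >> >> >>     zk.getChildren("/brokers/ids", event -> {
> > > > > >> > > > >> >> >>         if (event.getType() ==
> > > > > >> > > > >> >> >>                 Watcher.Event.EventType.NodeChildrenChanged) {
> > > > > >> > > > >> >> >>             handleBrokerRegistrationChange();
> > > > > >> > > > >> >> >>         }
> > > > > >> > > > >> >> >>     });
> > > > > >> > > > >> >> >>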
> > > > > >> > > > >> >> >> We did consider changing how ZK-based broker
> > > > > >> > > > >> >> >> registration worked, but it just ended up being too
> > > > > >> > > > >> >> >> much work for not enough gain.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > 12. "A new set of nodes will be provisioned to
> > host
> > > > the
> > > > > >> > > > controller
> > > > > >> > > > >> >> >> quorum."
> > > > > >> > > > >> >> >> > I guess we don't support starting the KRaft
> > > > controller
> > > > > >> > quorum
> > > > > >> > > on
> > > > > >> > > > >> >> existing
> > > > > >> > > > >> >> >> > brokers. It would be useful to make that clear.
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> Agreed
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> > 13. "Once the quorum is established and a
> leader
> > is
> > > > > >> elected,
> > > > > >> > > the
> > > > > >> > > > >> >> >> controller
> > > > > >> > > > >> >> >> > will check the state of the cluster using the
> > > > > >> MigrationCheck
> > > > > >> > > > RPC."
> > > > > >> > > > >> How
> > > > > >> > > > >> >> >> does
> > > > > >> > > > >> >> >> > the quorum controller detect other brokers?
> Does
> > > the
> > > > > >> > > controller
> > > > > >> > > > >> node
> > > > > >> > > > >> >> need
> > > > > >> > > > >> >> >> > to be configured with ZK connection string? If
> > so,
> > > it
> > > > > >> would
> > > > > >> > be
> > > > > >> > > > >> useful
> > > > > >> > > > >> >> to
> > > > > >> > > > >> >> >> > document the additional configs that the quorum
> > > > > >> controller
> > > > > >> > > > needs to
> > > > > >> > > > >> >> set.
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> Yes, the controllers monitor ZK for broker
> > > > > >> > > > >> >> >> registrations, as I mentioned above. So they need
> > > > > >> > > > >> >> >> zookeeper.connect and the other ZK connection
> > > > > >> > > > >> >> >> configurations.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> > 14. "In order to prevent further writes to ZK,
> > the
> > > > > first
> > > > > >> > thing
> > > > > >> > > > the
> > > > > >> > > > >> new
> > > > > >> > > > >> >> >> > KRaft quorum must do is take over leadership of
> > the
> > > > ZK
> > > > > >> > > > controller.
> > > > > >> > > > >> "
> > > > > >> > > > >> >> The
> > > > > >> > > > >> >> >> ZK
> > > > > >> > > > >> >> >> > controller processing changes to /controller
> > update
> > > > > >> > > > asynchronously.
> > > > > >> > > > >> >> How
> > > > > >> > > > >> >> >> > does the KRaft controller know when the ZK
> > > controller
> > > > > has
> > > > > >> > > > resigned
> > > > > >> > > > >> >> before
> > > > > >> > > > >> >> >> > it can safely copy the ZK data?
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> This should be done through
> > > > > >> > > > >> >> >> expectedControllerEpochZkVersion, just like in ZK
> > > > > >> > > > >> >> >> mode, right? We should bump this epoch value so
> > > > > >> > > > >> >> >> that any writes from the old controller will not go
> > > > > >> > > > >> >> >> through. I agree we should spell this out in the
> > > > > >> > > > >> >> >> KIP.
> > > > > >> > > > >> >> >>
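> > > > > >> > > > >> >> >> For example (a sketch; the topic path below is just
> > > > > >> > > > >> >> >> illustrative):
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     import java.util.Arrays;
> > > > > >> > > > >> >> >>     import org.apache.zookeeper.Op;
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     // Guard every write with the expected zkVersion
> > > > > >> > > > >> >> >>     // of "/controller_epoch"; a fenced controller's
> > > > > >> > > > >> >> >>     // multi fails with BadVersion instead of going
> > > > > >> > > > >> >> >>     // through.
> > > > > >> > > > >> > > >>     zk.multi(Arrays.asList(
> > > > > >> > > > >> >> >>         Op.check("/controller_epoch",
> > > > > >> > > > >> >> >>                  expectedControllerEpochZkVersion),
> > > > > >> > > > >> >> >>         Op.setData("/brokers/topics/foo",
> > > > > >> > > > >> >> >>                    newStateBytes, -1)));
> > > > > >> > > > >> >> >>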
> > > > > >> > > > >> >> >> > 15. We have the following sentences. One says
> > > > > >> > > > >> >> >> > ControllerId is a random KRaft broker and the
> > > > > >> > > > >> >> >> > other says it's the active controller. Which one
> > > > > >> > > > >> >> >> > is correct?
> > > > > >> > > > >> >> >> > "UpdateMetadata: for certain metadata changes,
> > > > > >> > > > >> >> >> > the KRaft controller will need to send
> > > > > >> > > > >> >> >> > UpdateMetadataRequests to the ZK brokers. For the
> > > > > >> > > > >> >> >> > “ControllerId” field in this request, the
> > > > > >> > > > >> >> >> > controller should specify a random KRaft broker."
> > > > > >> > > > >> >> >> > "In the UpdateMetadataRequest sent by the KRaft
> > > > > >> > > > >> >> >> > controller to the ZK brokers, the ControllerId
> > > > > >> > > > >> >> >> > will point to the active controller, which will
> > > > > >> > > > >> >> >> > be used for the inter-broker requests."
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> Yeah, this seems like an error to me as well. A
> > > > > >> > > > >> >> >> random value is not really useful. Plus the text
> > > > > >> > > > >> >> >> here is self-contradictory, as you pointed out.
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> I suspect what we should do here is add a new
> > > > > >> > > > >> >> >> field, KRaftControllerId, and populate it with the
> > > > > >> > > > >> >> >> real controller ID, and leave the old controllerId
> > > > > >> > > > >> >> >> field as -1. A ZK-based broker that sees this can
> > > > > >> > > > >> >> >> then consult its controller.quorum.voters
> > > > > >> > > > >> >> >> configuration to see where it should send
> > > > > >> > > > >> >> >> controller-bound RPCs. That (static) configuration
> > > > > >> > > > >> >> >> lets us map between controller ID and host:port.
> > > > > >> > > > >> >> >>
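> > > > > >> > > > >> >> >> A sketch of that mapping (the voters config format
> > > > > >> > > > >> >> >> is "id@host:port,..."; IPv6 handling is omitted):
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     import java.net.InetSocketAddress;
> > > > > >> > > > >> >> >>     import java.util.HashMap;
> > > > > >> > > > >> >> >>     import java.util.Map;
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     Map<Integer, InetSocketAddress> voters = new HashMap<>();
> > > > > >> > > > >> >> >>     for (String voter : votersConfig.split(",")) {
> > > > > >> > > > >> >> >>         String[] idAddr = voter.split("@");
> > > > > >> > > > >> >> >>         String[] hostPort = idAddr[1].split(":");
> > > > > >> > > > >> >> >>         voters.put(Integer.parseInt(idAddr[0]),
> > > > > >> > > > >> >> >>                    new InetSocketAddress(hostPort[0],
> > > > > >> > > > >> >> >>                        Integer.parseInt(hostPort[1])));
> > > > > >> > > > >> >> >>     }
> > > > > >> > > > >> >> >>     // voters.get(kraftControllerId) is the target
> > > > > >> > > > >> >> >>     // for controller-bound RPCs.
> > > > > >> > > > >> >> >>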
> > > > > >> > > > >> >> >> We should still keep our existing epoch logic for
> > > > > >> > > > >> >> >> deciding when UpdateMetadataRequests /
> > > > > >> > > > >> >> >> LeaderAndIsrRequests are stale, with the caveat
> > > > > >> > > > >> >> >> that any KRaft-based epoch should be treated as
> > > > > >> > > > >> >> >> greater than any ZK-based epoch. After all, the
> > > > > >> > > > >> >> >> KRaft epoch comes from the epoch of
> > > > > >> > > > >> >> >> __cluster_metadata, whereas the ZK epoch comes from
> > > > > >> > > > >> >> >> ZK.
> > > > > >> > > > >> >> >>
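> > > > > >> > > > >> >> >> I.e., roughly (a sketch):
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >>     // Any KRaft-based epoch beats any ZK-based
> > > > > >> > > > >> >> >>     // epoch; otherwise the higher epoch wins, as
> > > > > >> > > > >> >> >>     // today.
> > > > > >> > > > >> >> >>     boolean isStale(int newEpoch, boolean newIsKRaft,
> > > > > >> > > > >> >> >>                     int curEpoch, boolean curIsKRaft) {
> > > > > >> > > > >> >> >>         if (newIsKRaft != curIsKRaft) return !newIsKRaft;
> > > > > >> > > > >> >> >>         return newEpoch < curEpoch;
> > > > > >> > > > >> >> >>     }
> > > > > >> > > > >> >> >>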
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > 16. "Additionally, the controller must specify
> > if a
> > > > > >> broker
> > > > > >> > in
> > > > > >> > > > >> >> >> “LiveBrokers”
> > > > > >> > > > >> >> >> > is KRaft or ZK." Does that require any protocol
> > > > changes
> > > > > >> to
> > > > > >> > > > >> >> >> UpdateMetadata?
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> Yeah, I am also curious why we need to care whether
> > > > > >> > > > >> >> >> brokers are ZK or KRaft in UpdateMetadataRequest.
> > > > > >> > > > >> >> >> We don't reveal this to clients, so can we just
> > > > > >> > > > >> >> >> leave this out?
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> best,
> > > > > >> > > > >> >> >> Colin
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >> >> > Thanks,
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > Jun
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> > On Wed, Oct 5, 2022 at 10:07 AM Mickael Maison
> > > > > >> > > > >> >> >> > <mickael.mai...@gmail.com> wrote:
> > > > > >> > > > >> >> >> >
> > > > > >> > > > >> >> >> >> Hi David,
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> Thanks for starting this important KIP.
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> I've just taken a quick look so far, but I've
> > > > > >> > > > >> >> >> >> got a couple of initial questions:
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> 1) What happens if a non-KRaft-compatible broker
> > > > > >> > > > >> >> >> >> (or one with kafka.metadata.migration.enable set
> > > > > >> > > > >> >> >> >> to false) joins the cluster after the migration
> > > > > >> > > > >> >> >> >> is triggered?
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> 2) In the Failure Modes section you mention a
> > > > > >> > > > >> >> >> >> scenario where a write to ZK fails. What happens
> > > > > >> > > > >> >> >> >> when the divergence limit is reached? Is this a
> > > > > >> > > > >> >> >> >> fatal condition? How much divergence should we
> > > > > >> > > > >> >> >> >> allow?
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> Thanks,
> > > > > >> > > > >> >> >> >> Mickael
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >> >> On Wed, Oct 5, 2022 at 12:20 AM David Arthur
> > > > > >> > > > >> >> >> >> <mum...@gmail.com> wrote:
> > > > > >> > > > >> >> >> >> >
> > > > > >> > > > >> >> >> >> > Hey folks, I wanted to get the ball rolling on
> > > > > >> > > > >> >> >> >> > the discussion for the ZooKeeper migration
> > > > > >> > > > >> >> >> >> > KIP. This KIP details how we plan to do an
> > > > > >> > > > >> >> >> >> > online migration of metadata from ZooKeeper to
> > > > > >> > > > >> >> >> >> > KRaft as well as a rolling upgrade of brokers
> > > > > >> > > > >> >> >> >> > to KRaft mode.
> > > > > >> > > > >> >> >> >> >
> > > > > >> > > > >> >> >> >> > The general idea is to keep KRaft and
> > > > > >> > > > >> >> >> >> > ZooKeeper in sync during the migration, so
> > > > > >> > > > >> >> >> >> > both types of brokers can exist
> > > > > >> > > > >> >> >> >> > simultaneously. Then, once everything is
> > > > > >> > > > >> >> >> >> > migrated and updated, we can turn off
> > > > > >> > > > >> >> >> >> > ZooKeeper writes.
> > > > > >> > > > >> >> >> >> >
> > > > > >> > > > >> >> >> >> > This is a pretty complex KIP, so please take a
> > > > > >> > > > >> >> >> >> > look :)
> > > > > >> > > > >> >> >> >> >
> > > > > >> > > > >> >> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > > > > >> > > > >> >> >> >> >
> > > > > >> > > > >> >> >> >> > Thanks!
> > > > > >> > > > >> >> >> >> > David
> > > > > >> > > > >> >> >> >>
> > > > > >> > > > >> >> >>
> > > > > >> > > > >> >>
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > -David
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > --
> > > > > >> > > -David
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -David
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -David
> > > > >
> > > >
> > >
> > >
> > > --
> > > -David
> > >
> >
>
>
> --
> -David
>
