Ah, sorry about that, you're right. Since we won't support ZK
migrations in combined mode, is this issue avoided?

Essentially, we would only set ZkMigrationReady in ApiVersionsResponse if

* process.roles=controller
* kafka.metadata.migration.enabled=true

Also, the following would be an invalid config that should prevent startup:

* process.roles=broker,controller
* kafka.metadata.migration.enabled=true

Does this seem reasonable?

Thanks!
David

On Tue, Nov 8, 2022 at 11:12 AM Jun Rao <j...@confluent.io.invalid> wrote:
>
> Hi, David,
>
> I am not sure that we are fully settled on the following.
>
> 20/21. Since separate listeners are optional, it seems that the broker
> can't distinguish between ApiVersion requests coming from the client or
> other brokers. This means the clients will get ZkMigrationReady in the
> ApiVersion response, which is weird.
>
> Thanks,
>
> Jun
>
> On Tue, Nov 8, 2022 at 7:18 AM David Arthur <mum...@gmail.com> wrote:
>
> > Thanks for the discussion everyone, I'm going to move ahead with the
> > vote for this KIP.
> >
> > -David
> >
> > On Thu, Nov 3, 2022 at 1:20 PM Jun Rao <j...@confluent.io.invalid> wrote:
> > >
> > > Hi, David,
> > >
> > > Thanks for the reply.
> > >
> > > 20/21 Yes, but separate listeners are optional. It's possible for the
> > nodes
> > > to use a single port for both client and server side communications.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 3, 2022 at 9:59 AM David Arthur
> > > <david.art...@confluent.io.invalid> wrote:
> > >
> > > > 20/21, in combined mode we still have a separate listener for the
> > > > controller APIs, e.g.,
> > > >
> > > > listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> > > >
> > > > inter.broker.listener.name=PLAINTEXT
> > > >
> > > > controller.listener.names=CONTROLLER
> > > >
> > > > advertised.listeners=PLAINTEXT://localhost:9092
> > > >
> > > >
> > > >
> > > > Clients still talk to the broker through the advertised listener, and
> > only
> > > > brokers and other controllers will communicate over the controller
> > > > listener.
> > > >
> > > > 40. Sounds good, I updated the KIP
> > > >
> > > > Thanks!
> > > > David
> > > >
> > > >
> > > >
> > > > On Thu, Nov 3, 2022 at 12:14 PM Jun Rao <j...@confluent.io.invalid>
> > wrote:
> > > >
> > > > > Hi, David,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 20/21. When KRaft runs in the combined mode, does a controller know
> > > > whether
> > > > > an ApiRequest is from a client or another broker?
> > > > >
> > > > > 40. Adding a "None" state sounds reasonable.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Nov 3, 2022 at 8:39 AM David Arthur
> > > > > <david.art...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > 20/21 If we use a tagged field, then I don't think clients need to
> > be
> > > > > > concerned with it, right?. In ApiVersionsResponse sent by brokers
> > to
> > > > > > clients, this field would be omitted. Clients can't connect
> > directly to
> > > > > the
> > > > > > KRaft controller nodes. Also, we have a precedent of sending
> > controller
> > > > > > node state between controllers through ApiVersions
> > ("metadata.version"
> > > > > > feature), so I think it fits well.
> > > > > >
> > > > > > 40. For new KRaft clusters, we could add another state to indicate
> > it
> > > > was
> > > > > > not migrated from ZK, but created as a KRaft cluster. Maybe
> > > > > > "kafka.controller:type=KafkaController,name=MigrationState" =>
> > "None" ?
> > > > > We
> > > > > > could also omit that metric for unmigrated clusters, but I'm not a
> > fan
> > > > of
> > > > > > using the absence of a value to signal something.
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Akhilesh, thanks for reviewing the KIP!
> > > > > >
> > > > > > 1. MigrationState and MetadataType are mostly the same on the
> > > > controller,
> > > > > > but we do have the "MigratingZkData" state that seems useful to
> > report
> > > > > as a
> > > > > > metric. Aside from looking at logs, observing the controller in
> > this
> > > > > state
> > > > > > is the only way to see how long its taking to copy data from ZK.
> > > > > >
> > > > > > "KRaftMode" instead of "MigrationFinalized" is similar to Jun's
> > > > question
> > > > > > about non-migrated clusters. I think it's useful to have a distinct
> > > > > > MigrationState for clusters that have been migrated and those that
> > were
> > > > > > never migrated. This does mean we'll report the MigrationState long
> > > > after
> > > > > > the migration is complete, but we can drop these metrics in 4.0
> > once ZK
> > > > > is
> > > > > > removed.
> > > > > >
> > > > > > 2. The "ZkMigrationReady" will indicate that the controller has
> > > > > > "kafka.metadata.migration.enable" _and_ the ZK configs set. We need
> > > > some
> > > > > > way to indicate that the whole quorum is correctly configured to
> > handle
> > > > > the
> > > > > > migration so we don't failover to a controller that's not
> > configured
> > > > for
> > > > > > ZK. Did I understand your question correctly?
> > > > > >
> > > > > > 3. Yea, good idea. While the KRaft controller has
> > > > > > MigrationState=MigrationIneligible, we could also report
> > > > > >
> > > > >
> > > >
> > "kafka.controller:type=KafkaController,name=MigrationInelgibleBrokerCount".
> > > > > > It might be useful to report ineligible controllers as well since
> > that
> > > > > can
> > > > > > prevent the migration from starting.
> > > > > >
> > > > > > 4. I think I covered this in "Incompatible Brokers". We effectively
> > > > fence
> > > > > > these brokers by not sending them metadata RPCs.
> > > > > >
> > > > > > Thanks!
> > > > > > David
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 1, 2022 at 3:58 PM Akhilesh Chaganti <
> > > > akhilesh....@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > >
> > > > > > > Thanks for the KIP. I have some questions/suggestions.
> > > > > > >
> > > > > > >
> > > > > > > 1) I see two new metrics:
> > > > > > > kafka.controller:type=KafkaController,name=MetadataType and
> > > > > > > kafka.controller:type=KafkaController,name=MigrationState. Won't
> > the
> > > > > > second
> > > > > > > metric already cover the cases of the first metric? Also,
> > instead of
> > > > > > > MigrationFinalized, we could directly say the state is
> > KRaftMode. So
> > > > we
> > > > > > can
> > > > > > > use the same value for default KRaft clusters.
> > > > > > >
> > > > > > >
> > > > > > > 2) ZkMigrationReady in ApiVersionsResponse from KRaft
> > Controller. By
> > > > > > > default, we plan to start the Controller quorum in "
> > > > > > > *kafka.metadata.migration.enable*" config set to true. Then do we
> > > > need
> > > > > > this
> > > > > > > additional information again to make sure The controllers are
> > ready
> > > > for
> > > > > > > migration? What would happen if the Controller assumes it is
> > ready
> > > > for
> > > > > > > migration from 3.4 by default if it doesn't see both
> > > > MigrationMetadata
> > > > > > > records?
> > > > > > >
> > > > > > >
> > > > > > > 3) I see that we do not impose order on rolling the brokers with
> > > > > > migration
> > > > > > > flags and provisioning the controller quorum. Along with the
> > KRaft
> > > > > > > controller emitting migration state metrics, it may be better to
> > emit
> > > > > the
> > > > > > > broker count for the brokers not ready for migration yet. This
> > will
> > > > > give
> > > > > > us
> > > > > > > more insight into any roll issues.
> > > > > > >
> > > > > > >
> > > > > > > 4) Once the KRaft controller is in migration mode, we should also
> > > > > > > prevent/handle ZkBrokerRegistrations that don't enable migration
> > > > mode.
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > > Akhilesh
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Nov 1, 2022 at 10:49 AM Jun Rao <j...@confluent.io.invalid
> > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, David,
> > > > > > > >
> > > > > > > > Thanks for the reply.
> > > > > > > >
> > > > > > > > 20/21. Regarding the new ZkMigrationReady field in
> > > > > ApiVersionsResponse,
> > > > > > > it
> > > > > > > > seems that this is a bit intrusive since it exposes unneeded
> > info
> > > > to
> > > > > > the
> > > > > > > > clients. Another option is to add that field as part of the
> > Fetch
> > > > > > > request.
> > > > > > > > We can choose to only set that field in the very first Fetch
> > > > request
> > > > > > > from a
> > > > > > > > Quorum follower.
> > > > > > > >
> > > > > > > > 40. For
> > kafka.controller:type=KafkaController,name=MigrationState,
> > > > > what
> > > > > > > is
> > > > > > > > the value for a brand new KRaft cluster?
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > > On Mon, Oct 31, 2022 at 2:35 PM David Arthur
> > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > >
> > > > > > > > > 30. I think we can keep the single ControllerId field in
> > those
> > > > > > requests
> > > > > > > > > since they are only used for fencing (as far as I know).
> > > > > Internally,
> > > > > > > the
> > > > > > > > > broker components that handle those requests will compare the
> > > > > > > > ControllerId
> > > > > > > > > with that of MetadataCache (which is updated via UMR).
> > > > > > > > >
> > > > > > > > > The reason we need the separate KRaftControllerId in the
> > > > > > UpdateMetadata
> > > > > > > > > code path so that we can have different connection behavior
> > for a
> > > > > > KRaft
> > > > > > > > > controller vs ZK controller.
> > > > > > > > >
> > > > > > > > > 31. It seems reasonable to keep the MigrationRecord in the
> > > > > snapshot.
> > > > > > I
> > > > > > > > was
> > > > > > > > > thinking the same thing in terms of understanding the loss
> > for a
> > > > > > > > > migration-after-finalization. However, once a snapshot has
> > been
> > > > > taken
> > > > > > > > that
> > > > > > > > > includes the final MigrationRecord, we can't easily see which
> > > > > records
> > > > > > > > came
> > > > > > > > > after it.
> > > > > > > > >
> > > > > > > > > 32. You're correct, we can just use the modify time from the
> > > > Stat.
> > > > > > The
> > > > > > > > > other two fields are primarily informational and are there
> > for
> > > > > > > operators
> > > > > > > > > who want to inspect the state of the migration. They aren't
> > > > > required
> > > > > > > for
> > > > > > > > > correctness
> > > > > > > > >
> > > > > > > > > 33. Yes that's right. I detail that in "Controller
> > Leadership"
> > > > > > section
> > > > > > > > >
> > > > > > > > > 34. Right, I'll fix that.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > David
> > > > > > > > >
> > > > > > > > > On Mon, Oct 31, 2022 at 2:55 PM Jun Rao
> > <j...@confluent.io.invalid
> > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, David,
> > > > > > > > > >
> > > > > > > > > > Thanks for the updated KIP. A few more comments.
> > > > > > > > > >
> > > > > > > > > > 30. LeaderAndIsrRequest/StopReplicaRequest both have a
> > > > > controllerId
> > > > > > > > > field.
> > > > > > > > > > Should we add a KRaftControllerId field like
> > UpdateMetadata?
> > > > > > > > > >
> > > > > > > > > > 31. "If a migration has been finalized, but the KRaft
> > quroum
> > > > > comes
> > > > > > up
> > > > > > > > > with
> > > > > > > > > > kafka.metadata.migration.enable, we must not re-enter the
> > > > > migration
> > > > > > > > mode.
> > > > > > > > > > In this case, while replaying the log, the controller can
> > see
> > > > the
> > > > > > > > second
> > > > > > > > > > MigrationRecord and know that the migration is finalized
> > and
> > > > > should
> > > > > > > not
> > > > > > > > > be
> > > > > > > > > > resumed. " Hmm, do we want to keep the MigrationRecord in
> > the
> > > > > > > snapshot
> > > > > > > > > and
> > > > > > > > > > the metadata log forever after migration is finalized? If
> > not,
> > > > we
> > > > > > > can't
> > > > > > > > > > know for sure whether a migration has happened or not.
> > Also, it
> > > > > > might
> > > > > > > > be
> > > > > > > > > > useful to support switching back to ZK mode after the
> > migration
> > > > > is
> > > > > > > > > > finalized, with the understanding of potential metadata
> > loss.
> > > > In
> > > > > > that
> > > > > > > > > case,
> > > > > > > > > > we could just trim all metadata log and recopy the ZK
> > metadata
> > > > > > back.
> > > > > > > > > >
> > > > > > > > > > 32. The /migration node in ZK: Do we need
> > last_update_time_ms
> > > > > since
> > > > > > > ZK
> > > > > > > > > > Stats already has an MTime? Also, how do we plan to use the
> > > > > > > > > > kraft_controller_id and kraft_controller_epoch fields?
> > > > > > > > > >
> > > > > > > > > > 33. Controller migration: We will force a write to the
> > > > > > "/controller"
> > > > > > > > and
> > > > > > > > > > "/controller_epoch" ZNodes before copying ZK data, right?
> > > > > > > > > >
> > > > > > > > > > 34. "Operator can remove the persistent "/controller" and
> > > > > > > > > > "/controller_epoch" nodes allowing for ZK controller
> > election
> > > > to
> > > > > > take
> > > > > > > > > > place". I guess the operator only needs to remove the
> > > > /controller
> > > > > > > path?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 31, 2022 at 7:17 AM David Arthur
> > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > >
> > > > > > > > > > > Happy Monday, everyone! I've updated the KIP with the
> > > > following
> > > > > > > > > changes:
> > > > > > > > > > >
> > > > > > > > > > > * Clarified MetadataType metric usages (broker vs
> > controller)
> > > > > > > > > > > * Added ZkMigrationReady tagged field to
> > ApiVersionsResponse
> > > > > (for
> > > > > > > use
> > > > > > > > > by
> > > > > > > > > > > KRaft controller quorum)
> > > > > > > > > > > * Added MigrationRecord with two states: Started and
> > Finished
> > > > > > > > > > > * Documented ZK configs for KRaft controller
> > > > > > > > > > > * Simplified state machine description (internally, more
> > > > states
> > > > > > > will
> > > > > > > > > > exist,
> > > > > > > > > > > but only the four documented are interesting to
> > operators)
> > > > > > > > > > > * Clarified some things in Controller Migration section
> > > > > > > > > > > * Removed KRaft -> ZK parts of Broker Registration
> > > > > > > > > > > * Added Misconfigurations section to Failure Modes
> > > > > > > > > > >
> > > > > > > > > > > Let me know if I've missed anything from the past two
> > weeks
> > > > of
> > > > > > > > > > discussion.
> > > > > > > > > > >
> > > > > > > > > > > Thanks again to everyone who has reviewed this KIP so
> > far!
> > > > > > > > > > >
> > > > > > > > > > > -David
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 28, 2022 at 2:26 PM Jun Rao
> > > > > <j...@confluent.io.invalid
> > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, David,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply.
> > > > > > > > > > > >
> > > > > > > > > > > > 20/21. Sounds good.
> > > > > > > > > > > >
> > > > > > > > > > > > Could you update the doc with all the changes being
> > > > > discussed?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Oct 28, 2022 at 10:11 AM David Arthur
> > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Jun,
> > > > > > > > > > > > >
> > > > > > > > > > > > > 20/21. I was also wondering about a "migration"
> > record.
> > > > In
> > > > > > > > addition
> > > > > > > > > > to
> > > > > > > > > > > > the
> > > > > > > > > > > > > scenario you mentioned, we also need a way to
> > prevent the
> > > > > > > cluster
> > > > > > > > > > from
> > > > > > > > > > > > > re-entering the dual write mode after the migration
> > has
> > > > > been
> > > > > > > > > > > finalized. I
> > > > > > > > > > > > > could see this happening inadvertently via a change
> > in
> > > > some
> > > > > > > > > > > configuration
> > > > > > > > > > > > > management system. How about we add a record that
> > marks
> > > > the
> > > > > > > > > beginning
> > > > > > > > > > > and
> > > > > > > > > > > > > end of the dual-write mode. The first occurrence of
> > the
> > > > > > record
> > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > > > > included in the metadata transaction when we migrate
> > data
> > > > > > from
> > > > > > > > ZK.
> > > > > > > > > > > > >
> > > > > > > > > > > > > With this, the active controller would decide
> > whether to
> > > > > > enter
> > > > > > > > dual
> > > > > > > > > > > write
> > > > > > > > > > > > > mode, finalize the migration based, or fail based on:
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Metadata log state
> > > > > > > > > > > > > * It's own configuration
> > > > > ("kafka.metadata.migration.enable",
> > > > > > > > > > > > > "zookeeper.connect", etc)
> > > > > > > > > > > > > * The other controllers configuration (via
> > > > > > ApiVersionsResponse)
> > > > > > > > > > > > >
> > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 22. Since we will need the fencing anyways as a
> > > > safe-guard,
> > > > > > > then
> > > > > > > > I
> > > > > > > > > > > agree
> > > > > > > > > > > > > would could skip the registration of KRaft brokers
> > in ZK
> > > > to
> > > > > > > > simply
> > > > > > > > > > > > things a
> > > > > > > > > > > > > bit.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > David
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 27, 2022 at 5:11 PM Jun Rao
> > > > > > > <j...@confluent.io.invalid
> > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi, David,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the reply.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 20/21. Relying upon the presence of ZK configs to
> > > > > determine
> > > > > > > > > whether
> > > > > > > > > > > the
> > > > > > > > > > > > > > KRaft controller is in a dual write mode seems a
> > bit
> > > > > error
> > > > > > > > prone.
> > > > > > > > > > If
> > > > > > > > > > > > > > someone accidentally adds a ZK configuration to a
> > brand
> > > > > new
> > > > > > > > KRaft
> > > > > > > > > > > > > cluster,
> > > > > > > > > > > > > > ideally it shouldn't cause the controller to get
> > into a
> > > > > > weird
> > > > > > > > > > state.
> > > > > > > > > > > > Have
> > > > > > > > > > > > > > we considered storing the migration state in a
> > metadata
> > > > > > > record?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 22. If we have the broker fencing logic, do we
> > need to
> > > > > > write
> > > > > > > > the
> > > > > > > > > > > broker
> > > > > > > > > > > > > > registration path in ZK for KRaft brokers at all?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:02 PM David Arthur
> > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jun,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 20/21. A KRaft controller will recover the
> > migration
> > > > > > state
> > > > > > > by
> > > > > > > > > > > reading
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > "/migration" ZNode. If the migration enable
> > config is
> > > > > > set,
> > > > > > > > and
> > > > > > > > > > the
> > > > > > > > > > > ZK
> > > > > > > > > > > > > > > migration is complete, it will enter the
> > dual-write
> > > > > mode.
> > > > > > > > > Before
> > > > > > > > > > an
> > > > > > > > > > > > > > > operator can decommission ZK, they will need to
> > > > > finalize
> > > > > > > the
> > > > > > > > > > > > migration
> > > > > > > > > > > > > > > which involves removing the migration config and
> > the
> > > > ZK
> > > > > > > > config.
> > > > > > > > > > > I'll
> > > > > > > > > > > > > > > clarify this in the KIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 22. Yea, we could see an incorrect broker ID
> > during
> > > > > that
> > > > > > > > > window.
> > > > > > > > > > > If
> > > > > > > > > > > > we
> > > > > > > > > > > > > > > ended up with a state where we saw a ZK broker ID
> > > > that
> > > > > > > > > conflicted
> > > > > > > > > > > > with
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > KRaft broker ID, we would need to fence one of
> > them.
> > > > I
> > > > > > > would
> > > > > > > > > > > probably
> > > > > > > > > > > > > opt
> > > > > > > > > > > > > > > to fence the KRaft broker in that case since
> > broker
> > > > > > > > > registration
> > > > > > > > > > > and
> > > > > > > > > > > > > > > fencing is more robust in KRaft. Hopefully this
> > is a
> > > > > rare
> > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 26. Sounds good.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > > > David
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Oct 27, 2022 at 1:34 PM Jun Rao
> > > > > > > > > <j...@confluent.io.invalid
> > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi, David,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the reply. A few more comments.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 20/21. Using a tagged field in
> > ApiVersionRequest
> > > > > could
> > > > > > > > work.
> > > > > > > > > > > > Related
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > this, how does a KRaft controller know that
> > it's in
> > > > > the
> > > > > > > > dual
> > > > > > > > > > > write
> > > > > > > > > > > > > > mode?
> > > > > > > > > > > > > > > > Does it need to read the /controller path from
> > ZK?
> > > > > > After
> > > > > > > > the
> > > > > > > > > > > > > migration,
> > > > > > > > > > > > > > > > people may have the ZK cluster decommissioned,
> > but
> > > > > > still
> > > > > > > > have
> > > > > > > > > > the
> > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > configs left in the KRaft controller. Will this
> > > > cause
> > > > > > the
> > > > > > > > > KRaft
> > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > to be stuck because it doesn't know which mode
> > it
> > > > is
> > > > > > in?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 22. Using the ephemeral node matches the
> > current
> > > > > > ZK-based
> > > > > > > > > > broker
> > > > > > > > > > > > > > behavior
> > > > > > > > > > > > > > > > better. However, it leaves a window for
> > incorrect
> > > > > > broker
> > > > > > > > > > > > registration
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > sneak in during KRaft controller failover.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 26. Then, we could just remove Broker
> > Registration
> > > > in
> > > > > > > that
> > > > > > > > > > > section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 2:21 PM David Arthur
> > > > > > > > > > > > > > > > <david.art...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 20/21 It could definitely cause problems if
> > we
> > > > > > failover
> > > > > > > > to
> > > > > > > > > a
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > without "kafka.metadata.migration.enable".
> > The
> > > > only
> > > > > > > > > > mechanism I
> > > > > > > > > > > > > know
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > controllers to learn things about one
> > another is
> > > > > > > > > ApiVersions.
> > > > > > > > > > > We
> > > > > > > > > > > > > > > > currently
> > > > > > > > > > > > > > > > > use this for checking support for
> > > > > "metadata.version"
> > > > > > > (in
> > > > > > > > > > KRaft
> > > > > > > > > > > > > mode).
> > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > could add a "zk.migration" feature flag
> > that's
> > > > > > enabled
> > > > > > > > on a
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > if the config is set. Another possibility
> > would
> > > > be
> > > > > a
> > > > > > > > tagged
> > > > > > > > > > > field
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > ApiVersionResponse that indicated if the
> > config
> > > > was
> > > > > > > set.
> > > > > > > > > Both
> > > > > > > > > > > > seem
> > > > > > > > > > > > > > > > somewhat
> > > > > > > > > > > > > > > > > inelegant. I think a tagged field would be a
> > bit
> > > > > > > simpler
> > > > > > > > > (and
> > > > > > > > > > > > > > arguably
> > > > > > > > > > > > > > > > less
> > > > > > > > > > > > > > > > > hacky).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > For 20, we could avoid entering the migration
> > > > state
> > > > > > > > unless
> > > > > > > > > > the
> > > > > > > > > > > > > whole
> > > > > > > > > > > > > > > > quorum
> > > > > > > > > > > > > > > > > had the field present in their
> > NodeApiVersions.
> > > > For
> > > > > > 21,
> > > > > > > > we
> > > > > > > > > > > could
> > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > leaving the migration state unless the whole
> > > > quorum
> > > > > > did
> > > > > > > > not
> > > > > > > > > > > have
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > in their NodeApiVersions. Do you think this
> > would
> > > > > be
> > > > > > > > > > > sufficient?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 22. Right, we need to write the broker info
> > back
> > > > to
> > > > > > ZK
> > > > > > > > just
> > > > > > > > > > as
> > > > > > > > > > > a
> > > > > > > > > > > > > > > > safeguard
> > > > > > > > > > > > > > > > > against incorrect broker IDs getting
> > registered
> > > > > into
> > > > > > > ZK.
> > > > > > > > I
> > > > > > > > > > was
> > > > > > > > > > > > > > thinking
> > > > > > > > > > > > > > > > > these would be persistent nodes, but it's
> > > > probably
> > > > > > fine
> > > > > > > > to
> > > > > > > > > > make
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > > ephemeral and have the active KRaft
> > controller
> > > > keep
> > > > > > > them
> > > > > > > > up
> > > > > > > > > > to
> > > > > > > > > > > > > date.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 23. Right. When the broker comes up as a
> > KRaft
> > > > > > broker,
> > > > > > > > it's
> > > > > > > > > > old
> > > > > > > > > > > > > > > > > /brokers/ids ZNode will be gone and it will
> > > > > register
> > > > > > > > itself
> > > > > > > > > > > with
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > controller. The controller will know that it
> > is
> > > > now
> > > > > > in
> > > > > > > > > KRaft
> > > > > > > > > > > mode
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > stop sending it the old RPCs.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 24. Ok, I'll add these
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 25. I realize now the "/controller_epoch"
> > node is
> > > > > > > already
> > > > > > > > > > > > > persistent.
> > > > > > > > > > > > > > > It
> > > > > > > > > > > > > > > > > should be sufficient to remove the
> > "/controller"
> > > > > node
> > > > > > > to
> > > > > > > > > > > trigger
> > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > election.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 26. Hmm, not sure, but I don't think it uses
> > > > > watches.
> > > > > > > > > > > KafkaServer
> > > > > > > > > > > > > > > > registers
> > > > > > > > > > > > > > > > > the broker info and later loads all brokers
> > in
> > > > the
> > > > > > > > cluster
> > > > > > > > > to
> > > > > > > > > > > > check
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > the endpoints don't conflict. Things that do
> > use
> > > > > > > watches
> > > > > > > > > are
> > > > > > > > > > > > > dynamic
> > > > > > > > > > > > > > > > > configs, ACLs, and some others (listed in the
> > > > KIP).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Oct 26, 2022 at 4:26 PM David Arthur
> > <
> > > > > > > > > > > > > > > david.art...@confluent.io>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Luke and Andrew, thanks for taking a look!
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think the names of the state machine in
> > the
> > > > KIP
> > > > > > > > aren't
> > > > > > > > > > the
> > > > > > > > > > > > > best.
> > > > > > > > > > > > > > > I'll
> > > > > > > > > > > > > > > > > > try to improve that section in general.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. "MigrationReady" is probably better
> > named
> > > > > > > > > > > "MigratingFromZk"
> > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > something. It's meant to be the state when
> > the
> > > > > > KRaft
> > > > > > > > > > > controller
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > actively
> > > > > > > > > > > > > > > > > > migrating the data out of ZK. This happens
> > > > after
> > > > > we
> > > > > > > > have
> > > > > > > > > > > > detected
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > cluster is eligible and before we enter the
> > > > dual
> > > > > > > write
> > > > > > > > > > mode.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. "MigrationActive" should probably be
> > called
> > > > > > > > > > > > "BrokerMigration".
> > > > > > > > > > > > > > > > > > Transitioning to "MigrationFinished" can
> > happen
> > > > > > > > > > automatically
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > known ZK brokers have been migrated. Since
> > we
> > > > > don't
> > > > > > > > have
> > > > > > > > > > > > > permanent
> > > > > > > > > > > > > > > > > > registrations of ZK brokers, we can use the
> > > > > > partition
> > > > > > > > > > > > assignments
> > > > > > > > > > > > > > as
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > proxy for this.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3. A metric for unready brokers makes
> > sense. We
> > > > > can
> > > > > > > > also
> > > > > > > > > > log
> > > > > > > > > > > a
> > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > the controller when it tries to start the
> > > > > migration
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 4. The MetadataType metric is meant to
> > help see
> > > > > > this.
> > > > > > > > > E.g.,
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > > would have MetadataType=ZK, some would have
> > > > > > > > > > MetadataType=Dual
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 5. On ZK brokers, not having this config
> > set
> > > > > would
> > > > > > > > > prevent
> > > > > > > > > > > the
> > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > > registration data from being written to ZK.
> > > > This
> > > > > > > means
> > > > > > > > > the
> > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > > won't send RPCs to it. If a broker has been
> > > > > > migrated
> > > > > > > to
> > > > > > > > > > KRaft
> > > > > > > > > > > > > > > already,
> > > > > > > > > > > > > > > > > I'm
> > > > > > > > > > > > > > > > > > not sure there is any harm in removing this
> > > > > config.
> > > > > > > If
> > > > > > > > we
> > > > > > > > > > > > decide
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > need to guarantee that KRaft brokers have
> > this
> > > > > > config
> > > > > > > > set
> > > > > > > > > > > > during
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > migration, we can include it in the broker
> > > > > > > registration
> > > > > > > > > > > that's
> > > > > > > > > > > > > sent
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > KRaft. This would let the controller keep
> > that
> > > > > > broker
> > > > > > > > > > fenced.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 6. Once the KRaft controller takes
> > leadership,
> > > > > the
> > > > > > ZK
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > active any more and will stop reporting the
> > > > > > > > > MigrationState
> > > > > > > > > > > > > metric.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 7. The "Dual" MetadataType is reported by
> > > > brokers
> > > > > > > > running
> > > > > > > > > > in
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > mode
> > > > > > > > > > > > > > > > > > when the migration enable config is set. I
> > > > think
> > > > > > the
> > > > > > > > > > > controller
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > also
> > > > > > > > > > > > > > > > > > report this, not just the brokers. I'll
> > clarify
> > > > > in
> > > > > > > the
> > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Andrew:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > How will the code get all the ZooKeeper
> > > > config?
> > > > > > > Will
> > > > > > > > it
> > > > > > > > > > do
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > sort
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > scan of the ZooKeeper data store? ... Do we
> > > > need
> > > > > to
> > > > > > > do
> > > > > > > > > > > anything
> > > > > > > > > > > > > > > special
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > make sure we get all data from ZooKeeper?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > With a few exceptions, all data written to
> > ZK
> > > > by
> > > > > > > Kafka
> > > > > > > > > > > happens
> > > > > > > > > > > > on
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > controller (single writer principle). In
> > our
> > > > > > > migration
> > > > > > > > > > code,
> > > > > > > > > > > we
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > unconditionally update the "/controller"
> > and
> > > > > > > > > > > > "/controller_epoch"
> > > > > > > > > > > > > > > ZNodes
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > effectively force the migration component
> > to
> > > > gain
> > > > > > > > > > leadership
> > > > > > > > > > > of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > > controller. Once this happens, we don't
> > expect
> > > > > any
> > > > > > > data
> > > > > > > > > to
> > > > > > > > > > be
> > > > > > > > > > > > > > written
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > ZK, so we can read it iteratively without
> > > > > worrying
> > > > > > > > about
> > > > > > > > > > any
> > > > > > > > > > > > > > > concurrent
> > > > > > > > > > > > > > > > > > writes.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > As for the linearizable thing, I believe
> > this
> > > > > just
> > > > > > > > means
> > > > > > > > > > that
> > > > > > > > > > > > > reads
> > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > served by a quorum follower which has stale
> > > > data.
> > > > > > > We'll
> > > > > > > > > be
> > > > > > > > > > > > > reading
> > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > > the same way the ZK controller does, so I
> > think
> > > > > we
> > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > fine
> > > > > > > > > > > > > > > > regarding
> > > > > > > > > > > > > > > > > > consistency. If we wanted to be extra
> > careful,
> > > > we
> > > > > > > could
> > > > > > > > > > add a
> > > > > > > > > > > > > delay
> > > > > > > > > > > > > > > > prior
> > > > > > > > > > > > > > > > > > to iterating through the znodes to give
> > > > > partitioned
> > > > > > > ZK
> > > > > > > > > > > > followers
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > chance
> > > > > > > > > > > > > > > > > > to get kicked out of the quorum. I don't
> > think
> > > > > > we'll
> > > > > > > > need
> > > > > > > > > > > that
> > > > > > > > > > > > > > though
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > What happens if we commit the transaction
> > > > then
> > > > > > fail
> > > > > > > > > right
> > > > > > > > > > > > after
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > restart?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > If we commit the "migration" transaction
> > to the
> > > > > > > > metadata
> > > > > > > > > > log,
> > > > > > > > > > > > it
> > > > > > > > > > > > > > > won't
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > a problem if we failover or restart. We can
> > > > > recover
> > > > > > > our
> > > > > > > > > > state
> > > > > > > > > > > > > based
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > metadata log and what exists in ZK. If we
> > see a
> > > > > > > > migration
> > > > > > > > > > > > > > transaction
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the kraft log, but no "/migration" ZNode,
> > we'll
> > > > > > know
> > > > > > > we
> > > > > > > > > > > failed
> > > > > > > > > > > > > > before
> > > > > > > > > > > > > > > > > > writing to ZK. If we see a partial
> > transaction
> > > > in
> > > > > > the
> > > > > > > > > log,
> > > > > > > > > > > then
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > abort it and restart the migration.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This sort of leads to me wondering if we
> > > > > > > will/should
> > > > > > > > > > check
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > > > log state before doing the migration? That
> > > > could
> > > > > > also
> > > > > > > > > catch
> > > > > > > > > > > > > > mistakes
> > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > as if a KRaft quorum is used that already
> > had
> > > > > some
> > > > > > > > > metadata
> > > > > > > > > > > > > which I
> > > > > > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > > > > we want to prevent.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yes, we should check that the metadata log
> > is
> > > > > empty
> > > > > > > > > before
> > > > > > > > > > > > > > > attempting a
> > > > > > > > > > > > > > > > > > migration (perhaps excepting the bootstrap
> > > > > metadata
> > > > > > > --
> > > > > > > > > need
> > > > > > > > > > > to
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > that one still).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Oct 24, 2022 at 8:57 AM Andrew
> > Grant
> > > > > > > > > > > > > > > > <agr...@confluent.io.invalid
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >> Hey David,
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> Thanks for the KIP. I had a few small
> > > > questions.
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> “The ZK data migration will copy the
> > existing
> > > > ZK
> > > > > > > data
> > > > > > > > > into
> > > > > > > > > > > the
> > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> metadata log and establish the new KRaft
> > > > active
> > > > > > > > > controller
> > > > > > > > > > > as
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > active
> > > > > > > > > > > > > > > > > >> controller from a ZK perspective.”
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> How will the code get all the ZooKeeper
> > > > config?
> > > > > > Will
> > > > > > > > it
> > > > > > > > > do
> > > > > > > > > > > > some
> > > > > > > > > > > > > > sort
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > >> scan of the ZooKeeper data store? Also I’m
> > > > not a
> > > > > > > > > ZooKeeper
> > > > > > > > > > > > > expert
> > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > per
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > https://zookeeper.apache.org/doc/current/zookeeperInternals.html
> > > > > > > > > > > > > > > > “Read
> > > > > > > > > > > > > > > > > >> operations in ZooKeeper are *not
> > linearizable*
> > > > > > since
> > > > > > > > > they
> > > > > > > > > > > can
> > > > > > > > > > > > > > return
> > > > > > > > > > > > > > > > > >> potentially stale data.” Do we need to do
> > > > > anything
> > > > > > > > > special
> > > > > > > > > > > to
> > > > > > > > > > > > > make
> > > > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > >> get all data from ZooKeeper?
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> “For the initial migration, the controller
> > > > will
> > > > > > > > utilize
> > > > > > > > > > > > KIP-868
> > > > > > > > > > > > > > > > Metadata
> > > > > > > > > > > > > > > > > >> Transactions
> > > > > > > > > > > > > > > > > >> <
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-868+Metadata+Transactions
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> write all of the ZK metadata in a single
> > > > > > > transaction.
> > > > > > > > If
> > > > > > > > > > the
> > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > >> fails before this transaction is
> > finalized,
> > > > the
> > > > > > next
> > > > > > > > > > active
> > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > >> will
> > > > > > > > > > > > > > > > > >> abort the transaction and restart the
> > > > migration
> > > > > > > > > process.”
> > > > > > > > > > > What
> > > > > > > > > > > > > > > happens
> > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > >> we commit the transaction then fail right
> > > > after
> > > > > > that
> > > > > > > > and
> > > > > > > > > > > > > restart?
> > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > >> sort
> > > > > > > > > > > > > > > > > >> of leads to me wondering if we will/should
> > > > check
> > > > > > the
> > > > > > > > > > > metadata
> > > > > > > > > > > > > log
> > > > > > > > > > > > > > > > state
> > > > > > > > > > > > > > > > > >> before doing the migration? That could
> > also
> > > > > catch
> > > > > > > > > mistakes
> > > > > > > > > > > > such
> > > > > > > > > > > > > as
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > >> KRaft quorum is used that already had some
> > > > > > metadata
> > > > > > > > > which
> > > > > > > > > > I
> > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > > > >> to prevent.
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> For the Test Plan section do you think
> > it’s
> > > > > worth
> > > > > > > > > calling
> > > > > > > > > > > out
> > > > > > > > > > > > > > > > > performance
> > > > > > > > > > > > > > > > > >> testing migrations of ZooKeeper
> > deployments
> > > > that
> > > > > > > have
> > > > > > > > > > > “large”
> > > > > > > > > > > > > > > amounts
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > >> metadata?
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> Andrew
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> On Mon, Oct 24, 2022 at 3:20 AM Luke Chen
> > <
> > > > > > > > > > > show...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >> > Hi David,
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > Thanks for the good and complicated
> > > > proposal!
> > > > > :)
> > > > > > > > > > > > > > > > > >> > Some questions:
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 1. "MigrationReady" state: The KIP said:
> > > > > > > > > > > > > > > > > >> > The KRaft quorum has been started
> > > > > > > > > > > > > > > > > >> > I don't think we'll automatically enter
> > this
> > > > > > state
> > > > > > > > > after
> > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > quorum
> > > > > > > > > > > > > > > > > >> > started, do we?
> > > > > > > > > > > > > > > > > >> > I think KRaft active controller should
> > take
> > > > > over
> > > > > > > > > > > leadership
> > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > writing a
> > > > > > > > > > > > > > > > > >> > znode in /controller path before we
> > entering
> > > > > > this
> > > > > > > > > sate,
> > > > > > > > > > is
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > correct?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 2. "MigrationActive" state: the KIP
> > said:
> > > > > > > > > > > > > > > > > >> > ZK state has been migrated, controller
> > is in
> > > > > > > > > dual-write
> > > > > > > > > > > > mode,
> > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > >> are
> > > > > > > > > > > > > > > > > >> > being restarted in KRaft mode
> > > > > > > > > > > > > > > > > >> > What confuses me is the last part:
> > "brokers
> > > > > are
> > > > > > > > being
> > > > > > > > > > > > > restarted
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> > mode".
> > > > > > > > > > > > > > > > > >> > How could we detect brokers are being
> > > > > restarted
> > > > > > in
> > > > > > > > > KRaft
> > > > > > > > > > > > mode?
> > > > > > > > > > > > > > Old
> > > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > >> > broker is removed and new KRaft broker
> > is up
> > > > > > > within
> > > > > > > > N
> > > > > > > > > > > > minutes?
> > > > > > > > > > > > > > > > > >> > I think we don't have to rely on the
> > > > condition
> > > > > > > > > "brokers
> > > > > > > > > > > are
> > > > > > > > > > > > > > being
> > > > > > > > > > > > > > > > > >> restarted
> > > > > > > > > > > > > > > > > >> > in KRaft mode" to enter this state.
> > > > > > > > > > > > > > > > > >> > "brokers are being restarted in KRaft
> > mode"
> > > > > > should
> > > > > > > > be
> > > > > > > > > a
> > > > > > > > > > > > > process
> > > > > > > > > > > > > > > > > happened
> > > > > > > > > > > > > > > > > >> > between "MigrationActive" and
> > > > > > "MigrationFinished".
> > > > > > > > > Does
> > > > > > > > > > > that
> > > > > > > > > > > > > > make
> > > > > > > > > > > > > > > > > sense?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 3. When "Zookeeper" mode trying to enter
> > > > > > > > > > > > "MigrationEligible",
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > there
> > > > > > > > > > > > > > > > > >> is a
> > > > > > > > > > > > > > > > > >> > cluster with tens of brokers, how could
> > > > users
> > > > > > know
> > > > > > > > why
> > > > > > > > > > > this
> > > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > > >> cannot
> > > > > > > > > > > > > > > > > >> > be in "MigrationEligible" state, yet?
> > Check
> > > > > that
> > > > > > > > znode
> > > > > > > > > > > > > manually
> > > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > >> one?
> > > > > > > > > > > > > > > > > >> > Or do we plan to create a tool to help
> > them?
> > > > > Or
> > > > > > > > maybe
> > > > > > > > > > > expose
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> "unready
> > > > > > > > > > > > > > > > > >> > ZK brokers" metrics?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 4. Same for "MigrationActive" entering
> > > > > > > > > > "MigrationFinished"
> > > > > > > > > > > > > > state.
> > > > > > > > > > > > > > > > > Since
> > > > > > > > > > > > > > > > > >> > that will be some process for
> > restarting ZK
> > > > > > broker
> > > > > > > > > into
> > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > >> one by
> > > > > > > > > > > > > > > > > >> > one, could we expose the "remaining ZK
> > > > > brokers"
> > > > > > as
> > > > > > > > > > > metrics?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 5. When users are in
> > > > > > > > > > > > > > > > > >>
> > > > > > > "MigrationReady"/"MigrationActive"/"MigrationFinished"
> > > > > > > > > > > > > > > > > >> > states, and they accidentally change the
> > > > KRaft
> > > > > > > > > > controller
> > > > > > > > > > > > > > config:
> > > > > > > > > > > > > > > > > >> > "kafka.metadata.migration.enable"
> > > > > > > > > > > > > > > > > >> > to false. What will happen to this
> > cluster?
> > > > No
> > > > > > > > > > dual-write
> > > > > > > > > > > > for
> > > > > > > > > > > > > > it?
> > > > > > > > > > > > > > > > Will
> > > > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > > > >> > have any protection for it?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 6. About the "MigrationState" metric:
> > > > > > > > > > > > > > > > > >> > The "ZooKeeper" and "MigrationEligible"
> > is
> > > > > > > reported
> > > > > > > > by
> > > > > > > > > > ZK
> > > > > > > > > > > > > > > > controller,
> > > > > > > > > > > > > > > > > >> and
> > > > > > > > > > > > > > > > > >> > the rest of states are reported by KRaft
> > > > > > > controller
> > > > > > > > > -->
> > > > > > > > > > > > makes
> > > > > > > > > > > > > > > sense
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > >> me.
> > > > > > > > > > > > > > > > > >> > One question from it is, when KRaft
> > > > controller
> > > > > > > takes
> > > > > > > > > > over
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > leadership
> > > > > > > > > > > > > > > > > >> > from ZK controller, what will the
> > > > > > "MigrationState"
> > > > > > > > > value
> > > > > > > > > > > in
> > > > > > > > > > > > > old
> > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > >> > controller? keep in "MigrationEligible"
> > > > > doesn't
> > > > > > > make
> > > > > > > > > > > sense.
> > > > > > > > > > > > > Will
> > > > > > > > > > > > > > > > there
> > > > > > > > > > > > > > > > > >> be a
> > > > > > > > > > > > > > > > > >> > empty or null state?
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > 7. About the "MetadataType" metric:
> > > > > > > > > > > > > > > > > >> > An enumeration of: ZooKeeper (1), Dual
> > (2),
> > > > > > KRaft
> > > > > > > > (3).
> > > > > > > > > > > Each
> > > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > > > >> reports
> > > > > > > > > > > > > > > > > >> > this.
> > > > > > > > > > > > > > > > > >> > I don't know how we could map the
> > migration
> > > > > > state
> > > > > > > to
> > > > > > > > > > > these 3
> > > > > > > > > > > > > > > types.
> > > > > > > > > > > > > > > > > >> > What is the metadataType when cluster in
> > > > > > > > > > "MigrationReady"
> > > > > > > > > > > > > state?
> > > > > > > > > > > > > > > > Still
> > > > > > > > > > > > > > > > > >> > Zookeeper?
> > > > > > > > > > > > > > > > > >> > When will brokers enter Dual type?
> > > > > > > > > > > > > > > > > >> > This is unclear in the KIP.
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > Thank you.
> > > > > > > > > > > > > > > > > >> > Luke
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > On Thu, Oct 20, 2022 at 11:33 PM David
> > > > Arthur
> > > > > > > > > > > > > > > > > >> > <david.art...@confluent.io.invalid>
> > wrote:
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > > Igor, thanks for taking a look! Since
> > JBOD
> > > > > in
> > > > > > > > KRaft
> > > > > > > > > is
> > > > > > > > > > > > still
> > > > > > > > > > > > > > > under
> > > > > > > > > > > > > > > > > >> > > discussion and not likely to land
> > before
> > > > the
> > > > > > ZK
> > > > > > > > > > > > migration, I
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > >> we'll
> > > > > > > > > > > > > > > > > >> > > need to defer it. For migrating JBOD
> > > > > clusters
> > > > > > > from
> > > > > > > > > ZK
> > > > > > > > > > to
> > > > > > > > > > > > > > KRaft,
> > > > > > > > > > > > > > > > > we'll
> > > > > > > > > > > > > > > > > >> > also
> > > > > > > > > > > > > > > > > >> > > need to address the log dir failure
> > > > > mechanism
> > > > > > > > which
> > > > > > > > > > > > > currently
> > > > > > > > > > > > > > > > uses a
> > > > > > > > > > > > > > > > > >> > > special ZNode written to by the
> > brokers.
> > > > > There
> > > > > > > is
> > > > > > > > an
> > > > > > > > > > old
> > > > > > > > > > > > KIP
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > > > > > > > > > > > > > >> > > which proposes a new RPC to move that
> > ZK
> > > > > write
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > > > > controller,
> > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > >> I'm
> > > > > > > > > > > > > > > > > >> > > not sure if that's the approach we'll
> > want
> > > > > to
> > > > > > > > take.
> > > > > > > > > I
> > > > > > > > > > > read
> > > > > > > > > > > > > > > through
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > discussion over in KIP-858 and it
> > sounds
> > > > > like
> > > > > > > > there
> > > > > > > > > > are
> > > > > > > > > > > > some
> > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > >> ideas
> > > > > > > > > > > > > > > > > >> > > there.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > To answer your question more directly,
> > > > > > migrating
> > > > > > > > ZK
> > > > > > > > > > > > clusters
> > > > > > > > > > > > > > > using
> > > > > > > > > > > > > > > > > >> JBOD
> > > > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > > > >> > > out of scope for this KIP. It might be
> > > > > > possible
> > > > > > > to
> > > > > > > > > > stop
> > > > > > > > > > > > > using
> > > > > > > > > > > > > > > JBOD
> > > > > > > > > > > > > > > > > >> using
> > > > > > > > > > > > > > > > > >> > > reassignments, but I'm not sure about
> > > > that.
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > -David
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > On Tue, Oct 18, 2022 at 12:17 PM Igor
> > > > > Soarez <
> > > > > > > > > > > i...@soarez.me
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > > Hi David,
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > Thanks for the KIP, this is very
> > > > exciting!
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > How does JBOD relate to this work?
> > KRaft
> > > > > > mode
> > > > > > > > > > doesn't
> > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > >> > > > configuring brokers with multiple
> > log
> > > > > > > > directories.
> > > > > > > > > > If
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > > > >> > > > existing cluster are configured with
> > > > > > multiple
> > > > > > > > log
> > > > > > > > > > > dirs,
> > > > > > > > > > > > > does
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > migration
> > > > > > > > > > > > > > > > > >> > > > imply that the existing brokers
> > need to
> > > > > drop
> > > > > > > use
> > > > > > > > > of
> > > > > > > > > > > that
> > > > > > > > > > > > > > > > feature?
> > > > > > > > > > > > > > > > > >> Or is
> > > > > > > > > > > > > > > > > >> > > > there some way to upgrade them
> > later?
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > > > > >> > > > Igor
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > On Mon, Oct 17, 2022, at 10:07 PM,
> > David
> > > > > > > Arthur
> > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > > I've updated the KIP with the
> > > > following
> > > > > > > > changes
> > > > > > > > > > (the
> > > > > > > > > > > > > > > > confluence
> > > > > > > > > > > > > > > > > >> diff
> > > > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > > > >> > > > not
> > > > > > > > > > > > > > > > > >> > > > > helpful here since i rearranged
> > some
> > > > > > things)
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > * Added
> > ZooKeeperBlockingKRaftMillis
> > > > > > metric
> > > > > > > > > > > > > > > > > >> > > > > * Added section on new broker
> > > > > registration
> > > > > > > > JSON
> > > > > > > > > > > > > > > > > >> > > > > * Removed section on
> > MigrationCheck
> > > > RPC
> > > > > > > > > > > > > > > > > >> > > > > * Added change to
> > > > UpdateMetadataRequest
> > > > > > > > > > > > > > > > > >> > > > > * Added section "Additional ZK
> > Broker
> > > > > > > Configs"
> > > > > > > > > > > > (includes
> > > > > > > > > > > > > > > > configs
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> > > > connect
> > > > > > > > > > > > > > > > > >> > > > > to KRaft quorum)
> > > > > > > > > > > > > > > > > >> > > > > * Added section on "Incompatible
> > > > > Brokers"
> > > > > > > > under
> > > > > > > > > > > > Failure
> > > > > > > > > > > > > > > Modes
> > > > > > > > > > > > > > > > > >> > > > > * Clarified many things per this
> > > > > > discussion
> > > > > > > > > thread
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > I realized we need the KRaft
> > > > controller
> > > > > to
> > > > > > > > pick
> > > > > > > > > > the
> > > > > > > > > > > > > > correct
> > > > > > > > > > > > > > > > > >> > > > > "metadata.version" when
> > initializing
> > > > the
> > > > > > > > > > migration.
> > > > > > > > > > > I
> > > > > > > > > > > > > > > included
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > IBP
> > > > > > > > > > > > > > > > > >> > > > of a
> > > > > > > > > > > > > > > > > >> > > > > broker in its registration data
> > so the
> > > > > > KRaft
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > verify
> > > > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > > > >> > > > IBP
> > > > > > > > > > > > > > > > > >> > > > > and pick the correct
> > > > "metadata.version"
> > > > > > when
> > > > > > > > > > > starting
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> migration.
> > > > > > > > > > > > > > > > > >> > > > > Otherwise, we could inadvertently
> > > > > > downgrade
> > > > > > > > the
> > > > > > > > > > > > > > > > > >> IBP/metadata.version
> > > > > > > > > > > > > > > > > >> > as
> > > > > > > > > > > > > > > > > >> > > > > part of the migration.
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > I also added "clusterId" to the
> > broker
> > > > > > > > > > registration
> > > > > > > > > > > > data
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> KRaft
> > > > > > > > > > > > > > > > > >> > > > could
> > > > > > > > > > > > > > > > > >> > > > > verify it.
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > -David
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > On Mon, Oct 17, 2022 at 12:14 PM
> > Colin
> > > > > > > McCabe
> > > > > > > > <
> > > > > > > > > > > > > > > > > cmcc...@apache.org
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > >> On Fri, Oct 14, 2022, at 11:39,
> > Jun
> > > > Rao
> > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > >> > Hi, Colin,
> > > > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > > > >> > > > >> > 10. "That all goes away in the
> > new
> > > > > > mode,
> > > > > > > > and
> > > > > > > > > we
> > > > > > > > > > > > just
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > >> > code
> > > > > > > > > > > > > > > > > >> > > > which
> > > > > > > > > > > > > > > > > >> > > > >> > analyzes __cluster_metadata and
> > > > > > reflects
> > > > > > > it
> > > > > > > > > in
> > > > > > > > > > 1)
> > > > > > > > > > > > > > updates
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > >> > and
> > > > > > > > > > > > > > > > > >> > > 2)
> > > > > > > > > > > > > > > > > >> > > > >> > messages sent out to brokers."
> > > > > > > > > > > > > > > > > >> > > > >> > Hmm, I am not sure it's that
> > > > simple.
> > > > > > Some
> > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > > > > > > complexity
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > > > >> > > > >> ZK-based
> > > > > > > > > > > > > > > > > >> > > > >> > controller are (1) using the
> > > > > pipelining
> > > > > > > > > > approach
> > > > > > > > > > > to
> > > > > > > > > > > > > > write
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > >> > for
> > > > > > > > > > > > > > > > > >> > > > >> better
> > > > > > > > > > > > > > > > > >> > > > >> > throughput and using
> > conditional
> > > > > writes
> > > > > > > for
> > > > > > > > > > > > > > correctness;
> > > > > > > > > > > > > > > > (2)
> > > > > > > > > > > > > > > > > >> > sending
> > > > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > > > >> > > > >> > proper LeaderAndIsr and
> > > > > UpdateMetadata
> > > > > > > > > > requests.
> > > > > > > > > > > > For
> > > > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > > > > > >> > > > >> > controller failover, the full
> > > > > metadata
> > > > > > > > needs
> > > > > > > > > to
> > > > > > > > > > > be
> > > > > > > > > > > > > sent
> > > > > > > > > > > > > > > > while
> > > > > > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > > > > > >> > > > >> > individual broker failure, only
> > > > some
> > > > > of
> > > > > > > the
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > needs
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > >> be
> > > > > > > > > > > > > > > > > >> > > > updated.
> > > > > > > > > > > > > > > > > >> > > > >> > The controlled shutdown
> > handling
> > > > > > > sometimes
> > > > > > > > > uses
> > > > > > > > > > > > > > > > > >> StopReplicaRequest
> > > > > > > > > > > > > > > > > >> > > > and
> > > > > > > > > > > > > > > > > >> > > > >> > some other times uses
> > > > > > > LeaderAndIsrRequest.
> > > > > > > > > (3)
> > > > > > > > > > > > > > triggering
> > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > >> > events
> > > > > > > > > > > > > > > > > >> > > > >> based
> > > > > > > > > > > > > > > > > >> > > > >> > on the responses of
> > LeaderAndIsr
> > > > > (e.g.
> > > > > > > for
> > > > > > > > > > topic
> > > > > > > > > > > > > > > deletion).
> > > > > > > > > > > > > > > > > >> Some
> > > > > > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > > > > > >> > > > those
> > > > > > > > > > > > > > > > > >> > > > >> > complexity could be
> > re-implemented
> > > > > in a
> > > > > > > > more
> > > > > > > > > > > > > efficient
> > > > > > > > > > > > > > > way,
> > > > > > > > > > > > > > > > > >> but we
> > > > > > > > > > > > > > > > > >> > > > need
> > > > > > > > > > > > > > > > > >> > > > >> to
> > > > > > > > > > > > > > > > > >> > > > >> > be really careful not to
> > generate
> > > > > > > > regression.
> > > > > > > > > > > Some
> > > > > > > > > > > > of
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > >> > > > >> complexity
> > > > > > > > > > > > > > > > > >> > > > >> > just won't go away.
> > Reimplementing
> > > > > all
> > > > > > > > those
> > > > > > > > > > > logic
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > 30
> > > > > > > > > > > > > > > > > >> or
> > > > > > > > > > > > > > > > > >> > so
> > > > > > > > > > > > > > > > > >> > > > >> events
> > > > > > > > > > > > > > > > > >> > > > >> > in the ZK-based controller is
> > > > > possible,
> > > > > > > but
> > > > > > > > > > > seems a
> > > > > > > > > > > > > bit
> > > > > > > > > > > > > > > > > >> daunting
> > > > > > > > > > > > > > > > > >> > and
> > > > > > > > > > > > > > > > > >> > > > >> risky.
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> Hi Jun,
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> Thanks for the response.  I agree
> > > > that
> > > > > > > there
> > > > > > > > is
> > > > > > > > > > > some
> > > > > > > > > > > > > work
> > > > > > > > > > > > > > > > here
> > > > > > > > > > > > > > > > > >> but I
> > > > > > > > > > > > > > > > > >> > > > don't
> > > > > > > > > > > > > > > > > >> > > > >> think it's as bad as it might
> > seem.
> > > > > Most
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > code
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > writing
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> > ZK
> > > > > > > > > > > > > > > > > >> > > > can
> > > > > > > > > > > > > > > > > >> > > > >> be reused from the old
> > controller. We
> > > > > > > should
> > > > > > > > at
> > > > > > > > > > > least
> > > > > > > > > > > > > > > > evaluate
> > > > > > > > > > > > > > > > > >> using
> > > > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > > > >> > > > >> old ControllerChannelManager as
> > > > well. I
> > > > > > > don't
> > > > > > > > > > know
> > > > > > > > > > > if
> > > > > > > > > > > > > > we'll
> > > > > > > > > > > > > > > > end
> > > > > > > > > > > > > > > > > >> up
> > > > > > > > > > > > > > > > > >> > > > doing it
> > > > > > > > > > > > > > > > > >> > > > >> or not but it's a possibility.
> > (The
> > > > > main
> > > > > > > > reason
> > > > > > > > > > to
> > > > > > > > > > > > not
> > > > > > > > > > > > > do
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > >> that
> > > > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > > > >> > > > >> response handling will be a bit
> > > > > > different)
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> The question of what to do
> > during a
> > > > > > > > controller
> > > > > > > > > > > > failover
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > >> > > > interesting
> > > > > > > > > > > > > > > > > >> > > > >> one. Technically, we should be
> > able
> > > > to
> > > > > > > > continue
> > > > > > > > > > > > sending
> > > > > > > > > > > > > > > > > >> incremental
> > > > > > > > > > > > > > > > > >> > > > updates
> > > > > > > > > > > > > > > > > >> > > > >> to the legacy nodes, for the same
> > > > > reason
> > > > > > we
> > > > > > > > can
> > > > > > > > > > in
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > mode.
> > > > > > > > > > > > > > > > > >> > > However,
> > > > > > > > > > > > > > > > > >> > > > >> probably we should just copy
> > what ZK
> > > > > mode
> > > > > > > > does
> > > > > > > > > > and
> > > > > > > > > > > > send
> > > > > > > > > > > > > > > them
> > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > >> full
> > > > > > > > > > > > > > > > > >> > > > >> metadata update. This will allow
> > us
> > > > to
> > > > > > have
> > > > > > > > an
> > > > > > > > > > easy
> > > > > > > > > > > > way
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > handle
> > > > > > > > > > > > > > > > > >> > > > >> divergences caused by lost RPCs
> > by
> > > > > > changing
> > > > > > > > the
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > (just
> > > > > > > > > > > > > > > > > >> as
> > > > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > > > >> > > > do
> > > > > > > > > > > > > > > > > >> > > > >> in ZK mode, unfortunately). I
> > suppose
> > > > > we
> > > > > > > > should
> > > > > > > > > > > > > document
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > > > >> > > > KIP...
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> I agree controlled shutdown is
> > tricky
> > > > > to
> > > > > > > get
> > > > > > > > > just
> > > > > > > > > > > > > right.
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > >> suppose
> > > > > > > > > > > > > > > > > >> > > this
> > > > > > > > > > > > > > > > > >> > > > is
> > > > > > > > > > > > > > > > > >> > > > >> a case where the RPCs we send
> > out are
> > > > > not
> > > > > > > > > purely
> > > > > > > > > > > > "fire
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > >> forget";
> > > > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > > > >> > > > have
> > > > > > > > > > > > > > > > > >> > > > >> to listen for the response. But
> > that
> > > > > can
> > > > > > be
> > > > > > > > > done
> > > > > > > > > > in
> > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > >> event-based
> > > > > > > > > > > > > > > > > >> > > way.
> > > > > > > > > > > > > > > > > >> > > > >> Controlled shutdown is also
> > probably
> > > > > the
> > > > > > > last
> > > > > > > > > > thing
> > > > > > > > > > > > > we'll
> > > > > > > > > > > > > > > > > >> implement
> > > > > > > > > > > > > > > > > >> > > > once we
> > > > > > > > > > > > > > > > > >> > > > >> have the basic lift and shift.
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> Topic deletion is another
> > annoying
> > > > > > thing. I
> > > > > > > > > > wonder
> > > > > > > > > > > if
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > >> just
> > > > > > > > > > > > > > > > > >> > > > delete
> > > > > > > > > > > > > > > > > >> > > > >> immediately here and skip the
> > > > > complexity
> > > > > > of
> > > > > > > > > > > > > implementing
> > > > > > > > > > > > > > > > > >> "deleting
> > > > > > > > > > > > > > > > > >> > > > state."
> > > > > > > > > > > > > > > > > >> > > > >> Topic IDs will exist in IBP 3.4,
> > in
> > > > > both
> > > > > > ZK
> > > > > > > > and
> > > > > > > > > > > KRaft
> > > > > > > > > > > > > > mode,
> > > > > > > > > > > > > > > > so
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > >> > > > should be
> > > > > > > > > > > > > > > > > >> > > > >> OK, right?
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> best,
> > > > > > > > > > > > > > > > > >> > > > >> Colin
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > > > >> > > > >> > Thanks,
> > > > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > > > >> > > > >> > Jun
> > > > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > > > >> > > > >> > On Fri, Oct 14, 2022 at 9:29 AM
> > > > Colin
> > > > > > > > McCabe
> > > > > > > > > <
> > > > > > > > > > > > > > > > > >> cmcc...@apache.org>
> > > > > > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> On Thu, Oct 13, 2022, at
> > 11:44,
> > > > Jun
> > > > > > Rao
> > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >> > Hi, Colin,
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > Thanks for the reply.
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > 10. This is a bit on the
> > > > > > > implementation
> > > > > > > > > > side.
> > > > > > > > > > > If
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > > > >> > > > >> existing
> > > > > > > > > > > > > > > > > >> > > > >> >> > ZK-based controller, most
> > of the
> > > > > > logic
> > > > > > > > is
> > > > > > > > > > > around
> > > > > > > > > > > > > > > > > >> maintaining an
> > > > > > > > > > > > > > > > > >> > > > >> in-memory
> > > > > > > > > > > > > > > > > >> > > > >> >> > state of all the resources
> > > > > (broker,
> > > > > > > > topic,
> > > > > > > > > > > > > > partition,
> > > > > > > > > > > > > > > > > etc),
> > > > > > > > > > > > > > > > > >> > > > >> >> reading/writing
> > > > > > > > > > > > > > > > > >> > > > >> >> > to ZK, sending LeaderAndIsr
> > and
> > > > > > > > > > UpdateMetadata
> > > > > > > > > > > > > > > requests
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > >> > > > handling
> > > > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> > responses to brokers. So we
> > need
> > > > > all
> > > > > > > > that
> > > > > > > > > > > logic
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > dual
> > > > > > > > > > > > > > > > > >> > write
> > > > > > > > > > > > > > > > > >> > > > >> mode.
> > > > > > > > > > > > > > > > > >> > > > >> >> One
> > > > > > > > > > > > > > > > > >> > > > >> >> > option is to duplicate all
> > that
> > > > > > logic
> > > > > > > in
> > > > > > > > > > some
> > > > > > > > > > > > new
> > > > > > > > > > > > > > > code.
> > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > >> > can
> > > > > > > > > > > > > > > > > >> > > > be a
> > > > > > > > > > > > > > > > > >> > > > >> bit
> > > > > > > > > > > > > > > > > >> > > > >> >> > error prone and makes the
> > code a
> > > > > bit
> > > > > > > > > harder
> > > > > > > > > > to
> > > > > > > > > > > > > > > maintain
> > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > > > >> > > need
> > > > > > > > > > > > > > > > > >> > > > to
> > > > > > > > > > > > > > > > > >> > > > >> fix
> > > > > > > > > > > > > > > > > >> > > > >> >> > some critical issues in
> > ZK-based
> > > > > > > > > > controllers.
> > > > > > > > > > > > > > Another
> > > > > > > > > > > > > > > > > >> option is
> > > > > > > > > > > > > > > > > >> > > to
> > > > > > > > > > > > > > > > > >> > > > try
> > > > > > > > > > > > > > > > > >> > > > >> >> > reusing the existing code
> > in the
> > > > > > > > ZK-based
> > > > > > > > > > > > > > controller.
> > > > > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > > >> > > example,
> > > > > > > > > > > > > > > > > >> > > > we
> > > > > > > > > > > > > > > > > >> > > > >> >> could
> > > > > > > > > > > > > > > > > >> > > > >> >> > start the EventManager in
> > the
> > > > > > ZK-based
> > > > > > > > > > > > controller,
> > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > let
> > > > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > > > >> > > > KRaft
> > > > > > > > > > > > > > > > > >> > > > >> >> > controller ingest new
> > events.
> > > > This
> > > > > > has
> > > > > > > > its
> > > > > > > > > > own
> > > > > > > > > > > > > > > > challenges:
> > > > > > > > > > > > > > > > > >> (1)
> > > > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > > > >> > > > >> >> existing
> > > > > > > > > > > > > > > > > >> > > > >> >> > logic only logs ZK failures
> > and
> > > > > > > doesn't
> > > > > > > > > > expose
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > caller;
> > > > > > > > > > > > > > > > > >> > > > (2)
> > > > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> > existing logic may add new
> > > > events
> > > > > to
> > > > > > > the
> > > > > > > > > > queue
> > > > > > > > > > > > > > itself
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > >> > > > probably
> > > > > > > > > > > > > > > > > >> > > > >> >> need
> > > > > > > > > > > > > > > > > >> > > > >> >> > to think through how this is
> > > > > > > coordinated
> > > > > > > > > > with
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> > > controller;
> > > > > > > > > > > > > > > > > >> > > > >> (3)
> > > > > > > > > > > > > > > > > >> > > > >> >> it
> > > > > > > > > > > > > > > > > >> > > > >> >> > registers some ZK listeners
> > > > > > > > unnecessarily
> > > > > > > > > > (may
> > > > > > > > > > > > not
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > big
> > > > > > > > > > > > > > > > > >> > > > concern).
> > > > > > > > > > > > > > > > > >> > > > >> So
> > > > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > > > >> > > > >> >> > need to get around those
> > issues
> > > > > > > > somehow. I
> > > > > > > > > > am
> > > > > > > > > > > > > > > wondering
> > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > >> we
> > > > > > > > > > > > > > > > > >> > > have
> > > > > > > > > > > > > > > > > >> > > > >> >> > considered both options and
> > > > which
> > > > > > > > approach
> > > > > > > > > > we
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > > leaning
> > > > > > > > > > > > > > > > > >> > towards
> > > > > > > > > > > > > > > > > >> > > > for
> > > > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> > implementation.
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> Yes, this is a good question.
> > My
> > > > > take
> > > > > > is
> > > > > > > > > that
> > > > > > > > > > a
> > > > > > > > > > > > big
> > > > > > > > > > > > > > part
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > >> the
> > > > > > > > > > > > > > > > > >> > > > >> complexity
> > > > > > > > > > > > > > > > > >> > > > >> >> in the old controller code
> > results
> > > > > > from
> > > > > > > > the
> > > > > > > > > > fact
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > >> ZK
> > > > > > > > > > > > > > > > > >> > as
> > > > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > > > >> > > > >> >> multi-writer database for
> > > > > propagating
> > > > > > > > > > > information
> > > > > > > > > > > > > > > between
> > > > > > > > > > > > > > > > > >> > different
> > > > > > > > > > > > > > > > > >> > > > >> >> components. So in the old
> > > > > controller,
> > > > > > > > every
> > > > > > > > > > > write
> > > > > > > > > > > > to
> > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > needs
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> > be
> > > > > > > > > > > > > > > > > >> > > > >> >> structured as a compare and
> > swap
> > > > to
> > > > > be
> > > > > > > > fully
> > > > > > > > > > > > > correct.
> > > > > > > > > > > > > > > > Every
> > > > > > > > > > > > > > > > > >> time
> > > > > > > > > > > > > > > > > >> > we
> > > > > > > > > > > > > > > > > >> > > > get
> > > > > > > > > > > > > > > > > >> > > > >> >> notified about something, it's
> > > > > usually
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > form
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > "this
> > > > > > > > > > > > > > > > > >> znode
> > > > > > > > > > > > > > > > > >> > > > >> changed"
> > > > > > > > > > > > > > > > > >> > > > >> >> which prompts a full reload of
> > > > part
> > > > > of
> > > > > > > the
> > > > > > > > > > data
> > > > > > > > > > > in
> > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > (which
> > > > > > > > > > > > > > > > > >> > itself
> > > > > > > > > > > > > > > > > >> > > > has
> > > > > > > > > > > > > > > > > >> > > > >> >> multiple parts, loading,
> > > > > > deserializing,
> > > > > > > > > > > > reconciling,
> > > > > > > > > > > > > > > etc.)
> > > > > > > > > > > > > > > > > >> That
> > > > > > > > > > > > > > > > > >> > all
> > > > > > > > > > > > > > > > > >> > > > goes
> > > > > > > > > > > > > > > > > >> > > > >> >> away in the new mode, and we
> > just
> > > > > have
> > > > > > > > some
> > > > > > > > > > code
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > > >> analyzes
> > > > > > > > > > > > > > > > > >> > > > >> >> __cluster_metadata and
> > reflects it
> > > > > in
> > > > > > 1)
> > > > > > > > > > updates
> > > > > > > > > > > > to
> > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > 2)
> > > > > > > > > > > > > > > > > >> > > > messages
> > > > > > > > > > > > > > > > > >> > > > >> sent
> > > > > > > > > > > > > > > > > >> > > > >> >> out to brokers.
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> This is pretty decoupled from
> > the
> > > > > > other
> > > > > > > > > logic
> > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> QuorumController
> > > > > > > > > > > > > > > > > >> > > and
> > > > > > > > > > > > > > > > > >> > > > >> >> should be easy to unit test,
> > since
> > > > > the
> > > > > > > > same
> > > > > > > > > > > inputs
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> log
> > > > > > > > > > > > > > > > > >> > > > always
> > > > > > > > > > > > > > > > > >> > > > >> >> produce the same output in ZK.
> > > > > > > Basically,
> > > > > > > > ZK
> > > > > > > > > > is
> > > > > > > > > > > > > > > write-only
> > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > >> > us,
> > > > > > > > > > > > > > > > > >> > > > we do
> > > > > > > > > > > > > > > > > >> > > > >> >> not read it (with the big
> > > > exception
> > > > > of
> > > > > > > > > broker
> > > > > > > > > > > > > > > registration
> > > > > > > > > > > > > > > > > >> > znodes)
> > > > > > > > > > > > > > > > > >> > > > and I
> > > > > > > > > > > > > > > > > >> > > > >> >> think that will greatly
> > simplify
> > > > > > things.
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> So I think dual-write mode as
> > > > > > described
> > > > > > > > here
> > > > > > > > > > > will
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > >> > substantially
> > > > > > > > > > > > > > > > > >> > > > >> simpler
> > > > > > > > > > > > > > > > > >> > > > >> >> than trying to run part or
> > all of
> > > > > the
> > > > > > > old
> > > > > > > > > > > > controller
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> > parallel. I
> > > > > > > > > > > > > > > > > >> > > > do
> > > > > > > > > > > > > > > > > >> > > > >> >> think we will reuse a bunch
> > of the
> > > > > > > > > > > serialization /
> > > > > > > > > > > > > > > > > >> > deserialization
> > > > > > > > > > > > > > > > > >> > > > code
> > > > > > > > > > > > > > > > > >> > > > >> for
> > > > > > > > > > > > > > > > > >> > > > >> >> znodes and possibly the code
> > for
> > > > > > > > > communicating
> > > > > > > > > > > > with
> > > > > > > > > > > > > > ZK.
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> best,
> > > > > > > > > > > > > > > > > >> > > > >> >> Colin
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > 14. Good point and make
> > sense.
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > Thanks,
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > Jun
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> > On Wed, Oct 12, 2022 at
> > 3:27 PM
> > > > > > Colin
> > > > > > > > > > McCabe <
> > > > > > > > > > > > > > > > > >> > cmcc...@apache.org
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Hi Jun,
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Thanks for taking a look.
> > I can
> > > > > > > answer
> > > > > > > > > some
> > > > > > > > > > > > > > questions
> > > > > > > > > > > > > > > > > here
> > > > > > > > > > > > > > > > > >> > > > because I
> > > > > > > > > > > > > > > > > >> > > > >> >> >> collaborated on this a
> > bit, and
> > > > > > David
> > > > > > > > is
> > > > > > > > > on
> > > > > > > > > > > > > > vacation
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > >> few
> > > > > > > > > > > > > > > > > >> > > > days.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> On Wed, Oct 12, 2022, at
> > 14:41,
> > > > > Jun
> > > > > > > Rao
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > Hi, David,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks for the KIP. A few
> > > > > > comments
> > > > > > > > > below.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 10. It's still not very
> > clear
> > > > > to
> > > > > > me
> > > > > > > > how
> > > > > > > > > > the
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > > > >> > > > works
> > > > > > > > > > > > > > > > > >> > > > >> in
> > > > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > dual writes mode to
> > KRaft log
> > > > > and
> > > > > > > ZK
> > > > > > > > > when
> > > > > > > > > > > the
> > > > > > > > > > > > > > > brokers
> > > > > > > > > > > > > > > > > >> still
> > > > > > > > > > > > > > > > > >> > > run
> > > > > > > > > > > > > > > > > >> > > > in
> > > > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > > > >> > > > >> >> >> mode.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > Does the KRaft controller
> > > > run a
> > > > > > ZK
> > > > > > > > > based
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> > > parallel
> > > > > > > > > > > > > > > > > >> > > > or
> > > > > > > > > > > > > > > > > >> > > > >> do
> > > > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > derive what needs to be
> > > > written
> > > > > > to
> > > > > > > ZK
> > > > > > > > > > based
> > > > > > > > > > > > on
> > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > > > >> > > > >> logic?
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> We derive what needs to be
> > > > > written
> > > > > > to
> > > > > > > > ZK
> > > > > > > > > > > based
> > > > > > > > > > > > on
> > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > > > >> > > > >> >> logic.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > I am also not sure how
> > the
> > > > > KRaft
> > > > > > > > > > controller
> > > > > > > > > > > > > > handles
> > > > > > > > > > > > > > > > > >> broker
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > registration/deregistration,
> > > > > > since
> > > > > > > > > > brokers
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > > >> running
> > > > > > > > > > > > > > > > > >> > > in
> > > > > > > > > > > > > > > > > >> > > > ZK
> > > > > > > > > > > > > > > > > >> > > > >> >> mode
> > > > > > > > > > > > > > > > > >> > > > >> >> >> and
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > are not heartbeating to
> > the
> > > > > KRaft
> > > > > > > > > > > controller.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> The new controller will
> > listen
> > > > > for
> > > > > > > > broker
> > > > > > > > > > > > > > > registrations
> > > > > > > > > > > > > > > > > >> under
> > > > > > > > > > > > > > > > > >> > > > >> /brokers.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> This is the only znode
> > watch
> > > > that
> > > > > > the
> > > > > > > > new
> > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > >> do.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> We did consider changing
> > how
> > > > > > ZK-based
> > > > > > > > > > broker
> > > > > > > > > > > > > > > > registration
> > > > > > > > > > > > > > > > > >> > > worked,
> > > > > > > > > > > > > > > > > >> > > > >> but it
> > > > > > > > > > > > > > > > > >> > > > >> >> >> just ended up being too
> > much
> > > > work
> > > > > > for
> > > > > > > > not
> > > > > > > > > > > > enough
> > > > > > > > > > > > > > > gain.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 12. "A new set of nodes
> > will
> > > > be
> > > > > > > > > > provisioned
> > > > > > > > > > > > to
> > > > > > > > > > > > > > host
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > > controller
> > > > > > > > > > > > > > > > > >> > > > >> >> >> quorum."
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > I guess we don't support
> > > > > starting
> > > > > > > the
> > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > >> > quorum
> > > > > > > > > > > > > > > > > >> > > on
> > > > > > > > > > > > > > > > > >> > > > >> >> existing
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > brokers. It would be
> > useful
> > > > to
> > > > > > make
> > > > > > > > > that
> > > > > > > > > > > > clear.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Agreed
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 13. "Once the quorum is
> > > > > > established
> > > > > > > > > and a
> > > > > > > > > > > > > leader
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > >> elected,
> > > > > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> controller
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > will check the state of
> > the
> > > > > > cluster
> > > > > > > > > using
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> MigrationCheck
> > > > > > > > > > > > > > > > > >> > > > RPC."
> > > > > > > > > > > > > > > > > >> > > > >> How
> > > > > > > > > > > > > > > > > >> > > > >> >> >> does
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > the quorum controller
> > detect
> > > > > > other
> > > > > > > > > > brokers?
> > > > > > > > > > > > > Does
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > controller
> > > > > > > > > > > > > > > > > >> > > > >> node
> > > > > > > > > > > > > > > > > >> > > > >> >> need
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > to be configured with ZK
> > > > > > connection
> > > > > > > > > > string?
> > > > > > > > > > > > If
> > > > > > > > > > > > > > so,
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > >> would
> > > > > > > > > > > > > > > > > >> > be
> > > > > > > > > > > > > > > > > >> > > > >> useful
> > > > > > > > > > > > > > > > > >> > > > >> >> to
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > document the additional
> > > > configs
> > > > > > > that
> > > > > > > > > the
> > > > > > > > > > > > quorum
> > > > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > > > >> > > > needs to
> > > > > > > > > > > > > > > > > >> > > > >> >> set.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Yes, the controllers
> > monitor ZK
> > > > > for
> > > > > > > > > broker
> > > > > > > > > > > > > > > > registrations,
> > > > > > > > > > > > > > > > > >> as I
> > > > > > > > > > > > > > > > > >> > > > >> mentioned
> > > > > > > > > > > > > > > > > >> > > > >> >> >> above. So they need
> > zk.connect
> > > > > and
> > > > > > > the
> > > > > > > > > > other
> > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > connection
> > > > > > > > > > > > > > > > > >> > > > >> >> configurations.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 14. "In order to prevent
> > > > > further
> > > > > > > > writes
> > > > > > > > > > to
> > > > > > > > > > > > ZK,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > >> > thing
> > > > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > > > >> > > > >> new
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft quorum must do is
> > take
> > > > > over
> > > > > > > > > > > leadership
> > > > > > > > > > > > of
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > > >> > > > controller.
> > > > > > > > > > > > > > > > > >> > > > >> "
> > > > > > > > > > > > > > > > > >> > > > >> >> The
> > > > > > > > > > > > > > > > > >> > > > >> >> >> ZK
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > controller processing
> > changes
> > > > > to
> > > > > > > > > > > /controller
> > > > > > > > > > > > > > update
> > > > > > > > > > > > > > > > > >> > > > asynchronously.
> > > > > > > > > > > > > > > > > >> > > > >> >> How
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > does the KRaft controller
> > > > know
> > > > > > when
> > > > > > > > the
> > > > > > > > > > ZK
> > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > >> > > > resigned
> > > > > > > > > > > > > > > > > >> > > > >> >> before
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > it can safely copy the ZK
> > > > data?
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> This should be done through
> > > > > > > > > > > > > > > > > >> expectedControllerEpochZkVersion,
> > > > > > > > > > > > > > > > > >> > > just
> > > > > > > > > > > > > > > > > >> > > > >> like
> > > > > > > > > > > > > > > > > >> > > > >> >> in
> > > > > > > > > > > > > > > > > >> > > > >> >> >> ZK mode, right? We should
> > bump
> > > > > this
> > > > > > > > epoch
> > > > > > > > > > > value
> > > > > > > > > > > > > so
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > > >> > > writes
> > > > > > > > > > > > > > > > > >> > > > >> from
> > > > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> old controller will not go
> > > > > > through. I
> > > > > > > > > agree
> > > > > > > > > > > we
> > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > spell
> > > > > > > > > > > > > > > > > >> > this
> > > > > > > > > > > > > > > > > >> > > > out
> > > > > > > > > > > > > > > > > >> > > > >> in
> > > > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> KIP.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 15. We have the following
> > > > > > > sentences.
> > > > > > > > > One
> > > > > > > > > > > says
> > > > > > > > > > > > > > > > > >> ControllerId
> > > > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > > > >> > > > >> random
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > KRaft broker and the
> > other
> > > > says
> > > > > > > it's
> > > > > > > > > the
> > > > > > > > > > > > active
> > > > > > > > > > > > > > > > > >> controller.
> > > > > > > > > > > > > > > > > >> > > > Which
> > > > > > > > > > > > > > > > > >> > > > >> one
> > > > > > > > > > > > > > > > > >> > > > >> >> is
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > correct?
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > "UpdateMetadata: for
> > certain
> > > > > > > metadata
> > > > > > > > > > > > changes,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> > > > controller
> > > > > > > > > > > > > > > > > >> > > > >> >> will
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > need to send
> > > > > > UpdateMetadataRequests
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > ZK
> > > > > > > > > > > > > > > > brokers.
> > > > > > > > > > > > > > > > > >> For
> > > > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > “ControllerId” field in
> > this
> > > > > > > request,
> > > > > > > > > the
> > > > > > > > > > > > > > > controller
> > > > > > > > > > > > > > > > > >> should
> > > > > > > > > > > > > > > > > >> > > > >> specify a
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > random KRaft broker."
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > "In the
> > UpdateMetadataRequest
> > > > > > sent
> > > > > > > by
> > > > > > > > > the
> > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > > > > >> controller
> > > > > > > > > > > > > > > > > >> > to
> > > > > > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > brokers, the ControllerId
> > > > will
> > > > > > > point
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > > > active
> > > > > > > > > > > > > > > > > >> > controller
> > > > > > > > > > > > > > > > > >> > > > which
> > > > > > > > > > > > > > > > > >> > > > >> >> will
> > > > > > > > > > > > > > > > > >> > > > >> >> >> be
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > used for the inter-broker
> > > > > > > requests."
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, this seems like an
> > error
> > > > to
> > > > > > me
> > > > > > > as
> > > > > > > > > > > well. A
> > > > > > > > > > > > > > > random
> > > > > > > > > > > > > > > > > >> value
> > > > > > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > > > > > >> > > > not
> > > > > > > > > > > > > > > > > >> > > > >> >> really
> > > > > > > > > > > > > > > > > >> > > > >> >> >> useful. Plus the text here
> > is
> > > > > > > > > > > > self-contradictory,
> > > > > > > > > > > > > > as
> > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > >> > pointed
> > > > > > > > > > > > > > > > > >> > > > out.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> I suspect what we should do
> > > > here
> > > > > is
> > > > > > > > add a
> > > > > > > > > > new
> > > > > > > > > > > > > > field,
> > > > > > > > > > > > > > > > > >> > > > >> KRaftControllerId,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> and populate it with the
> > real
> > > > > > > > controller
> > > > > > > > > > ID,
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > leave
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> old
> > > > > > > > > > > > > > > > > >> > > > >> >> controllerId
> > > > > > > > > > > > > > > > > >> > > > >> >> >> field as -1. A ZK-based
> > broker
> > > > > that
> > > > > > > > sees
> > > > > > > > > > this
> > > > > > > > > > > > can
> > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > >> consult
> > > > > > > > > > > > > > > > > >> > > its
> > > > > > > > > > > > > > > > > >> > > > >> >> >> controller.quorum.voters
> > > > > > > configuration
> > > > > > > > to
> > > > > > > > > > see
> > > > > > > > > > > > > where
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > >> should
> > > > > > > > > > > > > > > > > >> > > send
> > > > > > > > > > > > > > > > > >> > > > >> >> >> controller-bound RPCs. That
> > > > > > (static)
> > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > lets
> > > > > > > > > > > > > > > > > us
> > > > > > > > > > > > > > > > > >> map
> > > > > > > > > > > > > > > > > >> > > > >> between
> > > > > > > > > > > > > > > > > >> > > > >> >> >> controller ID and
> > host:port.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> We should still keep our
> > > > existing
> > > > > > > epoch
> > > > > > > > > > logic
> > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > deciding
> > > > > > > > > > > > > > > > > >> > when
> > > > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadataRequest /
> > > > > > > > > > LeaderAndIsrRequests
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > stale,
> > > > > > > > > > > > > > > > > >> with
> > > > > > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > > > > > >> > > > >> caveat
> > > > > > > > > > > > > > > > > >> > > > >> >> >> that any kraft-based epoch
> > > > should
> > > > > > be
> > > > > > > > > > treated
> > > > > > > > > > > as
> > > > > > > > > > > > > > > greater
> > > > > > > > > > > > > > > > > >> than
> > > > > > > > > > > > > > > > > >> > any
> > > > > > > > > > > > > > > > > >> > > > >> >> ZK-based
> > > > > > > > > > > > > > > > > >> > > > >> >> >> epoch. After all, the kraft
> > > > epoch
> > > > > > is
> > > > > > > > > coming
> > > > > > > > > > > > from
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > epoch
> > > > > > > > > > > > > > > > > >> of
> > > > > > > > > > > > > > > > > >> > > > >> >> >> __cluster_metadata,
> > whereas the
> > > > > ZK
> > > > > > > > epoch
> > > > > > > > > > > comes
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > ZK.
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > 16. "Additionally, the
> > > > > controller
> > > > > > > > must
> > > > > > > > > > > > specify
> > > > > > > > > > > > > > if a
> > > > > > > > > > > > > > > > > >> broker
> > > > > > > > > > > > > > > > > >> > in
> > > > > > > > > > > > > > > > > >> > > > >> >> >> “LiveBrokers”
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > is KRaft or ZK." Does
> > that
> > > > > > require
> > > > > > > > any
> > > > > > > > > > > > protocol
> > > > > > > > > > > > > > > > changes
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> > > > >> >> >> UpdateMetadata?
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Yeah, I am also curious
> > why the
> > > > > we
> > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > care
> > > > > > > > > > > > > > > whether
> > > > > > > > > > > > > > > > > >> > brokers
> > > > > > > > > > > > > > > > > >> > > > are
> > > > > > > > > > > > > > > > > >> > > > >> ZK
> > > > > > > > > > > > > > > > > >> > > > >> >> or
> > > > > > > > > > > > > > > > > >> > > > >> >> >> KRaft in
> > UpdateMetadataRequest.
> > > > > We
> > > > > > > > don't
> > > > > > > > > > > reveal
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > >> > clients,
> > > > > > > > > > > > > > > > > >> > > > so
> > > > > > > > > > > > > > > > > >> > > > >> can
> > > > > > > > > > > > > > > > > >> > > > >> >> we
> > > > > > > > > > > > > > > > > >> > > > >> >> >> just leave this out?
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> best,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> Colin
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > Thanks,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > Jun
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > On Wed, Oct 5, 2022 at
> > 10:07
> > > > AM
> > > > > > > > Mickael
> > > > > > > > > > > > Maison
> > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > >> > > > >> >> mickael.mai...@gmail.com
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> > wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Hi David,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks for starting this
> > > > > > important
> > > > > > > > > KIP.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> I've just taken a quick
> > look
> > > > > so
> > > > > > > far
> > > > > > > > > but
> > > > > > > > > > > I've
> > > > > > > > > > > > > > got a
> > > > > > > > > > > > > > > > > >> couple
> > > > > > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > > > > > >> > > > >> initial
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> questions:
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> 1) What happens if a non
> > > > KRaft
> > > > > > > > > > compatible
> > > > > > > > > > > > > broker
> > > > > > > > > > > > > > > (or
> > > > > > > > > > > > > > > > > >> with
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > kafka.metadata.migration.enable
> > > > > > > set
> > > > > > > > to
> > > > > > > > > > > > false)
> > > > > > > > > > > > > > > joins
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > cluster
> > > > > > > > > > > > > > > > > >> > > > >> after
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> the migration is
> > triggered?
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> 2) In the Failure Modes
> > > > > section
> > > > > > > you
> > > > > > > > > > > mention
> > > > > > > > > > > > a
> > > > > > > > > > > > > > > > scenario
> > > > > > > > > > > > > > > > > >> > where
> > > > > > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > > > > > >> > > > >> write
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> to ZK fails. What
> > happens
> > > > when
> > > > > > the
> > > > > > > > > > > > divergence
> > > > > > > > > > > > > > > limit
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > >> > > > reached? Is
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> this a fatal condition?
> > How
> > > > > much
> > > > > > > > > > > divergence
> > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > >> > allow?
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Thanks,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> Mickael
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> On Wed, Oct 5, 2022 at
> > 12:20
> > > > > AM
> > > > > > > > David
> > > > > > > > > > > > Arthur <
> > > > > > > > > > > > > > > > > >> > > mum...@gmail.com
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > >> >> wrote:
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > Hey folks, I wanted
> > to get
> > > > > the
> > > > > > > > ball
> > > > > > > > > > > > rolling
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >> > > discussion
> > > > > > > > > > > > > > > > > >> > > > >> for
> > > > > > > > > > > > > > > > > >> > > > >> >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > ZooKeeper migration
> > KIP.
> > > > > This
> > > > > > > KIP
> > > > > > > > > > > details
> > > > > > > > > > > > > how
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > > > >> to
> > > > > > > > > > > > > > > > > >> > do
> > > > > > > > > > > > > > > > > >> > > > an
> > > > > > > > > > > > > > > > > >> > > > >> >> online
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration of metadata
> > from
> > > > > > > > ZooKeeper
> > > > > > > > > > to
> > > > > > > > > > > > > KRaft
> > > > > > > > > > > > > > as
> > > > > > > > > > > > > > > > > well
> > > > > > > > > > > > > > > > > >> as
> > > > > > > > > > > > > > > > > >> > a
> > > > > > > > > > > > > > > > > >> > > > >> rolling
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > upgrade of brokers to
> > > > KRaft
> > > > > > > mode.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > The general idea is to
> > > > keep
> > > > > > > KRaft
> > > > > > > > > and
> > > > > > > > > > > > > > ZooKeeper
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > >> sync
> > > > > > > > > > > > > > > > > >> > > > during
> > > > > > > > > > > > > > > > > >> > > > >> the
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > migration, so both
> > types
> > > > of
> > > > > > > > brokers
> > > > > > > > > > can
> > > > > > > > > > > > > exist
> > > > > > > > > > > > > > > > > >> > > simultaneously.
> > > > > > > > > > > > > > > > > >> > > > >> Then,
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > once everything is
> > > > migrated
> > > > > > and
> > > > > > > > > > updated,
> > > > > > > > > > > > we
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > turn
> > > > > > > > > > > > > > > > > >> off
> > > > > > > > > > > > > > > > > >> > > > >> ZooKeeper
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > writes.
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > This is a pretty
> > complex
> > > > > KIP,
> > > > > > so
> > > > > > > > > > please
> > > > > > > > > > > > > take a
> > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> >
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > Thanks!
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >> > David
> > > > > > > > > > > > > > > > > >> > > > >> >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >> >>
> > > > > > > > > > > > > > > > > >> > > > >> >>
> > > > > > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > > > >> > > > > --
> > > > > > > > > > > > > > > > > >> > > > > -David
> > > > > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> > > --
> > > > > > > > > > > > > > > > > >> > > -David
> > > > > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > -David
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > -David
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > -David
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -David
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -David
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -David
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -David
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> >
> >
> >
> > --
> > David Arthur
> >



-- 
David Arthur

Reply via email to