Hi all,

I wanted to let you know that I have made the following small change to the
`kafka-features` CLI tool description in the KIP-584 write up. The purpose
is to ensure the design is compatible with post KIP-500 world. I have
eliminated the facility in Admin#describeFeatures API to be able to
optionally send a describeFeatures request to the controller. This facility
was originally seen useful for (1) debugability and (2) slightly better
consistency guarantees in the CLI tool that reads features before updating
them. But in hindsight it now poses a hindrance to post KIP-500 world where
no client would be able to access the controller directly. So, when looking
at cost vs benefit, this facility does not feel useful enough and therefore
I've removed it. We can discuss if it becomes necessary in the future, and
implement a suitable solution.

The corresponding PR containing this change is
https://github.com/apache/kafka/pull/9536 .

Please let me know if you have any questions or concerns.


Cheers,
Kowshik


On Thu, Oct 15, 2020 at 10:17 AM Jun Rao <j...@confluent.io> wrote:

> Hi, Kowshik,
>
> Thanks for the update. Those changes look good to me.
>
> Jun
>
> On Tue, Oct 13, 2020 at 4:50 PM Kowshik Prakasam <kpraka...@confluent.io>
> wrote:
>
> > Hi all,
> >
> > I wanted to let you know that I have made the following minor changes to
> > the `kafka-features` CLI tool description in the KIP-584 write up. The
> > purpose is to ensure the design is correct for a few things which came up
> > during implementation:
> >
> > 1. The CLI tool now produces a tab-formatted output instead of JSON. This
> > aligns with the type of format produced by other admin CLI tools of
> Kafka,
> > ex: `kafka-topics`.
> > 2. Whenever feature updates are performed, the output of the CLI tool
> shows
> > the result of each feature update that was applied.
> > 3. The CLI tool accepts an optional argument `--dry-run` which lets the
> > user preview the feature updates before applying them.
> >
> > The following section of the KIP has been updated with the above changes:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Toolingsupport
> >
> > Please let me know if you have any questions.
> >
> >
> > Cheers,
> > Kowshik
> >
> >
> > On Thu, Oct 8, 2020 at 1:12 AM Kowshik Prakasam <kpraka...@confluent.io>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > This is a very good point. I have updated the feature version
> deprecation
> > > section mentioning the same:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation
> > > .
> > >
> > > Thank you for the suggestion.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > > On Tue, Oct 6, 2020 at 5:30 PM Jun Rao <j...@confluent.io> wrote:
> > >
> > >> Hi, Kowshik,
> > >>
> > >> Thanks for the follow up. Both look good to me.
> > >>
> > >> For 2, it would be useful to also add that an admin should make sure
> > that
> > >> no clients are using a deprecated feature version (e.g. using the
> client
> > >> version metric) before deploying a release that deprecates it.
> > >>
> > >> Thanks,
> > >>
> > >> Jun
> > >>
> > >> On Tue, Oct 6, 2020 at 3:46 PM Kowshik Prakasam <
> kpraka...@confluent.io
> > >
> > >> wrote:
> > >>
> > >> > Hi Jun,
> > >> >
> > >> > I have added the following details in the KIP-584 write up:
> > >> >
> > >> > 1. Deployment, IBP deprecation and avoidance of double rolls. This
> > >> section
> > >> > talks about the various phases of work that would be required to use
> > >> this
> > >> > KIP to eventually avoid Broker double rolls in the cluster (whenever
> > IBP
> > >> > values are advanced). Link to section:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Deployment,IBPdeprecationandavoidanceofdoublerolls
> > >> > .
> > >> >
> > >> > 2. Feature version deprecation. This section explains the idea for
> > >> feature
> > >> > version deprecation (using highest supported feature min version)
> > which
> > >> you
> > >> > had proposed during code review:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-Featureversiondeprecation
> > >> > .
> > >> >
> > >> > Please let me know if you have any questions.
> > >> >
> > >> >
> > >> > Cheers,
> > >> > Kowshik
> > >> >
> > >> >
> > >> > On Tue, Sep 29, 2020 at 11:07 AM Jun Rao <j...@confluent.io> wrote:
> > >> >
> > >> > > Hi, Kowshik,
> > >> > >
> > >> > > Thanks for the update. Regarding enabling a single rolling restart
> > in
> > >> the
> > >> > > future, could we sketch out a bit how this will work by treating
> IBP
> > >> as a
> > >> > > feature? For example, IBP currently uses the release version and
> > this
> > >> KIP
> > >> > > uses an integer for versions. How do we bridge the gap between the
> > >> two?
> > >> > > Does min.version still make sense for IBP as a feature?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jun
> > >> > >
> > >> > > On Fri, Sep 25, 2020 at 5:57 PM Kowshik Prakasam <
> > >> kpraka...@confluent.io
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Colin,
> > >> > > >
> > >> > > > Thanks for the feedback. Those are very good points. I have made
> > the
> > >> > > > following changes to the KIP as you had suggested:
> > >> > > > 1. Included the `timeoutMs` field in the `UpdateFeaturesRequest`
> > >> > schema.
> > >> > > > The initial implementation won't be making use of the field, but
> > we
> > >> can
> > >> > > > always use it in the future as the need arises.
> > >> > > > 2. Modified the `FinalizedFeaturesEpoch` field in
> > >> `ApiVersionsResponse`
> > >> > > to
> > >> > > > use int64. This is to avoid overflow problems in the future once
> > ZK
> > >> is
> > >> > > > gone.
> > >> > > >
> > >> > > > I have also incorporated these changes into the versioning write
> > >> path
> > >> > PR
> > >> > > > that is currently under review:
> > >> > > https://github.com/apache/kafka/pull/9001.
> > >> > > >
> > >> > > >
> > >> > > > Cheers,
> > >> > > > Kowshik
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Sep 25, 2020 at 4:57 PM Kowshik Prakasam <
> > >> > kpraka...@confluent.io
> > >> > > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Jun,
> > >> > > > >
> > >> > > > > Thanks for the feedback. It's a very good point. I have now
> > >> modified
> > >> > > the
> > >> > > > > KIP-584 write-up "goals" section a bit. It now mentions one of
> > the
> > >> > > goals
> > >> > > > as
> > >> > > > > enabling rolling upgrades using a single restart (instead of
> 2).
> > >> > Also I
> > >> > > > > have removed the text explicitly aiming for deprecation of
> IBP.
> > >> Note
> > >> > > that
> > >> > > > > previously under "Potential features in Kafka" the IBP was
> > >> mentioned
> > >> > > > under
> > >> > > > > point (4) as a possible coarse-grained feature. Hopefully, now
> > >> the 2
> > >> > > > > sections of the KIP align with each other well.
> > >> > > > >
> > >> > > > >
> > >> > > > > Cheers,
> > >> > > > > Kowshik
> > >> > > > >
> > >> > > > >
> > >> > > > > On Fri, Sep 25, 2020 at 2:03 PM Colin McCabe <
> > cmcc...@apache.org>
> > >> > > wrote:
> > >> > > > >
> > >> > > > >> On Tue, Sep 22, 2020, at 00:43, Kowshik Prakasam wrote:
> > >> > > > >> > Hi all,
> > >> > > > >> >
> > >> > > > >> > I wanted to let you know that I have made the following
> > >> changes to
> > >> > > the
> > >> > > > >> > KIP-584 write up. The purpose is to ensure the design is
> > >> correct
> > >> > > for a
> > >> > > > >> few
> > >> > > > >> > things which came up during implementation:
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >> Hi Kowshik,
> > >> > > > >>
> > >> > > > >> Thanks for the updates.
> > >> > > > >>
> > >> > > > >> >
> > >> > > > >> > 1. Per FeatureUpdate error code: The UPDATE_FEATURES
> > controller
> > >> > API
> > >> > > is
> > >> > > > >> no
> > >> > > > >> > longer transactional. Going forward, we allow for
> individual
> > >> > > > >> FeatureUpdate
> > >> > > > >> > to succeed/fail in the request. As a result, the response
> > >> schema
> > >> > now
> > >> > > > >> > contains an error code per FeatureUpdate as well as a
> > top-level
> > >> > > error
> > >> > > > >> code.
> > >> > > > >> > Overall this is a better design because it better
> represents
> > >> the
> > >> > > > nature
> > >> > > > >> of
> > >> > > > >> > the API: each FeatureUpdate in the request is independent
> of
> > >> the
> > >> > > other
> > >> > > > >> > updates, and the controller can process/apply these
> > >> independently
> > >> > to
> > >> > > > ZK.
> > >> > > > >> > When an UPDATE_FEATURES request fails, this new design
> > provides
> > >> > > better
> > >> > > > >> > clarity to the caller on which FeatureUpdate could not be
> > >> applied
> > >> > > (via
> > >> > > > >> the
> > >> > > > >> > individual error codes). In the previous design, we were
> > >> unable to
> > >> > > > >> achieve
> > >> > > > >> > such an increased level of clarity in communicating the
> error
> > >> > codes.
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >> OK
> > >> > > > >>
> > >> > > > >> >
> > >> > > > >> > 2. Due to #1, there were some minor changes required to the
> > >> > proposed
> > >> > > > >> Admin
> > >> > > > >> > APIs (describeFeatures and updateFeatures). A few
> unnecessary
> > >> > public
> > >> > > > >> APIs
> > >> > > > >> > have been removed, and couple essential ones have been
> added.
> > >> The
> > >> > > > latest
> > >> > > > >> > changes now represent the latest design.
> > >> > > > >> >
> > >> > > > >> > 3. The timeoutMs field has been removed from the the
> > >> > UPDATE_FEATURES
> > >> > > > API
> > >> > > > >> > request, since it was not found to be required during
> > >> > > implementation.
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >> Please don't get rid of timeoutMs.  timeoutMs is required if
> > you
> > >> > want
> > >> > > to
> > >> > > > >> implement the ability to timeout the call if the controller
> > can't
> > >> > get
> > >> > > > to it
> > >> > > > >> in time.  This is important for avoiding congestion collapse
> > >> where
> > >> > the
> > >> > > > >> controller collapses under the weight of lots of retries of
> the
> > >> same
> > >> > > > set of
> > >> > > > >> calls.
> > >> > > > >>
> > >> > > > >> We may not be able to do it in the initial implementation,
> but
> > we
> > >> > will
> > >> > > > >> eventually implement this for all the controller-bound RPCs.
> > >> > > > >>
> > >> > > > >> > >
> > >> > > > >> > > 2. Finalized feature version epoch data type has been
> made
> > >> to be
> > >> > > > int32
> > >> > > > >> > > (instead of int64). The reason is that the epoch value is
> > the
> > >> > > value
> > >> > > > >> of ZK
> > >> > > > >> > > node version, whose data type is int32.
> > >> > > > >> > >
> > >> > > > >>
> > >> > > > >> Sorry, I missed this earlier.  Using 16 bit feature levels
> > seems
> > >> > fine.
> > >> > > > >> However, please don't use a 32-bit epoch here.  We
> deliberately
> > >> made
> > >> > > the
> > >> > > > >> epoch 64 bits to avoid overflow problems in the future once
> ZK
> > is
> > >> > > gone.
> > >> > > > >>
> > >> > > > >> best,
> > >> > > > >> Colin
> > >> > > > >>
> > >> > > > >> > > 3. Introduced a new 'status' field in the '/features' ZK
> > node
> > >> > > > schema.
> > >> > > > >> The
> > >> > > > >> > > purpose is to implement Colin's earlier point for the
> > >> strategy
> > >> > for
> > >> > > > >> > > transitioning from not having a /features znode to having
> > >> one.
> > >> > An
> > >> > > > >> > > explanation has been provided in the following section of
> > the
> > >> > KIP
> > >> > > > >> detailing
> > >> > > > >> > > the different cases:
> > >> > > > >> > >
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus
> > >> > > > >> > > .
> > >> > > > >> > >
> > >> > > > >> > > Please let me know if you have any questions or concerns.
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > > Cheers,
> > >> > > > >> > > Kowshik
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > >
> > >> > > > >> > > Cheers,
> > >> > > > >> > > Kowshik
> > >> > > > >> > >
> > >> > > > >> > > On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam <
> > >> > > > >> kpraka...@confluent.io>
> > >> > > > >> > > wrote:
> > >> > > > >> > >
> > >> > > > >> > >> Hi all,
> > >> > > > >> > >>
> > >> > > > >> > >> This KIP vote has been open for ~12 days. The summary of
> > the
> > >> > > votes
> > >> > > > is
> > >> > > > >> > >> that we have 3 binding votes (Colin, Guozhang, Jun),
> and 3
> > >> > > > >> non-binding
> > >> > > > >> > >> votes (David, Dhruvil, Boyang). Therefore, the KIP vote
> > >> passes.
> > >> > > > I'll
> > >> > > > >> mark
> > >> > > > >> > >> KIP as accepted and start working on the implementation.
> > >> > > > >> > >>
> > >> > > > >> > >> Thanks a lot!
> > >> > > > >> > >>
> > >> > > > >> > >>
> > >> > > > >> > >> Cheers,
> > >> > > > >> > >> Kowshik
> > >> > > > >> > >>
> > >> > > > >> > >> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe <
> > >> > > cmcc...@apache.org>
> > >> > > > >> wrote:
> > >> > > > >> > >>
> > >> > > > >> > >>> Thanks, Kowshik.  +1 (binding)
> > >> > > > >> > >>>
> > >> > > > >> > >>> best,
> > >> > > > >> > >>> Colin
> > >> > > > >> > >>>
> > >> > > > >> > >>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote:
> > >> > > > >> > >>> > Hi Colin,
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > Thanks for the explanation! I agree with you, and I
> > have
> > >> > > updated
> > >> > > > >> the
> > >> > > > >> > >>> > KIP.
> > >> > > > >> > >>> > Here is a link to relevant section:
> > >> > > > >> > >>> >
> > >> > > > >> > >>>
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > >> > > > >> > >>> >
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > Cheers,
> > >> > > > >> > >>> > Kowshik
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe <
> > >> > > > cmcc...@apache.org>
> > >> > > > >> > >>> wrote:
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam
> > wrote:
> > >> > > > >> > >>> > > > (Kowshik): Great point! However for case #1, I'm
> > not
> > >> > sure
> > >> > > > why
> > >> > > > >> we
> > >> > > > >> > >>> need to
> > >> > > > >> > >>> > > > create a '/features' ZK node with disabled
> > features.
> > >> > > > Instead,
> > >> > > > >> do
> > >> > > > >> > >>> you see
> > >> > > > >> > >>> > > > any drawback if we just do not create it? i.e. if
> > >> IBP is
> > >> > > > less
> > >> > > > >> than
> > >> > > > >> > >>> 2.6,
> > >> > > > >> > >>> > > the
> > >> > > > >> > >>> > > > controller treats the case as though the
> versioning
> > >> > system
> > >> > > > is
> > >> > > > >> > >>> completely
> > >> > > > >> > >>> > > > disabled, and would not create a non-existing
> > >> > '/features'
> > >> > > > >> node.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > Hi Kowshik,
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > When the IBP is less than 2.6, but the software has
> > >> been
> > >> > > > >> upgraded to
> > >> > > > >> > >>> a
> > >> > > > >> > >>> > > state where it supports this KIP, that
> > >> > > > >> > >>> > >  means the user is upgrading from an earlier
> version
> > of
> > >> > the
> > >> > > > >> > >>> software.  In
> > >> > > > >> > >>> > > this case, we want to start with all the features
> > >> disabled
> > >> > > and
> > >> > > > >> allow
> > >> > > > >> > >>> the
> > >> > > > >> > >>> > > user to enable them when they are ready.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > Enabling all the possible features immediately
> after
> > an
> > >> > > > upgrade
> > >> > > > >> > >>> could be
> > >> > > > >> > >>> > > harmful to the cluster.  On the other hand, for a
> new
> > >> > > cluster,
> > >> > > > >> we do
> > >> > > > >> > >>> want
> > >> > > > >> > >>> > > to enable all the possible features immediately . I
> > was
> > >> > > > >> proposing
> > >> > > > >> > >>> this as a
> > >> > > > >> > >>> > > way to distinguish the two cases (since the new
> > cluster
> > >> > will
> > >> > > > >> never be
> > >> > > > >> > >>> > > started with an old IBP).
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > > Colin MccCabe wrote:
> > >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry).
> > >> For
> > >> > > > >> finalized
> > >> > > > >> > >>> > > features,
> > >> > > > >> > >>> > > > > why do we need both min_version_level and
> > >> > > > max_version_level?
> > >> > > > >> > >>> Assuming
> > >> > > > >> > >>> > > that
> > >> > > > >> > >>> > > > > we want all the brokers to be on the same
> feature
> > >> > > version
> > >> > > > >> level,
> > >> > > > >> > >>> we
> > >> > > > >> > >>> > > really only care
> > >> > > > >> > >>> > > > > about three numbers for each feature, right?
> The
> > >> > > minimum
> > >> > > > >> > >>> supported
> > >> > > > >> > >>> > > version
> > >> > > > >> > >>> > > > > level, the maximum supported version level, and
> > the
> > >> > > > current
> > >> > > > >> > >>> active
> > >> > > > >> > >>> > > version level.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > > We don't actually want different brokers to be
> on
> > >> > > > different
> > >> > > > >> > >>> versions of
> > >> > > > >> > >>> > > > > the same feature, right?  So we can just have
> one
> > >> > number
> > >> > > > for
> > >> > > > >> > >>> current
> > >> > > > >> > >>> > > > > version level, rather than two.  At least
> that's
> > >> what
> > >> > I
> > >> > > > was
> > >> > > > >> > >>> thinking
> > >> > > > >> > >>> > > -- let
> > >> > > > >> > >>> > > > > me know if I missed something.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > (Kowshik): It is my understanding that the
> "current
> > >> > active
> > >> > > > >> version
> > >> > > > >> > >>> level"
> > >> > > > >> > >>> > > > that you have mentioned, is the
> > "max_version_level".
> > >> But
> > >> > > we
> > >> > > > >> still
> > >> > > > >> > >>> > > > maintain/publish both min and max version levels,
> > >> > because,
> > >> > > > the
> > >> > > > >> > >>> detail
> > >> > > > >> > >>> > > about
> > >> > > > >> > >>> > > > min level is useful to external clients. This is
> > >> > described
> > >> > > > >> below.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > For any feature F, think of the closed range:
> > >> > > > >> [min_version_level,
> > >> > > > >> > >>> > > > max_version_level] as the range of finalized
> > >> versions,
> > >> > > > that's
> > >> > > > >> > >>> guaranteed
> > >> > > > >> > >>> > > to
> > >> > > > >> > >>> > > > be supported by all brokers in the cluster.
> > >> > > > >> > >>> > > >  - "max_version_level" is the finalized highest
> > >> common
> > >> > > > version
> > >> > > > >> > >>> among all
> > >> > > > >> > >>> > > > brokers,
> > >> > > > >> > >>> > > >  - "min_version_level" is the finalized lowest
> > common
> > >> > > > version
> > >> > > > >> > >>> among all
> > >> > > > >> > >>> > > > brokers.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > Next, think of "client" here as the "user of the
> > new
> > >> > > feature
> > >> > > > >> > >>> versions
> > >> > > > >> > >>> > > > system". Imagine that such a client learns about
> > >> > finalized
> > >> > > > >> feature
> > >> > > > >> > >>> > > > versions, and exercises some logic based on the
> > >> version.
> > >> > > > These
> > >> > > > >> > >>> clients
> > >> > > > >> > >>> > > can
> > >> > > > >> > >>> > > > be of 2 types:
> > >> > > > >> > >>> > > > 1. Some part of the broker code itself could
> behave
> > >> > like a
> > >> > > > >> client
> > >> > > > >> > >>> trying
> > >> > > > >> > >>> > > to
> > >> > > > >> > >>> > > > use some feature that's "internal" to the broker
> > >> > cluster.
> > >> > > > >> Such a
> > >> > > > >> > >>> client
> > >> > > > >> > >>> > > > would learn the latest finalized features via ZK.
> > >> > > > >> > >>> > > > 2. An external system (ex: Streams) could behave
> > >> like a
> > >> > > > >> client,
> > >> > > > >> > >>> trying to
> > >> > > > >> > >>> > > > use some "external" facing feature. Such a client
> > >> would
> > >> > > > learn
> > >> > > > >> > >>> latest
> > >> > > > >> > >>> > > > finalized features via ApiVersionsRequest. Ex:
> > >> > > > >> group_coordinator
> > >> > > > >> > >>> feature
> > >> > > > >> > >>> > > > described in the KIP.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > Next, imagine that for F, the max_version_level
> is
> > >> > > > >> successfully
> > >> > > > >> > >>> bumped by
> > >> > > > >> > >>> > > > +1 (via Controller API). Now it is guaranteed
> that
> > >> all
> > >> > > > brokers
> > >> > > > >> > >>> (i.e.
> > >> > > > >> > >>> > > > internal clients) understand max_version_level +
> 1.
> > >> > > However,
> > >> > > > >> it is
> > >> > > > >> > >>> still
> > >> > > > >> > >>> > > > not guaranteed that all external clients have
> > support
> > >> > for
> > >> > > > (or
> > >> > > > >> have
> > >> > > > >> > >>> > > > activated) the logic for the newer version. Why?
> > >> > Because,
> > >> > > > >> this is
> > >> > > > >> > >>> > > > subjective as explained next:
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > 1. On one hand, imagine F as an internal feature
> > only
> > >> > > > >> relevant to
> > >> > > > >> > >>> > > Brokers.
> > >> > > > >> > >>> > > > The binary for the internal client logic is
> > >> controlled
> > >> > by
> > >> > > > >> Broker
> > >> > > > >> > >>> cluster
> > >> > > > >> > >>> > > > deployments. When shipping a new Broker release,
> we
> > >> > > wouldn't
> > >> > > > >> bump
> > >> > > > >> > >>> max
> > >> > > > >> > >>> > > > "supported" feature version for F by 1, unless we
> > >> have
> > >> > > > >> introduced
> > >> > > > >> > >>> some
> > >> > > > >> > >>> > > new
> > >> > > > >> > >>> > > > logic (with a potentially breaking change) in the
> > >> > Broker.
> > >> > > > >> > >>> Furthermore,
> > >> > > > >> > >>> > > such
> > >> > > > >> > >>> > > > feature logic in the broker should/will not be
> > >> > implemented
> > >> > > > in
> > >> > > > >> a
> > >> > > > >> > >>> way that
> > >> > > > >> > >>> > > it
> > >> > > > >> > >>> > > > would activate logic for an older feature version
> > >> after
> > >> > it
> > >> > > > has
> > >> > > > >> > >>> migrated
> > >> > > > >> > >>> > > to
> > >> > > > >> > >>> > > > using the logic for a newer feature version
> > (because
> > >> > this
> > >> > > > >> could
> > >> > > > >> > >>> break the
> > >> > > > >> > >>> > > > cluster!). For these cases, max_version_level
> will
> > be
> > >> > very
> > >> > > > >> useful
> > >> > > > >> > >>> for
> > >> > > > >> > >>> > > > decision making.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > 2. On the other hand, imagine F as an external
> > facing
> > >> > > > feature.
> > >> > > > >> > >>> External
> > >> > > > >> > >>> > > > clients are not within the control of Broker
> > >> cluster. An
> > >> > > > >> external
> > >> > > > >> > >>> client
> > >> > > > >> > >>> > > > may not have upgraded it's code (yet) to use
> > >> > > > >> 'max_version_level +
> > >> > > > >> > >>> 1'.
> > >> > > > >> > >>> > > But,
> > >> > > > >> > >>> > > > the Kafka cluster could have been deployed with
> > >> support
> > >> > > for
> > >> > > > >> > >>> > > > 'max_version_level + 1' of an external facing
> > >> feature.
> > >> > > Now,
> > >> > > > >> these
> > >> > > > >> > >>> > > external
> > >> > > > >> > >>> > > > clients (until an upgrade) are benefitted in
> > learning
> > >> > > "what
> > >> > > > >> is the
> > >> > > > >> > >>> lowest
> > >> > > > >> > >>> > > > common version for F among all brokers?". This is
> > >> where
> > >> > > the
> > >> > > > >> > >>> > > > "min_version_level" becomes useful. Using this, a
> > >> client
> > >> > > > could
> > >> > > > >> > >>> learn the
> > >> > > > >> > >>> > > > specific supported versions that's lower than
> > >> > > > >> max_version_level
> > >> > > > >> > >>> (instead
> > >> > > > >> > >>> > > of
> > >> > > > >> > >>> > > > assuming that all brokers support the range: [1,
> > >> > > > >> > >>> max_version_level]). For
> > >> > > > >> > >>> > > > example, if the cluster deprecates
> > >> "min_version_level",
> > >> > > then
> > >> > > > >> the
> > >> > > > >> > >>> client
> > >> > > > >> > >>> > > > becomes aware because it periodically learns the
> > >> latest
> > >> > > > >> > >>> > > "min_version_level"
> > >> > > > >> > >>> > > > via ApiVersionsRequest.
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > Thanks for the explanation.  I agree that this does
> > >> make
> > >> > > sense
> > >> > > > >> when
> > >> > > > >> > >>> you
> > >> > > > >> > >>> > > take the client perspective into account.
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > best,
> > >> > > > >> > >>> > > Colin
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > Cheers,
> > >> > > > >> > >>> > > > Kowshik
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe <
> > >> > > > >> cmcc...@apache.org>
> > >> > > > >> > >>> > > wrote:
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > > > > Hi Kowshik,
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > Thanks again for working on this-- it looks
> > >> great.  I
> > >> > > went
> > >> > > > >> over
> > >> > > > >> > >>> the KIP
> > >> > > > >> > >>> > > > > again and have a few more comments.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > It would be good to note that deprecating a
> > feature
> > >> > > > version
> > >> > > > >> (in
> > >> > > > >> > >>> other
> > >> > > > >> > >>> > > > > words, increasing minVersionLevel on the
> broker)
> > >> is an
> > >> > > > >> > >>> incompatible
> > >> > > > >> > >>> > > change,
> > >> > > > >> > >>> > > > > which requires a major release of Kafka.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > I think the strategy for transitioning from not
> > >> > having a
> > >> > > > >> > >>> /features
> > >> > > > >> > >>> > > znode
> > >> > > > >> > >>> > > > > to having one could use some work. The current
> > >> > proposal
> > >> > > is
> > >> > > > >> to
> > >> > > > >> > >>> wait for
> > >> > > > >> > >>> > > all
> > >> > > > >> > >>> > > > > the brokers to fill in their feature znodes and
> > >> then
> > >> > > pick
> > >> > > > >> the
> > >> > > > >> > >>> highest
> > >> > > > >> > >>> > > > > common versions.  But that requires blocking in
> > the
> > >> > > > >> controller
> > >> > > > >> > >>> startup
> > >> > > > >> > >>> > > code
> > >> > > > >> > >>> > > > > until the whole cluster is active
> (technically, a
> > >> > point
> > >> > > in
> > >> > > > >> time
> > >> > > > >> > >>> which
> > >> > > > >> > >>> > > we
> > >> > > > >> > >>> > > > > never really know that we have reached...)
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > Instead, I think it would be better to have a
> > >> strategy
> > >> > > > like
> > >> > > > >> this:
> > >> > > > >> > >>> > > > > 1. If the controller comes up and there is no
> > >> > /features
> > >> > > > >> znode
> > >> > > > >> > >>> AND the
> > >> > > > >> > >>> > > IBP
> > >> > > > >> > >>> > > > > is less than 2.6, create a /features znode
> where
> > >> all
> > >> > the
> > >> > > > >> > >>> features are
> > >> > > > >> > >>> > > > > disabled.
> > >> > > > >> > >>> > > > > 2. If the controller comes up and there is no
> > >> > /features
> > >> > > > >> znode
> > >> > > > >> > >>> AND the
> > >> > > > >> > >>> > > IBP
> > >> > > > >> > >>> > > > > is greater than or equal to 2.6, create a
> > /features
> > >> > > znode
> > >> > > > >> where
> > >> > > > >> > >>> all the
> > >> > > > >> > >>> > > > > features are enabled at the highest versions
> > >> supported
> > >> > > by
> > >> > > > >> the
> > >> > > > >> > >>> > > controller.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > People upgrading from the pre-KIP-584
> > >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584>
> > >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584
> >
> > >> world
> > >> > > > will
> > >> > > > >> end
> > >> > > > >> > >>> up in
> > >> > > > >> > >>> > > case
> > >> > > > >> > >>> > > > > #1 because they have to do a double roll to
> > >> upgrade,
> > >> > and
> > >> > > > >> during
> > >> > > > >> > >>> the
> > >> > > > >> > >>> > > first
> > >> > > > >> > >>> > > > > roll, the IBP is unchanged.  People creating
> new
> > >> > > clusters
> > >> > > > >> from
> > >> > > > >> > >>> scratch
> > >> > > > >> > >>> > > will
> > >> > > > >> > >>> > > > > end up in case #2, which is what we want since
> we
> > >> > don't
> > >> > > > >> want a
> > >> > > > >> > >>> brand
> > >> > > > >> > >>> > > new
> > >> > > > >> > >>> > > > > cluster to be using old feature flag versions.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > UpdateFeaturesResponse#ErrorMessage should
> > specify
> > >> > > > >> > >>> nullableVersions
> > >> > > > >> > >>> > > since
> > >> > > > >> > >>> > > > > null is a valid value here
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > Also, in the message format, the tags we use
> need
> > >> to
> > >> > > start
> > >> > > > >> at 0.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > I don't think we need the
> > >> FEATURE_UPDATE_IN_PROGRESS
> > >> > > error
> > >> > > > >> > >>> code.  The
> > >> > > > >> > >>> > > > > controller is basically single-threaded and
> will
> > >> only
> > >> > do
> > >> > > > >> one of
> > >> > > > >> > >>> these
> > >> > > > >> > >>> > > > > operations at once.  Even if it weren't,
> though,
> > we
> > >> > > could
> > >> > > > >> simply
> > >> > > > >> > >>> block
> > >> > > > >> > >>> > > the
> > >> > > > >> > >>> > > > > second operation behind the first one.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > For updateFeatures, it would be good to specify
> > >> that
> > >> > if
> > >> > > a
> > >> > > > >> single
> > >> > > > >> > >>> > > feature
> > >> > > > >> > >>> > > > > version update in the batch can't be done, none
> > of
> > >> > them
> > >> > > > are
> > >> > > > >> > >>> done.  I
> > >> > > > >> > >>> > > think
> > >> > > > >> > >>> > > > > this was the intention, but I wasn't able to
> find
> > >> it
> > >> > > > >> spelled out
> > >> > > > >> > >>> > > (maybe i
> > >> > > > >> > >>> > > > > missed it).
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > ===
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > And now, something a little bit bigger (sorry).
> > >> For
> > >> > > > >> finalized
> > >> > > > >> > >>> > > features,
> > >> > > > >> > >>> > > > > why do we need both min_version_level and
> > >> > > > max_version_level?
> > >> > > > >> > >>> Assuming
> > >> > > > >> > >>> > > that
> > >> > > > >> > >>> > > > > we want all the brokers to be on the same
> feature
> > >> > > version
> > >> > > > >> level,
> > >> > > > >> > >>> we
> > >> > > > >> > >>> > > really
> > >> > > > >> > >>> > > > > only care about three numbers for each feature,
> > >> right?
> > >> > > > The
> > >> > > > >> > >>> minimum
> > >> > > > >> > >>> > > > > supported version level, the maximum supported
> > >> version
> > >> > > > >> level,
> > >> > > > >> > >>> and the
> > >> > > > >> > >>> > > > > current active version level.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > We don't actually want different brokers to be
> on
> > >> > > > different
> > >> > > > >> > >>> versions of
> > >> > > > >> > >>> > > > > the same feature, right?  So we can just have
> one
> > >> > number
> > >> > > > for
> > >> > > > >> > >>> current
> > >> > > > >> > >>> > > > > version level, rather than two.  At least
> that's
> > >> what
> > >> > I
> > >> > > > was
> > >> > > > >> > >>> thinking
> > >> > > > >> > >>> > > -- let
> > >> > > > >> > >>> > > > > me know if I missed something.
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > best,
> > >> > > > >> > >>> > > > > Colin
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah
> > wrote:
> > >> > > > >> > >>> > > > > > Thanks for the KIP! +1 (non-binding)
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot <
> > >> > > > >> > >>> dja...@confluent.io>
> > >> > > > >> > >>> > > wrote:
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > > > > Great KIP, thanks! +1 (non-binding)
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang
> > Wang <
> > >> > > > >> > >>> wangg...@gmail.com>
> > >> > > > >> > >>> > > > > wrote:
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > > > > Thanks for the great KIP Kowshik, +1
> > >> (binding).
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao
> <
> > >> > > > >> j...@confluent.io
> > >> > > > >> > >>> >
> > >> > > > >> > >>> > > wrote:
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > > Hi, Kowshik,
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > Thanks for the KIP. +1
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > Jun
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM
> Kowshik
> > >> > > Prakasam
> > >> > > > <
> > >> > > > >> > >>> > > > > > > > kpraka...@confluent.io>
> > >> > > > >> > >>> > > > > > > > > wrote:
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > > Hi all,
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > > I'd like to start a vote for KIP-584
> > >> > > > >> > >>> <https://issues.apache.org/jira/browse/KIP-584>
> > >> > > > >> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584
> >.
> > >> The
> > >> > > link
> > >> > > > >> to
> > >> > > > >> > >>> the KIP
> > >> > > > >> > >>> > > can
> > >> > > > >> > >>> > > > > be
> > >> > > > >> > >>> > > > > > > found
> > >> > > > >> > >>> > > > > > > > > > here:
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>>
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >> > > > >> > >>> > > > > > > > > > .
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > > Thanks!
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > > > Cheers,
> > >> > > > >> > >>> > > > > > > > > > Kowshik
> > >> > > > >> > >>> > > > > > > > > >
> > >> > > > >> > >>> > > > > > > > >
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > > > --
> > >> > > > >> > >>> > > > > > > > -- Guozhang
> > >> > > > >> > >>> > > > > > > >
> > >> > > > >> > >>> > > > > > >
> > >> > > > >> > >>> > > > > >
> > >> > > > >> > >>> > > > >
> > >> > > > >> > >>> > > >
> > >> > > > >> > >>> > >
> > >> > > > >> > >>> >
> > >> > > > >> > >>>
> > >> > > > >> > >>
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Reply via email to