Hi Jun,

Thanks for the feedback. It's a very good point. I have now modified the
KIP-584 write-up "goals" section a bit. It now mentions one of the goals as
enabling rolling upgrades using a single restart (instead of 2). Also I
have removed the text explicitly aiming for deprecation of IBP. Note that
previously under "Potential features in Kafka" the IBP was mentioned under
point (4) as a possible coarse-grained feature. Hopefully, now the 2
sections of the KIP align with each other well.


Cheers,
Kowshik


On Fri, Sep 25, 2020 at 2:03 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Tue, Sep 22, 2020, at 00:43, Kowshik Prakasam wrote:
> > Hi all,
> >
> > I wanted to let you know that I have made the following changes to the
> > KIP-584 write up. The purpose is to ensure the design is correct for a
> few
> > things which came up during implementation:
> >
>
> Hi Kowshik,
>
> Thanks for the updates.
>
> >
> > 1. Per FeatureUpdate error code: The UPDATE_FEATURES controller API is no
> > longer transactional. Going forward, we allow for individual
> FeatureUpdate
> > to succeed/fail in the request. As a result, the response schema now
> > contains an error code per FeatureUpdate as well as a top-level error
> code.
> > Overall this is a better design because it better represents the nature
> of
> > the API: each FeatureUpdate in the request is independent of the other
> > updates, and the controller can process/apply these independently to ZK.
> > When an UPDATE_FEATURES request fails, this new design provides better
> > clarity to the caller on which FeatureUpdate could not be applied (via
> the
> > individual error codes). In the previous design, we were unable to
> achieve
> > such an increased level of clarity in communicating the error codes.
> >
>
> OK
>
> >
> > 2. Due to #1, there were some minor changes required to the proposed
> Admin
> > APIs (describeFeatures and updateFeatures). A few unnecessary public APIs
> > have been removed, and couple essential ones have been added. The latest
> > changes now represent the latest design.
> >
> > 3. The timeoutMs field has been removed from the the UPDATE_FEATURES API
> > request, since it was not found to be required during implementation.
> >
>
> Please don't get rid of timeoutMs.  timeoutMs is required if you want to
> implement the ability to timeout the call if the controller can't get to it
> in time.  This is important for avoiding congestion collapse where the
> controller collapses under the weight of lots of retries of the same set of
> calls.
>
> We may not be able to do it in the initial implementation, but we will
> eventually implement this for all the controller-bound RPCs.
>
> > >
> > > 2. Finalized feature version epoch data type has been made to be int32
> > > (instead of int64). The reason is that the epoch value is the value of
> ZK
> > > node version, whose data type is int32.
> > >
>
> Sorry, I missed this earlier.  Using 16 bit feature levels seems fine.
> However, please don't use a 32-bit epoch here.  We deliberately made the
> epoch 64 bits to avoid overflow problems in the future once ZK is gone.
>
> best,
> Colin
>
> > > 3. Introduced a new 'status' field in the '/features' ZK node schema.
> The
> > > purpose is to implement Colin's earlier point for the strategy for
> > > transitioning from not having a /features znode to having one. An
> > > explanation has been provided in the following section of the KIP
> detailing
> > > the different cases:
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus
> > > .
> > >
> > > Please let me know if you have any questions or concerns.
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > >
> > >
> > > Cheers,
> > > Kowshik
> > >
> > > On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam <
> kpraka...@confluent.io>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> This KIP vote has been open for ~12 days. The summary of the votes is
> > >> that we have 3 binding votes (Colin, Guozhang, Jun), and 3 non-binding
> > >> votes (David, Dhruvil, Boyang). Therefore, the KIP vote passes. I'll
> mark
> > >> KIP as accepted and start working on the implementation.
> > >>
> > >> Thanks a lot!
> > >>
> > >>
> > >> Cheers,
> > >> Kowshik
> > >>
> > >> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >>
> > >>> Thanks, Kowshik.  +1 (binding)
> > >>>
> > >>> best,
> > >>> Colin
> > >>>
> > >>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote:
> > >>> > Hi Colin,
> > >>> >
> > >>> > Thanks for the explanation! I agree with you, and I have updated
> the
> > >>> > KIP.
> > >>> > Here is a link to relevant section:
> > >>> >
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> > >>> >
> > >>> >
> > >>> > Cheers,
> > >>> > Kowshik
> > >>> >
> > >>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe <cmcc...@apache.org>
> > >>> wrote:
> > >>> >
> > >>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam wrote:
> > >>> > > > (Kowshik): Great point! However for case #1, I'm not sure why
> we
> > >>> need to
> > >>> > > > create a '/features' ZK node with disabled features. Instead,
> do
> > >>> you see
> > >>> > > > any drawback if we just do not create it? i.e. if IBP is less
> than
> > >>> 2.6,
> > >>> > > the
> > >>> > > > controller treats the case as though the versioning system is
> > >>> completely
> > >>> > > > disabled, and would not create a non-existing '/features' node.
> > >>> > >
> > >>> > > Hi Kowshik,
> > >>> > >
> > >>> > > When the IBP is less than 2.6, but the software has been
> upgraded to
> > >>> a
> > >>> > > state where it supports this KIP, that
> > >>> > >  means the user is upgrading from an earlier version of the
> > >>> software.  In
> > >>> > > this case, we want to start with all the features disabled and
> allow
> > >>> the
> > >>> > > user to enable them when they are ready.
> > >>> > >
> > >>> > > Enabling all the possible features immediately after an upgrade
> > >>> could be
> > >>> > > harmful to the cluster.  On the other hand, for a new cluster,
> we do
> > >>> want
> > >>> > > to enable all the possible features immediately . I was proposing
> > >>> this as a
> > >>> > > way to distinguish the two cases (since the new cluster will
> never be
> > >>> > > started with an old IBP).
> > >>> > >
> > >>> > > > Colin MccCabe wrote:
> > >>> > > > > And now, something a little bit bigger (sorry).  For
> finalized
> > >>> > > features,
> > >>> > > > > why do we need both min_version_level and max_version_level?
> > >>> Assuming
> > >>> > > that
> > >>> > > > > we want all the brokers to be on the same feature version
> level,
> > >>> we
> > >>> > > really only care
> > >>> > > > > about three numbers for each feature, right?  The minimum
> > >>> supported
> > >>> > > version
> > >>> > > > > level, the maximum supported version level, and the current
> > >>> active
> > >>> > > version level.
> > >>> > > >
> > >>> > > > > We don't actually want different brokers to be on different
> > >>> versions of
> > >>> > > > > the same feature, right?  So we can just have one number for
> > >>> current
> > >>> > > > > version level, rather than two.  At least that's what I was
> > >>> thinking
> > >>> > > -- let
> > >>> > > > > me know if I missed something.
> > >>> > > >
> > >>> > > > (Kowshik): It is my understanding that the "current active
> version
> > >>> level"
> > >>> > > > that you have mentioned, is the "max_version_level". But we
> still
> > >>> > > > maintain/publish both min and max version levels, because, the
> > >>> detail
> > >>> > > about
> > >>> > > > min level is useful to external clients. This is described
> below.
> > >>> > > >
> > >>> > > > For any feature F, think of the closed range:
> [min_version_level,
> > >>> > > > max_version_level] as the range of finalized versions, that's
> > >>> guaranteed
> > >>> > > to
> > >>> > > > be supported by all brokers in the cluster.
> > >>> > > >  - "max_version_level" is the finalized highest common version
> > >>> among all
> > >>> > > > brokers,
> > >>> > > >  - "min_version_level" is the finalized lowest common version
> > >>> among all
> > >>> > > > brokers.
> > >>> > > >
> > >>> > > > Next, think of "client" here as the "user of the new feature
> > >>> versions
> > >>> > > > system". Imagine that such a client learns about finalized
> feature
> > >>> > > > versions, and exercises some logic based on the version. These
> > >>> clients
> > >>> > > can
> > >>> > > > be of 2 types:
> > >>> > > > 1. Some part of the broker code itself could behave like a
> client
> > >>> trying
> > >>> > > to
> > >>> > > > use some feature that's "internal" to the broker cluster. Such
> a
> > >>> client
> > >>> > > > would learn the latest finalized features via ZK.
> > >>> > > > 2. An external system (ex: Streams) could behave like a client,
> > >>> trying to
> > >>> > > > use some "external" facing feature. Such a client would learn
> > >>> latest
> > >>> > > > finalized features via ApiVersionsRequest. Ex:
> group_coordinator
> > >>> feature
> > >>> > > > described in the KIP.
> > >>> > > >
> > >>> > > > Next, imagine that for F, the max_version_level is successfully
> > >>> bumped by
> > >>> > > > +1 (via Controller API). Now it is guaranteed that all brokers
> > >>> (i.e.
> > >>> > > > internal clients) understand max_version_level + 1. However,
> it is
> > >>> still
> > >>> > > > not guaranteed that all external clients have support for (or
> have
> > >>> > > > activated) the logic for the newer version. Why? Because, this
> is
> > >>> > > > subjective as explained next:
> > >>> > > >
> > >>> > > > 1. On one hand, imagine F as an internal feature only relevant
> to
> > >>> > > Brokers.
> > >>> > > > The binary for the internal client logic is controlled by
> Broker
> > >>> cluster
> > >>> > > > deployments. When shipping a new Broker release, we wouldn't
> bump
> > >>> max
> > >>> > > > "supported" feature version for F by 1, unless we have
> introduced
> > >>> some
> > >>> > > new
> > >>> > > > logic (with a potentially breaking change) in the Broker.
> > >>> Furthermore,
> > >>> > > such
> > >>> > > > feature logic in the broker should/will not be implemented in a
> > >>> way that
> > >>> > > it
> > >>> > > > would activate logic for an older feature version after it has
> > >>> migrated
> > >>> > > to
> > >>> > > > using the logic for a newer feature version (because this could
> > >>> break the
> > >>> > > > cluster!). For these cases, max_version_level will be very
> useful
> > >>> for
> > >>> > > > decision making.
> > >>> > > >
> > >>> > > > 2. On the other hand, imagine F as an external facing feature.
> > >>> External
> > >>> > > > clients are not within the control of Broker cluster. An
> external
> > >>> client
> > >>> > > > may not have upgraded it's code (yet) to use
> 'max_version_level +
> > >>> 1'.
> > >>> > > But,
> > >>> > > > the Kafka cluster could have been deployed with support for
> > >>> > > > 'max_version_level + 1' of an external facing feature. Now,
> these
> > >>> > > external
> > >>> > > > clients (until an upgrade) are benefitted in learning "what is
> the
> > >>> lowest
> > >>> > > > common version for F among all brokers?". This is where the
> > >>> > > > "min_version_level" becomes useful. Using this, a client could
> > >>> learn the
> > >>> > > > specific supported versions that's lower than max_version_level
> > >>> (instead
> > >>> > > of
> > >>> > > > assuming that all brokers support the range: [1,
> > >>> max_version_level]). For
> > >>> > > > example, if the cluster deprecates "min_version_level", then
> the
> > >>> client
> > >>> > > > becomes aware because it periodically learns the latest
> > >>> > > "min_version_level"
> > >>> > > > via ApiVersionsRequest.
> > >>> > > >
> > >>> > >
> > >>> > > Thanks for the explanation.  I agree that this does make sense
> when
> > >>> you
> > >>> > > take the client perspective into account.
> > >>> > >
> > >>> > > best,
> > >>> > > Colin
> > >>> > >
> > >>> > >
> > >>> > > >
> > >>> > > > Cheers,
> > >>> > > > Kowshik
> > >>> > > >
> > >>> > > >
> > >>> > > >
> > >>> > > >
> > >>> > > > On Thu, Apr 23, 2020 at 12:07 PM Colin McCabe <
> cmcc...@apache.org>
> > >>> > > wrote:
> > >>> > > >
> > >>> > > > > Hi Kowshik,
> > >>> > > > >
> > >>> > > > > Thanks again for working on this-- it looks great.  I went
> over
> > >>> the KIP
> > >>> > > > > again and have a few more comments.
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > It would be good to note that deprecating a feature version
> (in
> > >>> other
> > >>> > > > > words, increasing minVersionLevel on the broker) is an
> > >>> incompatible
> > >>> > > change,
> > >>> > > > > which requires a major release of Kafka.
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > I think the strategy for transitioning from not having a
> > >>> /features
> > >>> > > znode
> > >>> > > > > to having one could use some work. The current proposal is to
> > >>> wait for
> > >>> > > all
> > >>> > > > > the brokers to fill in their feature znodes and then pick the
> > >>> highest
> > >>> > > > > common versions.  But that requires blocking in the
> controller
> > >>> startup
> > >>> > > code
> > >>> > > > > until the whole cluster is active (technically, a point in
> time
> > >>> which
> > >>> > > we
> > >>> > > > > never really know that we have reached...)
> > >>> > > > >
> > >>> > > > > Instead, I think it would be better to have a strategy like
> this:
> > >>> > > > > 1. If the controller comes up and there is no /features znode
> > >>> AND the
> > >>> > > IBP
> > >>> > > > > is less than 2.6, create a /features znode where all the
> > >>> features are
> > >>> > > > > disabled.
> > >>> > > > > 2. If the controller comes up and there is no /features znode
> > >>> AND the
> > >>> > > IBP
> > >>> > > > > is greater than or equal to 2.6, create a /features znode
> where
> > >>> all the
> > >>> > > > > features are enabled at the highest versions supported by the
> > >>> > > controller.
> > >>> > > > >
> > >>> > > > > People upgrading from the pre-KIP-584
> > >>> <https://issues.apache.org/jira/browse/KIP-584>
> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584> world will
> end
> > >>> up in
> > >>> > > case
> > >>> > > > > #1 because they have to do a double roll to upgrade, and
> during
> > >>> the
> > >>> > > first
> > >>> > > > > roll, the IBP is unchanged.  People creating new clusters
> from
> > >>> scratch
> > >>> > > will
> > >>> > > > > end up in case #2, which is what we want since we don't want
> a
> > >>> brand
> > >>> > > new
> > >>> > > > > cluster to be using old feature flag versions.
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > UpdateFeaturesResponse#ErrorMessage should specify
> > >>> nullableVersions
> > >>> > > since
> > >>> > > > > null is a valid value here
> > >>> > > > >
> > >>> > > > > Also, in the message format, the tags we use need to start
> at 0.
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > I don't think we need the FEATURE_UPDATE_IN_PROGRESS error
> > >>> code.  The
> > >>> > > > > controller is basically single-threaded and will only do one
> of
> > >>> these
> > >>> > > > > operations at once.  Even if it weren't, though, we could
> simply
> > >>> block
> > >>> > > the
> > >>> > > > > second operation behind the first one.
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > For updateFeatures, it would be good to specify that if a
> single
> > >>> > > feature
> > >>> > > > > version update in the batch can't be done, none of them are
> > >>> done.  I
> > >>> > > think
> > >>> > > > > this was the intention, but I wasn't able to find it spelled
> out
> > >>> > > (maybe i
> > >>> > > > > missed it).
> > >>> > > > >
> > >>> > > > > ===
> > >>> > > > >
> > >>> > > > > And now, something a little bit bigger (sorry).  For
> finalized
> > >>> > > features,
> > >>> > > > > why do we need both min_version_level and max_version_level?
> > >>> Assuming
> > >>> > > that
> > >>> > > > > we want all the brokers to be on the same feature version
> level,
> > >>> we
> > >>> > > really
> > >>> > > > > only care about three numbers for each feature, right?  The
> > >>> minimum
> > >>> > > > > supported version level, the maximum supported version level,
> > >>> and the
> > >>> > > > > current active version level.
> > >>> > > > >
> > >>> > > > > We don't actually want different brokers to be on different
> > >>> versions of
> > >>> > > > > the same feature, right?  So we can just have one number for
> > >>> current
> > >>> > > > > version level, rather than two.  At least that's what I was
> > >>> thinking
> > >>> > > -- let
> > >>> > > > > me know if I missed something.
> > >>> > > > >
> > >>> > > > > best,
> > >>> > > > > Colin
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > On Tue, Apr 21, 2020, at 13:01, Dhruvil Shah wrote:
> > >>> > > > > > Thanks for the KIP! +1 (non-binding)
> > >>> > > > > >
> > >>> > > > > > On Tue, Apr 21, 2020 at 6:09 AM David Jacot <
> > >>> dja...@confluent.io>
> > >>> > > wrote:
> > >>> > > > > >
> > >>> > > > > > > Great KIP, thanks! +1 (non-binding)
> > >>> > > > > > >
> > >>> > > > > > > On Fri, Apr 17, 2020 at 8:56 PM Guozhang Wang <
> > >>> wangg...@gmail.com>
> > >>> > > > > wrote:
> > >>> > > > > > >
> > >>> > > > > > > > Thanks for the great KIP Kowshik, +1 (binding).
> > >>> > > > > > > >
> > >>> > > > > > > > On Fri, Apr 17, 2020 at 11:22 AM Jun Rao <
> j...@confluent.io
> > >>> >
> > >>> > > wrote:
> > >>> > > > > > > >
> > >>> > > > > > > > > Hi, Kowshik,
> > >>> > > > > > > > >
> > >>> > > > > > > > > Thanks for the KIP. +1
> > >>> > > > > > > > >
> > >>> > > > > > > > > Jun
> > >>> > > > > > > > >
> > >>> > > > > > > > > On Thu, Apr 16, 2020 at 11:14 AM Kowshik Prakasam <
> > >>> > > > > > > > kpraka...@confluent.io>
> > >>> > > > > > > > > wrote:
> > >>> > > > > > > > >
> > >>> > > > > > > > > > Hi all,
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > I'd like to start a vote for KIP-584
> > >>> <https://issues.apache.org/jira/browse/KIP-584>
> > >>> > > > > <https://issues.apache.org/jira/browse/KIP-584>. The link to
> > >>> the KIP
> > >>> > > can
> > >>> > > > > be
> > >>> > > > > > > found
> > >>> > > > > > > > > > here:
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > >
> > >>> > >
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> > >>> > > > > > > > > > .
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Thanks!
> > >>> > > > > > > > > >
> > >>> > > > > > > > > >
> > >>> > > > > > > > > > Cheers,
> > >>> > > > > > > > > > Kowshik
> > >>> > > > > > > > > >
> > >>> > > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > >
> > >>> > > > > > > > --
> > >>> > > > > > > > -- Guozhang
> > >>> > > > > > > >
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>

Reply via email to