Hi Kowshik,

Thanks for the KIP, this is exciting!

The KIP includes examples on how operators could use the command line
utility, etc. It would be great to add some high-level details on how the
upgrade workflow changes overall with the addition of feature versions.

- Dhruvil

On Wed, Apr 15, 2020 at 6:29 PM Kowshik Prakasam <kpraka...@confluent.io>
wrote:

> Hi Jun,
>
> Sorry the links were broken in my last response, here are the right links:
>
> 200.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioning
> Scheme For Features-Validations
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >
> 110.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-When
> To Use Versioned Feature Flags?
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Whentouseversionedfeatureflags
> ?>
>
>
> Cheers,
> Kowshik
>
> On Wed, Apr 15, 2020 at 6:24 PM Kowshik Prakasam <kpraka...@confluent.io>
> wrote:
>
> >
> > Hi Jun,
> >
> > Thanks for the feedback! I have addressed the comments in the KIP.
> >
> > > 200. In the validation section, there is still the text  "*from*
> > > {"max_version_level":
> > > X} *to* {"max_version_level": X’}". It seems that it should say "from X
> > to
> > > Y"?
> >
> > (Kowshik): Done. I have reworded it a bit to make it clearer now in this
> > section:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >
> > > 110. Could we add that we need to document the bumped version of each
> > > feature in the upgrade section of a release?
> >
> > (Kowshik): Great point! Done, I have mentioned it in #3 this section:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-584
> > <https://issues.apache.org/jira/browse/KIP-584>
> > %3A+Versioning+scheme+for+features#KIP-584
> > <https://issues.apache.org/jira/browse/KIP-584>
> > :Versioningschemeforfeatures-Whentouseversionedfeatureflags?
> >
> >
> > Cheers,
> > Kowshik
> >
> > On Wed, Apr 15, 2020 at 4:00 PM Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Kowshik,
> >>
> >> Looks good to me now. Just a couple of minor things below.
> >>
> >> 200. In the validation section, there is still the text  "*from*
> >> {"max_version_level":
> >> X} *to* {"max_version_level": X’}". It seems that it should say "from X
> to
> >> Y"?
> >>
> >> 110. Could we add that we need to document the bumped version of each
> >> feature in the upgrade section of a release?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Wed, Apr 15, 2020 at 1:08 PM Kowshik Prakasam <
> kpraka...@confluent.io>
> >> wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Thank you for the suggestion! I have updated the KIP, please find my
> >> > response below.
> >> >
> >> > > 200. I guess you are saying only when the allowDowngrade field is
> set,
> >> > the
> >> > > finalized feature version can go backward. Otherwise, it can only go
> >> up.
> >> > > That makes sense. It would be useful to make that clear when
> >> explaining
> >> > > the usage of the allowDowngrade field. In the validation section, we
> >> > have  "
> >> > > /features' from {"max_version_level": X} to {"max_version_level":
> >> X’}",
> >> > it
> >> > > seems that we need to mention Y there.
> >> >
> >> > (Kowshik): Great point! Yes, that is correct. Done, I have updated the
> >> > validations
> >> > section explaining the above. Here is a link to this section:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Validations
> >> >
> >> >
> >> > Cheers,
> >> > Kowshik
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Apr 15, 2020 at 11:05 AM Jun Rao <j...@confluent.io> wrote:
> >> >
> >> > > Hi, Kowshik,
> >> > >
> >> > > 200. I guess you are saying only when the allowDowngrade field is
> set,
> >> > the
> >> > > finalized feature version can go backward. Otherwise, it can only go
> >> up.
> >> > > That makes sense. It would be useful to make that clear when
> >> explaining
> >> > > the usage of the allowDowngrade field. In the validation section, we
> >> have
> >> > > "
> >> > > /features' from {"max_version_level": X} to {"max_version_level":
> >> X’}",
> >> > it
> >> > > seems that we need to mention Y there.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > > On Wed, Apr 15, 2020 at 10:44 AM Kowshik Prakasam <
> >> > kpraka...@confluent.io>
> >> > > wrote:
> >> > >
> >> > > > Hi Jun,
> >> > > >
> >> > > > Great question! Please find my response below.
> >> > > >
> >> > > > > 200. My understanding is that If the CLI tool passes the
> >> > > > > '--allow-downgrade' flag when updating a specific feature, then
> a
> >> > > future
> >> > > > > downgrade is possible. Otherwise, the feature is now
> >> downgradable. If
> >> > > so,
> >> > > > I
> >> > > > > was wondering how the controller remembers this since it can be
> >> > > restarted
> >> > > > > over time?
> >> > > >
> >> > > > (Kowshik): The purpose of the flag was to just restrict the user
> >> intent
> >> > > for
> >> > > > a specific request.
> >> > > > It seems to me that to avoid confusion, I could call the flag as
> >> > > > `--try-downgrade` instead.
> >> > > > Then this makes it clear, that, the controller just has to
> consider
> >> the
> >> > > ask
> >> > > > from
> >> > > > the user as an explicit request to attempt a downgrade.
> >> > > >
> >> > > > The flag does not act as an override on controller's decision
> making
> >> > that
> >> > > > decides whether
> >> > > > a flag is downgradable (these decisions on whether to allow a flag
> >> to
> >> > be
> >> > > > downgraded
> >> > > > from a specific version level, can be embedded in the controller
> >> code).
> >> > > >
> >> > > > Please let me know what you think.
> >> > > > Sorry if I misunderstood the original question.
> >> > > >
> >> > > >
> >> > > > Cheers,
> >> > > > Kowshik
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 15, 2020 at 9:40 AM Jun Rao <j...@confluent.io> wrote:
> >> > > >
> >> > > > > Hi, Kowshik,
> >> > > > >
> >> > > > > Thanks for the reply. Makes sense. Just one more question.
> >> > > > >
> >> > > > > 200. My understanding is that If the CLI tool passes the
> >> > > > > '--allow-downgrade' flag when updating a specific feature, then
> a
> >> > > future
> >> > > > > downgrade is possible. Otherwise, the feature is now
> >> downgradable. If
> >> > > > so, I
> >> > > > > was wondering how the controller remembers this since it can be
> >> > > restarted
> >> > > > > over time?
> >> > > > >
> >> > > > > Jun
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Apr 14, 2020 at 6:49 PM Kowshik Prakasam <
> >> > > kpraka...@confluent.io
> >> > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi Jun,
> >> > > > > >
> >> > > > > > Thanks a lot for the feedback and the questions!
> >> > > > > > Please find my response below.
> >> > > > > >
> >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade
> >> field.
> >> > It
> >> > > > > seems
> >> > > > > > > that field needs to be persisted somewhere in ZK?
> >> > > > > >
> >> > > > > > (Kowshik): Great question! Below is my explanation. Please
> help
> >> me
> >> > > > > > understand,
> >> > > > > > if you feel there are cases where we would need to still
> >> persist it
> >> > > in
> >> > > > > ZK.
> >> > > > > >
> >> > > > > > Firstly I have updated my thoughts into the KIP now, under the
> >> > > > > 'guidelines'
> >> > > > > > section:
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >> > > > > >
> >> > > > > > The allowDowngrade boolean field is just to restrict the user
> >> > intent,
> >> > > > and
> >> > > > > > to remind
> >> > > > > > them to double check their intent before proceeding. It should
> >> be
> >> > set
> >> > > > to
> >> > > > > > true
> >> > > > > > by the user in a request, only when the user intent is to
> >> > forcefully
> >> > > > > > "attempt" a
> >> > > > > > downgrade of a specific feature's max version level, to the
> >> > provided
> >> > > > > value
> >> > > > > > in
> >> > > > > > the request.
> >> > > > > >
> >> > > > > > We can extend this safeguard. The controller (on it's end) can
> >> > > maintain
> >> > > > > > rules in the code, that, for safety reasons would outright
> >> reject
> >> > > > certain
> >> > > > > > downgrades
> >> > > > > > from a specific max_version_level for a specific feature. Such
> >> > > > rejections
> >> > > > > > may
> >> > > > > > happen depending on the feature being downgraded, and from
> what
> >> > > version
> >> > > > > > level.
> >> > > > > >
> >> > > > > > The CLI tool only allows a downgrade attempt in conjunction
> with
> >> > > > specific
> >> > > > > > flags and sub-commands. For example, in the CLI tool, if the
> >> user
> >> > > uses
> >> > > > > the
> >> > > > > > 'downgrade-all' command, or passes '--allow-downgrade' flag
> when
> >> > > > > updating a
> >> > > > > > specific feature, only then the tool will translate this ask
> to
> >> > > setting
> >> > > > > > 'allowDowngrade' field in the request to the server.
> >> > > > > >
> >> > > > > > > 201. UpdateFeaturesResponse has the following top level
> >> fields.
> >> > > > Should
> >> > > > > > > those fields be per feature?
> >> > > > > > >
> >> > > > > > >   "fields": [
> >> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions":
> "0+",
> >> > > > > > >       "about": "The error code, or 0 if there was no error."
> >> },
> >> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions":
> >> "0+",
> >> > > > > > >       "about": "The error message, or null if there was no
> >> > error."
> >> > > }
> >> > > > > > >   ]
> >> > > > > >
> >> > > > > > (Kowshik): Great question!
> >> > > > > > As such, the API is transactional, as explained in the
> sections
> >> > > linked
> >> > > > > > below.
> >> > > > > > Either all provided FeatureUpdate was applied, or none.
> >> > > > > > It's the reason I felt we can have just one error code +
> >> message.
> >> > > > > > Happy to extend this if you feel otherwise. Please let me
> know.
> >> > > > > >
> >> > > > > > Link to sections:
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-ChangestoKafkaController
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guarantees
> >> > > > > >
> >> > > > > > > 202. The /features path in ZK has a field min_version_level.
> >> > Which
> >> > > > API
> >> > > > > > and
> >> > > > > > > tool can change that value?
> >> > > > > >
> >> > > > > > (Kowshik): Great question! Currently this cannot be modified
> by
> >> > using
> >> > > > the
> >> > > > > > API or the tool.
> >> > > > > > Feature version deprecation (by raising min_version_level) can
> >> be
> >> > > done
> >> > > > > only
> >> > > > > > by the Controller directly. The rationale is explained in this
> >> > > section:
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Featureversiondeprecation
> >> > > > > >
> >> > > > > >
> >> > > > > > Cheers,
> >> > > > > > Kowshik
> >> > > > > >
> >> > > > > > On Tue, Apr 14, 2020 at 5:33 PM Jun Rao <j...@confluent.io>
> >> wrote:
> >> > > > > >
> >> > > > > > > Hi, Kowshik,
> >> > > > > > >
> >> > > > > > > Thanks for addressing those comments. Just a few more minor
> >> > > comments.
> >> > > > > > >
> >> > > > > > > 200. The UpdateFeaturesRequest includes an AllowDowngrade
> >> field.
> >> > It
> >> > > > > seems
> >> > > > > > > that field needs to be persisted somewhere in ZK?
> >> > > > > > >
> >> > > > > > > 201. UpdateFeaturesResponse has the following top level
> >> fields.
> >> > > > Should
> >> > > > > > > those fields be per feature?
> >> > > > > > >
> >> > > > > > >   "fields": [
> >> > > > > > >     { "name": "ErrorCode", "type": "int16", "versions":
> "0+",
> >> > > > > > >       "about": "The error code, or 0 if there was no error."
> >> },
> >> > > > > > >     { "name": "ErrorMessage", "type": "string", "versions":
> >> "0+",
> >> > > > > > >       "about": "The error message, or null if there was no
> >> > error."
> >> > > }
> >> > > > > > >   ]
> >> > > > > > >
> >> > > > > > > 202. The /features path in ZK has a field min_version_level.
> >> > Which
> >> > > > API
> >> > > > > > and
> >> > > > > > > tool can change that value?
> >> > > > > > >
> >> > > > > > > Jun
> >> > > > > > >
> >> > > > > > > On Mon, Apr 13, 2020 at 5:12 PM Kowshik Prakasam <
> >> > > > > kpraka...@confluent.io
> >> > > > > > >
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi Jun,
> >> > > > > > > >
> >> > > > > > > > Thanks for the feedback! I have updated the KIP-584
> >> addressing
> >> > > your
> >> > > > > > > > comments.
> >> > > > > > > > Please find my response below.
> >> > > > > > > >
> >> > > > > > > > > 100.6 You can look for the sentence "This operation
> >> requires
> >> > > > ALTER
> >> > > > > on
> >> > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage in
> >> > > > > > > > > KafkaApis.authorize().
> >> > > > > > > >
> >> > > > > > > > (Kowshik): Done. Great point! For the newly introduced
> >> > > > > UPDATE_FEATURES
> >> > > > > > > api,
> >> > > > > > > > I have added a
> >> > > > > > > > requirement that AclOperation.ALTER is required on
> >> > > > > > ResourceType.CLUSTER.
> >> > > > > > > >
> >> > > > > > > > > 110. Keeping the feature version as int is probably
> fine.
> >> I
> >> > > just
> >> > > > > felt
> >> > > > > > > > that
> >> > > > > > > > > for some of the common user interactions, it's more
> >> > convenient
> >> > > to
> >> > > > > > > > > relate that to a release version. For example, if a user
> >> > wants
> >> > > to
> >> > > > > > > > downgrade
> >> > > > > > > > > to a release 2.5, it's easier for the user to use the
> tool
> >> > like
> >> > > > > "tool
> >> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature
> X
> >> > > > --version
> >> > > > > > 6".
> >> > > > > > > >
> >> > > > > > > > (Kowshik): Great point. Generally, maximum feature version
> >> > levels
> >> > > > are
> >> > > > > > not
> >> > > > > > > > downgradable after
> >> > > > > > > > they are finalized in the cluster. This is because, as a
> >> > > guideline
> >> > > > > > > bumping
> >> > > > > > > > feature version level usually is used mainly to convey
> >> > important
> >> > > > > > breaking
> >> > > > > > > > changes.
> >> > > > > > > > Despite the above, there may be some extreme/rare cases
> >> where a
> >> > > > user
> >> > > > > > > wants
> >> > > > > > > > to downgrade
> >> > > > > > > > all features to a specific previous release. The user may
> >> want
> >> > to
> >> > > > do
> >> > > > > > this
> >> > > > > > > > just
> >> > > > > > > > prior to rolling back a Kafka cluster to a previous
> release.
> >> > > > > > > >
> >> > > > > > > > To support the above, I have made a change to the KIP
> >> > explaining
> >> > > > that
> >> > > > > > the
> >> > > > > > > > CLI tool is versioned.
> >> > > > > > > > The CLI tool internally has knowledge about a map of
> >> features
> >> > to
> >> > > > > their
> >> > > > > > > > respective max
> >> > > > > > > > versions supported by the Broker. The tool's knowledge of
> >> > > features
> >> > > > > and
> >> > > > > > > > their version values,
> >> > > > > > > > is limited to the version of the CLI tool itself i.e. the
> >> > > > information
> >> > > > > > is
> >> > > > > > > > packaged into the CLI tool
> >> > > > > > > > when it is released. Whenever a Kafka release introduces a
> >> new
> >> > > > > feature
> >> > > > > > > > version, or modifies
> >> > > > > > > > an existing feature version, the CLI tool shall also be
> >> updated
> >> > > > with
> >> > > > > > this
> >> > > > > > > > information,
> >> > > > > > > > Newer versions of the CLI tool will be released as part of
> >> the
> >> > > > Kafka
> >> > > > > > > > releases.
> >> > > > > > > >
> >> > > > > > > > Therefore, to achieve the downgrade need, the user just
> >> needs
> >> > to
> >> > > > run
> >> > > > > > the
> >> > > > > > > > version of
> >> > > > > > > > the CLI tool that's part of the particular previous
> release
> >> > that
> >> > > > > he/she
> >> > > > > > > is
> >> > > > > > > > downgrading to.
> >> > > > > > > > To help the user with this, there is a new command added
> to
> >> the
> >> > > CLI
> >> > > > > > tool
> >> > > > > > > > called `downgrade-all`.
> >> > > > > > > > This essentially downgrades max version levels of all
> >> features
> >> > in
> >> > > > the
> >> > > > > > > > cluster to the versions
> >> > > > > > > > known to the CLI tool internally.
> >> > > > > > > >
> >> > > > > > > > I have explained the above in the KIP under these
> sections:
> >> > > > > > > >
> >> > > > > > > > Tooling support (have explained that the CLI tool is
> >> > versioned):
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >> > > > > > > >
> >> > > > > > > > Regular CLI tool usage (please refer to point #3, and see
> >> the
> >> > > > tooling
> >> > > > > > > > example)
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >> > > > > > > >
> >> > > > > > > > > 110. Similarly, if the client library finds a feature
> >> > mismatch
> >> > > > with
> >> > > > > > the
> >> > > > > > > > broker,
> >> > > > > > > > > the client likely needs to log some error message for
> the
> >> > user
> >> > > to
> >> > > > > > take
> >> > > > > > > > some
> >> > > > > > > > > actions. It's much more actionable if the error message
> is
> >> > > > "upgrade
> >> > > > > > the
> >> > > > > > > > > broker to release version 2.6" than just "upgrade the
> >> broker
> >> > to
> >> > > > > > feature
> >> > > > > > > > > version 7".
> >> > > > > > > >
> >> > > > > > > > (Kowshik): That's a really good point! If we use ints for
> >> > feature
> >> > > > > > > versions,
> >> > > > > > > > the best
> >> > > > > > > > message that client can print for debugging is "broker
> >> doesn't
> >> > > > > support
> >> > > > > > > > feature version 7", and alongside that print the supported
> >> > > version
> >> > > > > > range
> >> > > > > > > > returned
> >> > > > > > > > by the broker. Then, does it sound reasonable that the
> user
> >> > could
> >> > > > > then
> >> > > > > > > > reference
> >> > > > > > > > Kafka release logs to figure out which version of the
> broker
> >> > > > release
> >> > > > > is
> >> > > > > > > > required
> >> > > > > > > > be deployed, to support feature version 7? I couldn't
> think
> >> of
> >> > a
> >> > > > > better
> >> > > > > > > > strategy here.
> >> > > > > > > >
> >> > > > > > > > > 120. When should a developer bump up the version of a
> >> > feature?
> >> > > > > > > >
> >> > > > > > > > (Kowshik): Great question! In the KIP, I have added a
> >> section:
> >> > > > > > > 'Guidelines
> >> > > > > > > > on feature versions and workflows'
> >> > > > > > > > providing some guidelines on when to use the versioned
> >> feature
> >> > > > flags,
> >> > > > > > and
> >> > > > > > > > what
> >> > > > > > > > are the regular workflows with the CLI tool.
> >> > > > > > > >
> >> > > > > > > > Link to the relevant sections:
> >> > > > > > > > Guidelines:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Guidelinesonfeatureversionsandworkflows
> >> > > > > > > >
> >> > > > > > > > Regular CLI tool usage:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-RegularCLItoolusage
> >> > > > > > > >
> >> > > > > > > > Advanced CLI tool usage:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-AdvancedCLItoolusage
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Cheers,
> >> > > > > > > > Kowshik
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Fri, Apr 10, 2020 at 4:25 PM Jun Rao <j...@confluent.io
> >
> >> > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi, Kowshik,
> >> > > > > > > > >
> >> > > > > > > > > Thanks for the reply. A few more comments.
> >> > > > > > > > >
> >> > > > > > > > > 110. Keeping the feature version as int is probably
> fine.
> >> I
> >> > > just
> >> > > > > felt
> >> > > > > > > > that
> >> > > > > > > > > for some of the common user interactions, it's more
> >> > convenient
> >> > > to
> >> > > > > > > > > relate that to a release version. For example, if a user
> >> > wants
> >> > > to
> >> > > > > > > > downgrade
> >> > > > > > > > > to a release 2.5, it's easier for the user to use the
> tool
> >> > like
> >> > > > > "tool
> >> > > > > > > > > --downgrade 2.5" instead of "tool --downgrade --feature
> X
> >> > > > --version
> >> > > > > > 6".
> >> > > > > > > > > Similarly, if the client library finds a feature
> mismatch
> >> > with
> >> > > > the
> >> > > > > > > > broker,
> >> > > > > > > > > the client likely needs to log some error message for
> the
> >> > user
> >> > > to
> >> > > > > > take
> >> > > > > > > > some
> >> > > > > > > > > actions. It's much more actionable if the error message
> is
> >> > > > "upgrade
> >> > > > > > the
> >> > > > > > > > > broker to release version 2.6" than just "upgrade the
> >> broker
> >> > to
> >> > > > > > feature
> >> > > > > > > > > version 7".
> >> > > > > > > > >
> >> > > > > > > > > 111. Sounds good.
> >> > > > > > > > >
> >> > > > > > > > > 120. When should a developer bump up the version of a
> >> > feature?
> >> > > > > > > > >
> >> > > > > > > > > Jun
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Apr 7, 2020 at 12:26 AM Kowshik Prakasam <
> >> > > > > > > kpraka...@confluent.io
> >> > > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Jun,
> >> > > > > > > > > >
> >> > > > > > > > > > I have updated the KIP for the item 111.
> >> > > > > > > > > > I'm in the process of addressing 100.6, and will
> >> provide an
> >> > > > > update
> >> > > > > > > > soon.
> >> > > > > > > > > > I think item 110 is still under discussion given we
> are
> >> now
> >> > > > > > > providing a
> >> > > > > > > > > way
> >> > > > > > > > > > to finalize
> >> > > > > > > > > > all features to their latest version levels. In any
> >> case,
> >> > > > please
> >> > > > > > let
> >> > > > > > > us
> >> > > > > > > > > > know
> >> > > > > > > > > > how you feel in response to Colin's comments on this
> >> topic.
> >> > > > > > > > > >
> >> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> >> default
> >> > > > value
> >> > > > > > is
> >> > > > > > > > the
> >> > > > > > > > > > > current released version. So, if you are a brand new
> >> > user,
> >> > > > you
> >> > > > > > > don't
> >> > > > > > > > > need
> >> > > > > > > > > > > to configure IBP and all new features will be
> >> immediately
> >> > > > > > available
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > new cluster. If you are upgrading from an old
> version,
> >> > you
> >> > > do
> >> > > > > > need
> >> > > > > > > to
> >> > > > > > > > > > > understand and configure IBP. I see a similar
> pattern
> >> > here
> >> > > > for
> >> > > > > > > > > > > features. From the ease of use perspective, ideally,
> >> we
> >> > > > > shouldn't
> >> > > > > > > > > require
> >> > > > > > > > > > a
> >> > > > > > > > > > > new user to have an extra step such as running a
> >> > bootstrap
> >> > > > > script
> >> > > > > > > > > unless
> >> > > > > > > > > > > it's truly necessary. If someone has a special need
> >> (all
> >> > > the
> >> > > > > > cases
> >> > > > > > > > you
> >> > > > > > > > > > > mentioned seem special cases?), they can configure a
> >> mode
> >> > > > such
> >> > > > > > that
> >> > > > > > > > > > > features are enabled/disabled manually.
> >> > > > > > > > > >
> >> > > > > > > > > > (Kowshik): That makes sense, thanks for the idea!
> Sorry
> >> if
> >> > I
> >> > > > > didn't
> >> > > > > > > > > > understand
> >> > > > > > > > > > this need earlier. I have updated the KIP with the
> >> approach
> >> > > > that
> >> > > > > > > > whenever
> >> > > > > > > > > > the '/features' node is absent, the controller by
> >> default
> >> > > will
> >> > > > > > > > bootstrap
> >> > > > > > > > > > the node
> >> > > > > > > > > > to contain the latest feature levels. Here is the new
> >> > section
> >> > > > in
> >> > > > > > the
> >> > > > > > > > KIP
> >> > > > > > > > > > describing
> >> > > > > > > > > > the same:
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
> >> > > > > > > > > >
> >> > > > > > > > > > Next, as I explained in my response to Colin's
> >> suggestions,
> >> > > we
> >> > > > > are
> >> > > > > > > now
> >> > > > > > > > > > providing a `--finalize-latest-features` flag with the
> >> > > tooling.
> >> > > > > > This
> >> > > > > > > > lets
> >> > > > > > > > > > the sysadmin finalize all features known to the
> >> controller
> >> > to
> >> > > > > their
> >> > > > > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > levels. Please look at this section (point #3 and the
> >> > tooling
> >> > > > > > example
> >> > > > > > > > > > later):
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Toolingsupport
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Do you feel this addresses your comment/concern?
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Cheers,
> >> > > > > > > > > > Kowshik
> >> > > > > > > > > >
> >> > > > > > > > > > On Mon, Apr 6, 2020 at 12:06 PM Jun Rao <
> >> j...@confluent.io>
> >> > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thanks for the reply. A few more replies below.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 100.6 You can look for the sentence "This operation
> >> > > requires
> >> > > > > > ALTER
> >> > > > > > > on
> >> > > > > > > > > > > CLUSTER." in KIP-455. Also, you can check its usage
> in
> >> > > > > > > > > > > KafkaApis.authorize().
> >> > > > > > > > > > >
> >> > > > > > > > > > > 110. From the external client/tooling perspective,
> >> it's
> >> > > more
> >> > > > > > > natural
> >> > > > > > > > to
> >> > > > > > > > > > use
> >> > > > > > > > > > > the release version for features. If we can use the
> >> same
> >> > > > > release
> >> > > > > > > > > version
> >> > > > > > > > > > > for internal representation, it seems simpler
> (easier
> >> to
> >> > > > > > > understand,
> >> > > > > > > > no
> >> > > > > > > > > > > mapping overhead, etc). Is there a benefit with
> >> separate
> >> > > > > external
> >> > > > > > > and
> >> > > > > > > > > > > internal versioning schemes?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 111. To put this in context, when we had IBP, the
> >> default
> >> > > > value
> >> > > > > > is
> >> > > > > > > > the
> >> > > > > > > > > > > current released version. So, if you are a brand new
> >> > user,
> >> > > > you
> >> > > > > > > don't
> >> > > > > > > > > need
> >> > > > > > > > > > > to configure IBP and all new features will be
> >> immediately
> >> > > > > > available
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > new cluster. If you are upgrading from an old
> version,
> >> > you
> >> > > do
> >> > > > > > need
> >> > > > > > > to
> >> > > > > > > > > > > understand and configure IBP. I see a similar
> pattern
> >> > here
> >> > > > for
> >> > > > > > > > > > > features. From the ease of use perspective, ideally,
> >> we
> >> > > > > shouldn't
> >> > > > > > > > > > require a
> >> > > > > > > > > > > new user to have an extra step such as running a
> >> > bootstrap
> >> > > > > script
> >> > > > > > > > > unless
> >> > > > > > > > > > > it's truly necessary. If someone has a special need
> >> (all
> >> > > the
> >> > > > > > cases
> >> > > > > > > > you
> >> > > > > > > > > > > mentioned seem special cases?), they can configure a
> >> mode
> >> > > > such
> >> > > > > > that
> >> > > > > > > > > > > features are enabled/disabled manually.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Jun
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Fri, Apr 3, 2020 at 5:54 PM Kowshik Prakasam <
> >> > > > > > > > > kpraka...@confluent.io>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi Jun,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for the feedback and suggestions. Please
> >> find my
> >> > > > > > response
> >> > > > > > > > > below.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> >> > control
> >> > > > who
> >> > > > > > is
> >> > > > > > > > > > allowed
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > issue that request if security is enabled. So,
> we
> >> > need
> >> > > to
> >> > > > > > > assign
> >> > > > > > > > > the
> >> > > > > > > > > > > new
> >> > > > > > > > > > > > > request a ResourceType and possible
> AclOperations.
> >> > See
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> > > > > > > > > > > > > as an example.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): I don't see any reference to the words
> >> > > > > ResourceType
> >> > > > > > or
> >> > > > > > > > > > > > AclOperations
> >> > > > > > > > > > > > in the KIP. Please let me know how I can use the
> KIP
> >> > that
> >> > > > you
> >> > > > > > > > linked
> >> > > > > > > > > to
> >> > > > > > > > > > > > know how to
> >> > > > > > > > > > > > setup the appropriate ResourceType and/or
> >> > > ClusterOperation?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 105. If we change delete to disable, it's better
> >> to
> >> > do
> >> > > > this
> >> > > > > > > > > > > consistently
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > request protocol and admin api as well.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): The API shouldn't be called 'disable'
> >> when
> >> > it
> >> > > is
> >> > > > > > > > deleting
> >> > > > > > > > > a
> >> > > > > > > > > > > > feature.
> >> > > > > > > > > > > > I've just changed the KIP to use 'delete'. I don't
> >> > have a
> >> > > > > > strong
> >> > > > > > > > > > > > preference.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> >> > int64.
> >> > > > > > > Currently,
> >> > > > > > > > > our
> >> > > > > > > > > > > > > release version schema is major.minor.bugfix
> (e.g.
> >> > > > 2.5.0).
> >> > > > > > It's
> >> > > > > > > > > > > possible
> >> > > > > > > > > > > > > for new features to be included in minor
> releases
> >> > too.
> >> > > > > Should
> >> > > > > > > we
> >> > > > > > > > > make
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > feature versioning match the release versioning?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): The release version can be mapped to a
> >> set
> >> > of
> >> > > > > > feature
> >> > > > > > > > > > > versions,
> >> > > > > > > > > > > > and this can be done, for example in the tool (or
> >> even
> >> > > > > external
> >> > > > > > > to
> >> > > > > > > > > the
> >> > > > > > > > > > > > tool).
> >> > > > > > > > > > > > Can you please clarify what I'm missing?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 111. "During regular operations, the data in the
> >> ZK
> >> > > node
> >> > > > > can
> >> > > > > > be
> >> > > > > > > > > > mutated
> >> > > > > > > > > > > > > only via a specific admin API served only by the
> >> > > > > > controller." I
> >> > > > > > > > am
> >> > > > > > > > > > > > > wondering why can't the controller auto
> finalize a
> >> > > > feature
> >> > > > > > > > version
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > > all brokers are upgraded? For new users who
> >> download
> >> > > the
> >> > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them
> to
> >> > have
> >> > > > to
> >> > > > > > > > manually
> >> > > > > > > > > > > > enable
> >> > > > > > > > > > > > > each feature.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): I agree that there is a trade-off here,
> >> but
> >> > it
> >> > > > > will
> >> > > > > > > help
> >> > > > > > > > > > > > to decide whether the automation can be thought
> >> through
> >> > > in
> >> > > > > the
> >> > > > > > > > future
> >> > > > > > > > > > > > in a follow up KIP, or right now in this KIP. We
> may
> >> > > invest
> >> > > > > > > > > > > > in automation, but we have to decide whether we
> >> should
> >> > do
> >> > > > it
> >> > > > > > > > > > > > now or later.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > For the inconvenience that you mentioned, do you
> >> think
> >> > > the
> >> > > > > > > problem
> >> > > > > > > > > that
> >> > > > > > > > > > > you
> >> > > > > > > > > > > > mentioned can be  overcome by asking for the
> cluster
> >> > > > operator
> >> > > > > > to
> >> > > > > > > > run
> >> > > > > > > > > a
> >> > > > > > > > > > > > bootstrap script  when he/she knows that a
> specific
> >> AK
> >> > > > > release
> >> > > > > > > has
> >> > > > > > > > > been
> >> > > > > > > > > > > > almost completely deployed in a cluster for the
> >> first
> >> > > time?
> >> > > > > > Idea
> >> > > > > > > is
> >> > > > > > > > > > that
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > bootstrap script will know how to map a specific
> AK
> >> > > release
> >> > > > > to
> >> > > > > > > > > > finalized
> >> > > > > > > > > > > > feature versions, and run the `kafka-features.sh`
> >> tool
> >> > > > > > > > appropriately
> >> > > > > > > > > > > > against
> >> > > > > > > > > > > > the cluster.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Now, coming back to your automation
> >> proposal/question.
> >> > > > > > > > > > > > I do see the value of automated feature version
> >> > > > finalization,
> >> > > > > > > but I
> >> > > > > > > > > > also
> >> > > > > > > > > > > > see
> >> > > > > > > > > > > > that this will open up several questions and some
> >> > risks,
> >> > > as
> >> > > > > > > > explained
> >> > > > > > > > > > > > below.
> >> > > > > > > > > > > > The answers to these depend on the definition of
> the
> >> > > > > automation
> >> > > > > > > we
> >> > > > > > > > > > choose
> >> > > > > > > > > > > > to build, and how well does it fit into a kafka
> >> > > deployment.
> >> > > > > > > > > > > > Basically, it can be unsafe for the controller to
> >> > > finalize
> >> > > > > > > feature
> >> > > > > > > > > > > version
> >> > > > > > > > > > > > upgrades automatically, without learning about the
> >> > intent
> >> > > > of
> >> > > > > > the
> >> > > > > > > > > > cluster
> >> > > > > > > > > > > > operator.
> >> > > > > > > > > > > > 1. We would sometimes want to lock feature
> versions
> >> > only
> >> > > > when
> >> > > > > > we
> >> > > > > > > > have
> >> > > > > > > > > > > > externally verified
> >> > > > > > > > > > > > the stability of the broker binary.
> >> > > > > > > > > > > > 2. Sometimes only the cluster operator knows that
> a
> >> > > cluster
> >> > > > > > > upgrade
> >> > > > > > > > > is
> >> > > > > > > > > > > > complete,
> >> > > > > > > > > > > > and new brokers are highly unlikely to join the
> >> > cluster.
> >> > > > > > > > > > > > 3. Only the cluster operator knows that the intent
> >> is
> >> > to
> >> > > > > deploy
> >> > > > > > > the
> >> > > > > > > > > > same
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > of the new broker release across the entire
> cluster
> >> > (i.e.
> >> > > > the
> >> > > > > > > > latest
> >> > > > > > > > > > > > downloaded version).
> >> > > > > > > > > > > > 4. For downgrades, it appears the controller still
> >> > needs
> >> > > > some
> >> > > > > > > > > external
> >> > > > > > > > > > > > input
> >> > > > > > > > > > > > (such as the proposed tool) to finalize a feature
> >> > version
> >> > > > > > > > downgrade.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > If we have automation, that automation can end up
> >> > failing
> >> > > > in
> >> > > > > > some
> >> > > > > > > > of
> >> > > > > > > > > > the
> >> > > > > > > > > > > > cases
> >> > > > > > > > > > > > above. Then, we need a way to declare that the
> >> cluster
> >> > is
> >> > > > > "not
> >> > > > > > > > ready"
> >> > > > > > > > > > if
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > controller cannot automatically finalize some
> basic
> >> > > > required
> >> > > > > > > > feature
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > upgrades across the cluster. We need to make the
> >> > cluster
> >> > > > > > operator
> >> > > > > > > > > aware
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > such a scenario (raise an alert or alike).
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> >> > should
> >> > > > be
> >> > > > > 49
> >> > > > > > > > > instead
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > 48.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > (Kowshik): Done.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Fri, Apr 3, 2020 at 11:24 AM Jun Rao <
> >> > > j...@confluent.io>
> >> > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Thanks for the reply. A few more comments below.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 100.6 For every new request, the admin needs to
> >> > control
> >> > > > who
> >> > > > > > is
> >> > > > > > > > > > allowed
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > issue that request if security is enabled. So,
> we
> >> > need
> >> > > to
> >> > > > > > > assign
> >> > > > > > > > > the
> >> > > > > > > > > > > new
> >> > > > > > > > > > > > > request a ResourceType and possible
> AclOperations.
> >> > See
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> >> > > > > > > > > > > > > as
> >> > > > > > > > > > > > > an example.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 105. If we change delete to disable, it's better
> >> to
> >> > do
> >> > > > this
> >> > > > > > > > > > > consistently
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > request protocol and admin api as well.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 110. The minVersion/maxVersion for features use
> >> > int64.
> >> > > > > > > Currently,
> >> > > > > > > > > our
> >> > > > > > > > > > > > > release version schema is major.minor.bugfix
> (e.g.
> >> > > > 2.5.0).
> >> > > > > > It's
> >> > > > > > > > > > > possible
> >> > > > > > > > > > > > > for new features to be included in minor
> releases
> >> > too.
> >> > > > > Should
> >> > > > > > > we
> >> > > > > > > > > make
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > feature versioning match the release versioning?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 111. "During regular operations, the data in the
> >> ZK
> >> > > node
> >> > > > > can
> >> > > > > > be
> >> > > > > > > > > > mutated
> >> > > > > > > > > > > > > only via a specific admin API served only by the
> >> > > > > > controller." I
> >> > > > > > > > am
> >> > > > > > > > > > > > > wondering why can't the controller auto
> finalize a
> >> > > > feature
> >> > > > > > > > version
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > > all brokers are upgraded? For new users who
> >> download
> >> > > the
> >> > > > > > latest
> >> > > > > > > > > > version
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > build a new cluster, it's inconvenient for them
> to
> >> > have
> >> > > > to
> >> > > > > > > > manually
> >> > > > > > > > > > > > enable
> >> > > > > > > > > > > > > each feature.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > 112. DeleteFeaturesResponse: It seems the apiKey
> >> > should
> >> > > > be
> >> > > > > 49
> >> > > > > > > > > instead
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > > 48.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Jun
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Fri, Apr 3, 2020 at 1:27 AM Kowshik Prakasam
> <
> >> > > > > > > > > > > kpraka...@confluent.io>
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hey Jun,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks a lot for the great feedback! Please
> note
> >> > that
> >> > > > the
> >> > > > > > > > design
> >> > > > > > > > > > > > > > has changed a little bit on the KIP, and we
> now
> >> > > > propagate
> >> > > > > > the
> >> > > > > > > > > > > finalized
> >> > > > > > > > > > > > > > features metadata only via ZK watches (instead
> >> of
> >> > > > > > > > > > > UpdateMetadataRequest
> >> > > > > > > > > > > > > > from the controller).
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Please find below my response to your
> >> > > > questions/feedback,
> >> > > > > > > with
> >> > > > > > > > > the
> >> > > > > > > > > > > > prefix
> >> > > > > > > > > > > > > > "(Kowshik):".
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.
> >> UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
> >> from
> >> > > > > > brokers,
> >> > > > > > > > > should
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > add
> >> > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > timeout in the request (like
> >> createTopicRequest)?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I have added a
> >> > timeout
> >> > > > > field.
> >> > > > > > > > Note:
> >> > > > > > > > > > we
> >> > > > > > > > > > > no
> >> > > > > > > > > > > > > > longer
> >> > > > > > > > > > > > > > wait for responses from brokers, since the
> >> design
> >> > has
> >> > > > > been
> >> > > > > > > > > changed
> >> > > > > > > > > > so
> >> > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > features information is propagated via ZK.
> >> > > > Nevertheless,
> >> > > > > it
> >> > > > > > > is
> >> > > > > > > > > > right
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > have a timeout
> >> > > > > > > > > > > > > > for the request.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> >> > > Typically,
> >> > > > > the
> >> > > > > > > > > response
> >> > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > shows an error code and an error message,
> >> instead
> >> > > of
> >> > > > > > > echoing
> >> > > > > > > > > the
> >> > > > > > > > > > > > > request.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Yeah, I have modified
> >> it to
> >> > > > just
> >> > > > > > > return
> >> > > > > > > > > an
> >> > > > > > > > > > > > error
> >> > > > > > > > > > > > > > code and a message.
> >> > > > > > > > > > > > > > Previously it was not echoing the "request",
> >> rather
> >> > > it
> >> > > > > was
> >> > > > > > > > > > returning
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > latest set of
> >> > > > > > > > > > > > > > cluster-wide finalized features (after
> applying
> >> the
> >> > > > > > updates).
> >> > > > > > > > But
> >> > > > > > > > > > you
> >> > > > > > > > > > > > are
> >> > > > > > > > > > > > > > right,
> >> > > > > > > > > > > > > > the additional info is not required, so I have
> >> > > removed
> >> > > > it
> >> > > > > > > from
> >> > > > > > > > > the
> >> > > > > > > > > > > > > response
> >> > > > > > > > > > > > > > schema.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
> >> > > > list/describe
> >> > > > > > the
> >> > > > > > > > > > > existing
> >> > > > > > > > > > > > > > > features?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): This is already present in the KIP
> >> via
> >> > the
> >> > > > > > > > > > > > 'DescribeFeatures'
> >> > > > > > > > > > > > > > Admin API,
> >> > > > > > > > > > > > > > which, underneath covers uses the
> >> > ApiVersionsRequest
> >> > > to
> >> > > > > > > > > > list/describe
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > existing features. Please read the 'Tooling
> >> > support'
> >> > > > > > section.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
> >> in a
> >> > > > > single
> >> > > > > > > > > request.
> >> > > > > > > > > > > For
> >> > > > > > > > > > > > > > > DELETE, the version field doesn't make
> sense.
> >> > So, I
> >> > > > > guess
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > ignores this? An alternative way is to have
> a
> >> > > > separate
> >> > > > > > > > > > > > > > DeleteFeaturesRequest
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! I have modified the
> KIP
> >> now
> >> > > to
> >> > > > > > have 2
> >> > > > > > > > > > > separate
> >> > > > > > > > > > > > > > controller APIs
> >> > > > > > > > > > > > > > serving these different purposes:
> >> > > > > > > > > > > > > > 1. updateFeatures
> >> > > > > > > > > > > > > > 2. deleteFeatures
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have
> "The
> >> > > > > > monotonically
> >> > > > > > > > > > > > increasing
> >> > > > > > > > > > > > > > > version of the metadata for finalized
> >> features."
> >> > I
> >> > > am
> >> > > > > > > > wondering
> >> > > > > > > > > > why
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > ordering is important?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): In the latest KIP write-up, it is
> >> called
> >> > > > epoch
> >> > > > > > > > > (instead
> >> > > > > > > > > > of
> >> > > > > > > > > > > > > > version), and
> >> > > > > > > > > > > > > > it is just the ZK node version. Basically,
> this
> >> is
> >> > > the
> >> > > > > > epoch
> >> > > > > > > > for
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > > finalized feature version metadata. This
> >> metadata
> >> > is
> >> > > > > served
> >> > > > > > > to
> >> > > > > > > > > > > clients
> >> > > > > > > > > > > > > via
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > ApiVersionsResponse (for reads). We propagate
> >> > updates
> >> > > > > from
> >> > > > > > > the
> >> > > > > > > > > > > > > '/features'
> >> > > > > > > > > > > > > > ZK node
> >> > > > > > > > > > > > > > to all brokers, via ZK watches setup by each
> >> broker
> >> > > on
> >> > > > > the
> >> > > > > > > > > > > '/features'
> >> > > > > > > > > > > > > > node.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Now here is why the ordering is important:
> >> > > > > > > > > > > > > > ZK watches don't propagate at the same time.
> As
> >> a
> >> > > > result,
> >> > > > > > the
> >> > > > > > > > > > > > > > ApiVersionsResponse
> >> > > > > > > > > > > > > > is eventually consistent across brokers. This
> >> can
> >> > > > > introduce
> >> > > > > > > > cases
> >> > > > > > > > > > > > > > where clients see an older lower epoch of the
> >> > > features
> >> > > > > > > > metadata,
> >> > > > > > > > > > > after
> >> > > > > > > > > > > > a
> >> > > > > > > > > > > > > > more recent
> >> > > > > > > > > > > > > > higher epoch was returned at a previous point
> in
> >> > > time.
> >> > > > We
> >> > > > > > > > expect
> >> > > > > > > > > > > > clients
> >> > > > > > > > > > > > > > to always employ the rule that the latest
> >> received
> >> > > > higher
> >> > > > > > > epoch
> >> > > > > > > > > of
> >> > > > > > > > > > > > > metadata
> >> > > > > > > > > > > > > > always trumps an older smaller epoch. Those
> >> clients
> >> > > > that
> >> > > > > > are
> >> > > > > > > > > > external
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > > Kafka should strongly consider discovering the
> >> > latest
> >> > > > > > > metadata
> >> > > > > > > > > once
> >> > > > > > > > > > > > > during
> >> > > > > > > > > > > > > > startup from the brokers, and if required
> >> refresh
> >> > the
> >> > > > > > > metadata
> >> > > > > > > > > > > > > periodically
> >> > > > > > > > > > > > > > (to get the latest metadata).
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
> >> this
> >> > > new
> >> > > > > > > > request?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): What is ACL, and how could I find
> out
> >> > > which
> >> > > > > one
> >> > > > > > to
> >> > > > > > > > > > > specify?
> >> > > > > > > > > > > > > > Please could you provide me some pointers?
> I'll
> >> be
> >> > > glad
> >> > > > > to
> >> > > > > > > > update
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > > KIP once I know the next steps.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
> >> should
> >> > we
> >> > > > > bump
> >> > > > > > up
> >> > > > > > > > the
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > the json?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I've increased
> the
> >> > > > version
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > json
> >> > > > > > > > > > > > > > by 1.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if
> we
> >> > need
> >> > > > the
> >> > > > > > > epoch
> >> > > > > > > > > > > field.
> >> > > > > > > > > > > > > Each
> >> > > > > > > > > > > > > > > ZK node has an internal version field that
> is
> >> > > > > incremented
> >> > > > > > > on
> >> > > > > > > > > > every
> >> > > > > > > > > > > > > > update.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Great point! Done. I'm using the ZK
> >> node
> >> > > > > version
> >> > > > > > > > now,
> >> > > > > > > > > > > > instead
> >> > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > explicitly
> >> > > > > > > > > > > > > > incremented epoch.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
> >> feature
> >> > > > > version
> >> > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > left to the discretion of the logic
> >> implementing
> >> > > the
> >> > > > > > > feature
> >> > > > > > > > > (ex:
> >> > > > > > > > > > > can
> >> > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
> >> mean
> >> > > the
> >> > > > > > broker
> >> > > > > > > > > > > > > registration
> >> > > > > > > > > > > > > > ZK
> >> > > > > > > > > > > > > > > node will be updated dynamically when this
> >> > happens?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): Not really. The text was just
> >> conveying
> >> > > > that a
> >> > > > > > > > broker
> >> > > > > > > > > > > could
> >> > > > > > > > > > > > > > "know" of
> >> > > > > > > > > > > > > > a new feature version, but it does not mean
> the
> >> > > broker
> >> > > > > > should
> >> > > > > > > > > have
> >> > > > > > > > > > > also
> >> > > > > > > > > > > > > > activated the effects of the feature version.
> >> > Knowing
> >> > > > vs
> >> > > > > > > > > activation
> >> > > > > > > > > > > > are 2
> >> > > > > > > > > > > > > > separate things,
> >> > > > > > > > > > > > > > and the latter can be achieved by dynamic
> >> config. I
> >> > > > have
> >> > > > > > > > reworded
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > text
> >> > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > make this clear to the reader.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
> >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when
> the
> >> > > feature
> >> > > > > > > > metadata
> >> > > > > > > > > is
> >> > > > > > > > > > > > > > included
> >> > > > > > > > > > > > > > > in the request. My understanding is that
> it's
> >> > only
> >> > > > > > included
> >> > > > > > > > if
> >> > > > > > > > > > (1)
> >> > > > > > > > > > > > > there
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > a change to the finalized feature; (2)
> broker
> >> > > > restart;
> >> > > > > > (3)
> >> > > > > > > > > > > controller
> >> > > > > > > > > > > > > > > failover.
> >> > > > > > > > > > > > > > > 104.2 The new fields have the following
> >> versions.
> >> > > Why
> >> > > > > are
> >> > > > > > > the
> >> > > > > > > > > > > > versions
> >> > > > > > > > > > > > > 3+
> >> > > > > > > > > > > > > > > when the top version is bumped to 6?
> >> > > > > > > > > > > > > > >       "fields":  [
> >> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> >> > > > "versions":
> >> > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The name of the
> >> feature."},
> >> > > > > > > > > > > > > > >         {"name":  "Version", "type":
> "int64",
> >> > > > > > "versions":
> >> > > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The finalized version
> for
> >> the
> >> > > > > > > feature."}
> >> > > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): With the new improved design, we
> have
> >> > > > > completely
> >> > > > > > > > > > > eliminated
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > need to
> >> > > > > > > > > > > > > > use UpdateMetadataRequest. This is because we
> >> now
> >> > > rely
> >> > > > on
> >> > > > > > ZK
> >> > > > > > > to
> >> > > > > > > > > > > deliver
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > notifications for changes to the '/features'
> ZK
> >> > node.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> >> > > > update/delete,
> >> > > > > > > > perhaps
> >> > > > > > > > > > > it's
> >> > > > > > > > > > > > > > better
> >> > > > > > > > > > > > > > > to use enable/disable?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (Kowshik): For delete, yes, I have changed it
> so
> >> > that
> >> > > > we
> >> > > > > > > > instead
> >> > > > > > > > > > call
> >> > > > > > > > > > > > it
> >> > > > > > > > > > > > > > 'disable'.
> >> > > > > > > > > > > > > > However for 'update', it can now also refer to
> >> > either
> >> > > > an
> >> > > > > > > > upgrade
> >> > > > > > > > > > or a
> >> > > > > > > > > > > > > > forced downgrade.
> >> > > > > > > > > > > > > > Therefore, I have left it the way it is, just
> >> > calling
> >> > > > it
> >> > > > > as
> >> > > > > > > > just
> >> > > > > > > > > > > > > 'update'.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 6:51 PM Jun Rao <
> >> > > > > j...@confluent.io>
> >> > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Hi, Kowshik,
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall. A
> few
> >> > > > comments
> >> > > > > > > below.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 100.
> >> UpdateFeaturesRequest/UpdateFeaturesResponse
> >> > > > > > > > > > > > > > > 100.1 Since this request waits for responses
> >> from
> >> > > > > > brokers,
> >> > > > > > > > > should
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > add
> >> > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > timeout in the request (like
> >> createTopicRequest)?
> >> > > > > > > > > > > > > > > 100.2 The response schema is a bit weird.
> >> > > Typically,
> >> > > > > the
> >> > > > > > > > > response
> >> > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > shows an error code and an error message,
> >> instead
> >> > > of
> >> > > > > > > echoing
> >> > > > > > > > > the
> >> > > > > > > > > > > > > request.
> >> > > > > > > > > > > > > > > 100.3 Should we add a separate request to
> >> > > > list/describe
> >> > > > > > the
> >> > > > > > > > > > > existing
> >> > > > > > > > > > > > > > > features?
> >> > > > > > > > > > > > > > > 100.4 We are mixing ADD_OR_UPDATE and DELETE
> >> in a
> >> > > > > single
> >> > > > > > > > > request.
> >> > > > > > > > > > > For
> >> > > > > > > > > > > > > > > DELETE, the version field doesn't make
> sense.
> >> > So, I
> >> > > > > guess
> >> > > > > > > the
> >> > > > > > > > > > > broker
> >> > > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > ignores this? An alternative way is to have
> a
> >> > > > separate
> >> > > > > > > > > > > > > > > DeleteFeaturesRequest
> >> > > > > > > > > > > > > > > 100.5 In UpdateFeaturesResponse, we have
> "The
> >> > > > > > monotonically
> >> > > > > > > > > > > > increasing
> >> > > > > > > > > > > > > > > version of the metadata for finalized
> >> features."
> >> > I
> >> > > am
> >> > > > > > > > wondering
> >> > > > > > > > > > why
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > ordering is important?
> >> > > > > > > > > > > > > > > 100.6 Could you specify the required ACL for
> >> this
> >> > > new
> >> > > > > > > > request?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 101. For the broker registration ZK node,
> >> should
> >> > we
> >> > > > > bump
> >> > > > > > up
> >> > > > > > > > the
> >> > > > > > > > > > > > version
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > the json?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 102. For the /features ZK node, not sure if
> we
> >> > need
> >> > > > the
> >> > > > > > > epoch
> >> > > > > > > > > > > field.
> >> > > > > > > > > > > > > Each
> >> > > > > > > > > > > > > > > ZK node has an internal version field that
> is
> >> > > > > incremented
> >> > > > > > > on
> >> > > > > > > > > > every
> >> > > > > > > > > > > > > > update.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 103. "Enabling the actual semantics of a
> >> feature
> >> > > > > version
> >> > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > left to the discretion of the logic
> >> implementing
> >> > > the
> >> > > > > > > feature
> >> > > > > > > > > (ex:
> >> > > > > > > > > > > can
> >> > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > done via dynamic broker config)." Does that
> >> mean
> >> > > the
> >> > > > > > broker
> >> > > > > > > > > > > > > registration
> >> > > > > > > > > > > > > > ZK
> >> > > > > > > > > > > > > > > node will be updated dynamically when this
> >> > happens?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 104. UpdateMetadataRequest
> >> > > > > > > > > > > > > > > 104.1 It would be useful to describe when
> the
> >> > > feature
> >> > > > > > > > metadata
> >> > > > > > > > > is
> >> > > > > > > > > > > > > > included
> >> > > > > > > > > > > > > > > in the request. My understanding is that
> it's
> >> > only
> >> > > > > > included
> >> > > > > > > > if
> >> > > > > > > > > > (1)
> >> > > > > > > > > > > > > there
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > a change to the finalized feature; (2)
> broker
> >> > > > restart;
> >> > > > > > (3)
> >> > > > > > > > > > > controller
> >> > > > > > > > > > > > > > > failover.
> >> > > > > > > > > > > > > > > 104.2 The new fields have the following
> >> versions.
> >> > > Why
> >> > > > > are
> >> > > > > > > the
> >> > > > > > > > > > > > versions
> >> > > > > > > > > > > > > 3+
> >> > > > > > > > > > > > > > > when the top version is bumped to 6?
> >> > > > > > > > > > > > > > >       "fields":  [
> >> > > > > > > > > > > > > > >         {"name": "Name", "type":  "string",
> >> > > > "versions":
> >> > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The name of the
> >> feature."},
> >> > > > > > > > > > > > > > >         {"name":  "Version", "type":
> "int64",
> >> > > > > > "versions":
> >> > > > > > > > > "3+",
> >> > > > > > > > > > > > > > >           "about": "The finalized version
> for
> >> the
> >> > > > > > > feature."}
> >> > > > > > > > > > > > > > >       ]
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > 105. kafka-features.sh: Instead of using
> >> > > > update/delete,
> >> > > > > > > > perhaps
> >> > > > > > > > > > > it's
> >> > > > > > > > > > > > > > better
> >> > > > > > > > > > > > > > > to use enable/disable?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Jun
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 5:29 PM Kowshik
> >> Prakasam
> >> > <
> >> > > > > > > > > > > > > kpraka...@confluent.io
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hey Boyang,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thanks for the great feedback! I have
> >> updated
> >> > the
> >> > > > KIP
> >> > > > > > > based
> >> > > > > > > > > on
> >> > > > > > > > > > > your
> >> > > > > > > > > > > > > > > > feedback.
> >> > > > > > > > > > > > > > > > Please find my response below for your
> >> > comments,
> >> > > > look
> >> > > > > > for
> >> > > > > > > > > > > sentences
> >> > > > > > > > > > > > > > > > starting
> >> > > > > > > > > > > > > > > > with "(Kowshik)" below.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to
> >> begin
> >> > > > > handling
> >> > > > > > > EOS
> >> > > > > > > > > > > > traffic"
> >> > > > > > > > > > > > > > > could
> >> > > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > converted as "When is it safe for the
> >> brokers
> >> > > to
> >> > > > > > start
> >> > > > > > > > > > serving
> >> > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS
> is
> >> not
> >> > > > > > explained
> >> > > > > > > > > > earlier
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > context.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the
> >> metadata
> >> > > > > version
> >> > > > > > > > > number
> >> > > > > > > > > > > part
> >> > > > > > > > > > > > > > > seems a
> >> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference
> >> to
> >> > > later
> >> > > > > > > section
> >> > > > > > > > > > that
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > > going
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > store it in Zookeeper and update it
> every
> >> > time
> >> > > > when
> >> > > > > > > there
> >> > > > > > > > > is
> >> > > > > > > > > > a
> >> > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > change?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done. I've added a
> >> > > > reference
> >> > > > > in
> >> > > > > > > the
> >> > > > > > > > > > KIP.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although
> >> it's a
> >> > > > > > Non-goal
> >> > > > > > > of
> >> > > > > > > > > the
> >> > > > > > > > > > > > KIP,
> >> > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > > features such as group coordinator
> >> semantics,
> >> > > > there
> >> > > > > > is
> >> > > > > > > no
> >> > > > > > > > > > legal
> >> > > > > > > > > > > > > > > scenario
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > perform a downgrade at all. So having
> >> > downgrade
> >> > > > > door
> >> > > > > > > open
> >> > > > > > > > > is
> >> > > > > > > > > > > > pretty
> >> > > > > > > > > > > > > > > > > error-prone as human faults happen all
> the
> >> > > time.
> >> > > > > I'm
> >> > > > > > > > > assuming
> >> > > > > > > > > > > as
> >> > > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > features are implemented, it's not very
> >> hard
> >> > to
> >> > > > > add a
> >> > > > > > > > flag
> >> > > > > > > > > > > during
> >> > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > creation to indicate whether this
> feature
> >> is
> >> > > > > > > > > "downgradable".
> >> > > > > > > > > > > > Could
> >> > > > > > > > > > > > > > you
> >> > > > > > > > > > > > > > > > > explain a bit more on the extra
> >> engineering
> >> > > > effort
> >> > > > > > for
> >> > > > > > > > > > shipping
> >> > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > > > > with downgrade protection in place?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! I'd agree and
> >> disagree
> >> > > > here.
> >> > > > > > > While
> >> > > > > > > > I
> >> > > > > > > > > > > agree
> >> > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > accidental
> >> > > > > > > > > > > > > > > > downgrades can cause problems, I also
> think
> >> > > > sometimes
> >> > > > > > > > > > downgrades
> >> > > > > > > > > > > > > should
> >> > > > > > > > > > > > > > > > be allowed for emergency reasons (not all
> >> > > > downgrades
> >> > > > > > > cause
> >> > > > > > > > > > > issues).
> >> > > > > > > > > > > > > > > > It is just subjective to the feature being
> >> > > > > downgraded.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > To be more strict about feature version
> >> > > > downgrades, I
> >> > > > > > > have
> >> > > > > > > > > > > modified
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > > > proposing that we mandate a
> >> `--force-downgrade`
> >> > > > flag
> >> > > > > be
> >> > > > > > > > used
> >> > > > > > > > > in
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > UPDATE_FEATURES api
> >> > > > > > > > > > > > > > > > and the tooling, whenever the human is
> >> > > downgrading
> >> > > > a
> >> > > > > > > > > finalized
> >> > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > version.
> >> > > > > > > > > > > > > > > > Hopefully this should cover the
> requirement,
> >> > > until
> >> > > > we
> >> > > > > > > find
> >> > > > > > > > > the
> >> > > > > > > > > > > need
> >> > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > advanced downgrade support.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary
> of
> >> > > feature
> >> > > > > > > > versions
> >> > > > > > > > > > will
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > defined
> >> > > > > > > > > > > > > > > > > in the broker code." So this means in
> >> order
> >> > to
> >> > > > > > > restrict a
> >> > > > > > > > > > > certain
> >> > > > > > > > > > > > > > > > feature,
> >> > > > > > > > > > > > > > > > > we need to start the broker first and
> then
> >> > > send a
> >> > > > > > > feature
> >> > > > > > > > > > > gating
> >> > > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > immediately, which introduces a time gap
> >> and
> >> > > the
> >> > > > > > > > > > > > intended-to-close
> >> > > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > could actually serve request during this
> >> > phase.
> >> > > > Do
> >> > > > > > you
> >> > > > > > > > > think
> >> > > > > > > > > > we
> >> > > > > > > > > > > > > > should
> >> > > > > > > > > > > > > > > > also
> >> > > > > > > > > > > > > > > > > support configurations as well so that
> >> admin
> >> > > user
> >> > > > > > could
> >> > > > > > > > > > freely
> >> > > > > > > > > > > > roll
> >> > > > > > > > > > > > > > up
> >> > > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > > cluster with all nodes complying the
> same
> >> > > feature
> >> > > > > > > gating,
> >> > > > > > > > > > > without
> >> > > > > > > > > > > > > > > > worrying
> >> > > > > > > > > > > > > > > > > about the turnaround time to propagate
> the
> >> > > > message
> >> > > > > > only
> >> > > > > > > > > after
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > > starts up?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): This is a great point/question.
> >> One
> >> > of
> >> > > > the
> >> > > > > > > > > > > expectations
> >> > > > > > > > > > > > > out
> >> > > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > this KIP, which is
> >> > > > > > > > > > > > > > > > already followed in the broker, is the
> >> > following.
> >> > > > > > > > > > > > > > > >  - Imagine at time T1 the broker starts up
> >> and
> >> > > > > > registers
> >> > > > > > > > it’s
> >> > > > > > > > > > > > > presence
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > > ZK,
> >> > > > > > > > > > > > > > > >    along with advertising it’s supported
> >> > > features.
> >> > > > > > > > > > > > > > > >  - Imagine at a future time T2 the broker
> >> > > receives
> >> > > > > the
> >> > > > > > > > > > > > > > > > UpdateMetadataRequest
> >> > > > > > > > > > > > > > > >    from the controller, which contains the
> >> > latest
> >> > > > > > > finalized
> >> > > > > > > > > > > > features
> >> > > > > > > > > > > > > as
> >> > > > > > > > > > > > > > > > seen by
> >> > > > > > > > > > > > > > > >    the controller. The broker validates
> this
> >> > data
> >> > > > > > against
> >> > > > > > > > > it’s
> >> > > > > > > > > > > > > > supported
> >> > > > > > > > > > > > > > > > features to
> >> > > > > > > > > > > > > > > >    make sure there is no mismatch (it will
> >> > > shutdown
> >> > > > > if
> >> > > > > > > > there
> >> > > > > > > > > is
> >> > > > > > > > > > > an
> >> > > > > > > > > > > > > > > > incompatibility).
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > It is expected that during the time
> between
> >> > the 2
> >> > > > > > events
> >> > > > > > > T1
> >> > > > > > > > > and
> >> > > > > > > > > > > T2,
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > broker is
> >> > > > > > > > > > > > > > > > almost a silent entity in the cluster. It
> >> does
> >> > > not
> >> > > > > add
> >> > > > > > > any
> >> > > > > > > > > > value
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > cluster, or carry
> >> > > > > > > > > > > > > > > > out any important broker activities. By
> >> > > > “important”,
> >> > > > > I
> >> > > > > > > mean
> >> > > > > > > > > it
> >> > > > > > > > > > is
> >> > > > > > > > > > > > not
> >> > > > > > > > > > > > > > > doing
> >> > > > > > > > > > > > > > > > mutations
> >> > > > > > > > > > > > > > > > on it’s persistence, not mutating critical
> >> > > > in-memory
> >> > > > > > > state,
> >> > > > > > > > > > won’t
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > serving
> >> > > > > > > > > > > > > > > > produce/fetch requests. Note it doesn’t
> even
> >> > know
> >> > > > > it’s
> >> > > > > > > > > assigned
> >> > > > > > > > > > > > > > > partitions
> >> > > > > > > > > > > > > > > > until
> >> > > > > > > > > > > > > > > > it receives UpdateMetadataRequest from
> >> > > controller.
> >> > > > > > > Anything
> >> > > > > > > > > the
> >> > > > > > > > > > > > > broker
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > doing up
> >> > > > > > > > > > > > > > > > until this point is not damaging/useful.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I’ve clarified the above in the KIP, see
> >> this
> >> > new
> >> > > > > > > section:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Incompatiblebrokerlifetime
> >> > > > > > > > > > > > > > > > .
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or
> >> > deleting
> >> > > an
> >> > > > > > > > existing
> >> > > > > > > > > > > > > Feature",
> >> > > > > > > > > > > > > > > may
> >> > > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > I misunderstood something, I thought the
> >> > > features
> >> > > > > are
> >> > > > > > > > > defined
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > > > broker
> >> > > > > > > > > > > > > > > > > code, so admin could not really create a
> >> new
> >> > > > > feature?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! You understood
> this
> >> > > right.
> >> > > > > Here
> >> > > > > > > > > adding
> >> > > > > > > > > > a
> >> > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > means we are
> >> > > > > > > > > > > > > > > > adding a cluster-wide finalized *max*
> >> version
> >> > > for a
> >> > > > > > > feature
> >> > > > > > > > > > that
> >> > > > > > > > > > > > was
> >> > > > > > > > > > > > > > > > previously never finalized.
> >> > > > > > > > > > > > > > > > I have clarified this in the KIP now.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 6. I think we need a separate error code
> >> like
> >> > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > reject a concurrent feature update
> >> request.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! I have modified
> the
> >> KIP
> >> > > > > adding
> >> > > > > > > the
> >> > > > > > > > > > above
> >> > > > > > > > > > > > (see
> >> > > > > > > > > > > > > > > > 'Tooling support -> Admin API changes').
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 7. I think we haven't discussed the
> >> > alternative
> >> > > > > > > solution
> >> > > > > > > > to
> >> > > > > > > > > > > pass
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > feature information through Zookeeper.
> Is
> >> > that
> >> > > > > > > mentioned
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more
> >> > > > favorable?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Nice question! The broker reads
> >> > > > finalized
> >> > > > > > > > feature
> >> > > > > > > > > > info
> >> > > > > > > > > > > > > > stored
> >> > > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > > ZK,
> >> > > > > > > > > > > > > > > > only during startup when it does a
> >> validation.
> >> > > When
> >> > > > > > > serving
> >> > > > > > > > > > > > > > > > `ApiVersionsRequest`, the
> >> > > > > > > > > > > > > > > > broker does not read this info from ZK
> >> > directly.
> >> > > > I'd
> >> > > > > > > > imagine
> >> > > > > > > > > > the
> >> > > > > > > > > > > > risk
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > that it can increase
> >> > > > > > > > > > > > > > > > the ZK read QPS which can be a bottleneck
> >> for
> >> > the
> >> > > > > > system.
> >> > > > > > > > > > Today,
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > > > Kafka
> >> > > > > > > > > > > > > > > > we use the
> >> > > > > > > > > > > > > > > > controller to fan out ZK updates to
> brokers
> >> and
> >> > > we
> >> > > > > want
> >> > > > > > > to
> >> > > > > > > > > > stick
> >> > > > > > > > > > > to
> >> > > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > pattern to avoid
> >> > > > > > > > > > > > > > > > the ZK read bottleneck when serving
> >> > > > > > `ApiVersionsRequest`.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 8. I was under the impression that user
> >> could
> >> > > > > > > configure a
> >> > > > > > > > > > range
> >> > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > > supported versions, what's the trade-off
> >> for
> >> > > > > allowing
> >> > > > > > > > > single
> >> > > > > > > > > > > > > > finalized
> >> > > > > > > > > > > > > > > > > version only?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great question! The finalized
> >> > version
> >> > > > of a
> >> > > > > > > > feature
> >> > > > > > > > > > > > > basically
> >> > > > > > > > > > > > > > > > refers to
> >> > > > > > > > > > > > > > > > the cluster-wide finalized feature
> "maximum"
> >> > > > version.
> >> > > > > > For
> >> > > > > > > > > > > example,
> >> > > > > > > > > > > > if
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > 'group_coordinator' feature
> >> > > > > > > > > > > > > > > > has the finalized version set to 10, then,
> >> it
> >> > > means
> >> > > > > > that
> >> > > > > > > > > > > > cluster-wide
> >> > > > > > > > > > > > > > all
> >> > > > > > > > > > > > > > > > versions upto v10 are
> >> > > > > > > > > > > > > > > > supported for this feature. However, note
> >> that
> >> > if
> >> > > > > some
> >> > > > > > > > > version
> >> > > > > > > > > > > (ex:
> >> > > > > > > > > > > > > v0)
> >> > > > > > > > > > > > > > > > gets deprecated
> >> > > > > > > > > > > > > > > > for this feature, then we don’t convey
> that
> >> > using
> >> > > > > this
> >> > > > > > > > scheme
> >> > > > > > > > > > > (also
> >> > > > > > > > > > > > > > > > supporting deprecation is a non-goal).
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): I’ve now modified the KIP at
> all
> >> > > points,
> >> > > > > > > > refering
> >> > > > > > > > > to
> >> > > > > > > > > > > > > > finalized
> >> > > > > > > > > > > > > > > > feature "maximum" versions.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here
> >> the
> >> > > > > "client"
> >> > > > > > > here
> >> > > > > > > > > may
> >> > > > > > > > > > > be
> >> > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > producer
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (Kowshik): Great point! Done.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Tue, Mar 31, 2020 at 1:17 PM Boyang
> Chen
> >> <
> >> > > > > > > > > > > > > > reluctanthero...@gmail.com>
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Hey Kowshik,
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > thanks for the revised KIP. Got a couple
> >> of
> >> > > > > > questions:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 1. "When is it safe for the brokers to
> >> begin
> >> > > > > handling
> >> > > > > > > EOS
> >> > > > > > > > > > > > traffic"
> >> > > > > > > > > > > > > > > could
> >> > > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > converted as "When is it safe for the
> >> brokers
> >> > > to
> >> > > > > > start
> >> > > > > > > > > > serving
> >> > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > Exactly-Once(EOS) semantics" since EOS
> is
> >> not
> >> > > > > > explained
> >> > > > > > > > > > earlier
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > context.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 2. In the *Explanation *section, the
> >> metadata
> >> > > > > version
> >> > > > > > > > > number
> >> > > > > > > > > > > part
> >> > > > > > > > > > > > > > > seems a
> >> > > > > > > > > > > > > > > > > bit blurred. Could you point a reference
> >> to
> >> > > later
> >> > > > > > > section
> >> > > > > > > > > > that
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > > going
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > store it in Zookeeper and update it
> every
> >> > time
> >> > > > when
> >> > > > > > > there
> >> > > > > > > > > is
> >> > > > > > > > > > a
> >> > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > change?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 3. For the feature downgrade, although
> >> it's a
> >> > > > > > Non-goal
> >> > > > > > > of
> >> > > > > > > > > the
> >> > > > > > > > > > > > KIP,
> >> > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > > features such as group coordinator
> >> semantics,
> >> > > > there
> >> > > > > > is
> >> > > > > > > no
> >> > > > > > > > > > legal
> >> > > > > > > > > > > > > > > scenario
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > perform a downgrade at all. So having
> >> > downgrade
> >> > > > > door
> >> > > > > > > open
> >> > > > > > > > > is
> >> > > > > > > > > > > > pretty
> >> > > > > > > > > > > > > > > > > error-prone as human faults happen all
> the
> >> > > time.
> >> > > > > I'm
> >> > > > > > > > > assuming
> >> > > > > > > > > > > as
> >> > > > > > > > > > > > > new
> >> > > > > > > > > > > > > > > > > features are implemented, it's not very
> >> hard
> >> > to
> >> > > > > add a
> >> > > > > > > > flag
> >> > > > > > > > > > > during
> >> > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > creation to indicate whether this
> feature
> >> is
> >> > > > > > > > > "downgradable".
> >> > > > > > > > > > > > Could
> >> > > > > > > > > > > > > > you
> >> > > > > > > > > > > > > > > > > explain a bit more on the extra
> >> engineering
> >> > > > effort
> >> > > > > > for
> >> > > > > > > > > > shipping
> >> > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > > > > with downgrade protection in place?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 4. "Each broker’s supported dictionary
> of
> >> > > feature
> >> > > > > > > > versions
> >> > > > > > > > > > will
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > defined
> >> > > > > > > > > > > > > > > > > in the broker code." So this means in
> >> order
> >> > to
> >> > > > > > > restrict a
> >> > > > > > > > > > > certain
> >> > > > > > > > > > > > > > > > feature,
> >> > > > > > > > > > > > > > > > > we need to start the broker first and
> then
> >> > > send a
> >> > > > > > > feature
> >> > > > > > > > > > > gating
> >> > > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > immediately, which introduces a time gap
> >> and
> >> > > the
> >> > > > > > > > > > > > intended-to-close
> >> > > > > > > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > > could actually serve request during this
> >> > phase.
> >> > > > Do
> >> > > > > > you
> >> > > > > > > > > think
> >> > > > > > > > > > we
> >> > > > > > > > > > > > > > should
> >> > > > > > > > > > > > > > > > also
> >> > > > > > > > > > > > > > > > > support configurations as well so that
> >> admin
> >> > > user
> >> > > > > > could
> >> > > > > > > > > > freely
> >> > > > > > > > > > > > roll
> >> > > > > > > > > > > > > > up
> >> > > > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > > cluster with all nodes complying the
> same
> >> > > feature
> >> > > > > > > gating,
> >> > > > > > > > > > > without
> >> > > > > > > > > > > > > > > > worrying
> >> > > > > > > > > > > > > > > > > about the turnaround time to propagate
> the
> >> > > > message
> >> > > > > > only
> >> > > > > > > > > after
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > > starts up?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 5. "adding a new Feature, updating or
> >> > deleting
> >> > > an
> >> > > > > > > > existing
> >> > > > > > > > > > > > > Feature",
> >> > > > > > > > > > > > > > > may
> >> > > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > I misunderstood something, I thought the
> >> > > features
> >> > > > > are
> >> > > > > > > > > defined
> >> > > > > > > > > > > in
> >> > > > > > > > > > > > > > broker
> >> > > > > > > > > > > > > > > > > code, so admin could not really create a
> >> new
> >> > > > > feature?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 6. I think we need a separate error code
> >> like
> >> > > > > > > > > > > > > > > FEATURE_UPDATE_IN_PROGRESS
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > reject a concurrent feature update
> >> request.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 7. I think we haven't discussed the
> >> > alternative
> >> > > > > > > solution
> >> > > > > > > > to
> >> > > > > > > > > > > pass
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > feature information through Zookeeper.
> Is
> >> > that
> >> > > > > > > mentioned
> >> > > > > > > > in
> >> > > > > > > > > > the
> >> > > > > > > > > > > > KIP
> >> > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > justify why using UpdateMetadata is more
> >> > > > favorable?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 8. I was under the impression that user
> >> could
> >> > > > > > > configure a
> >> > > > > > > > > > range
> >> > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > > supported versions, what's the trade-off
> >> for
> >> > > > > allowing
> >> > > > > > > > > single
> >> > > > > > > > > > > > > > finalized
> >> > > > > > > > > > > > > > > > > version only?
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 9. One minor syntax fix: Note that here
> >> the
> >> > > > > "client"
> >> > > > > > > here
> >> > > > > > > > > may
> >> > > > > > > > > > > be
> >> > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > producer
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Boyang
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Mon, Mar 30, 2020 at 4:53 PM Colin
> >> McCabe
> >> > <
> >> > > > > > > > > > > cmcc...@apache.org
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > On Thu, Mar 26, 2020, at 19:24,
> Kowshik
> >> > > > Prakasam
> >> > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > > > Hi Colin,
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Thanks for the feedback! I've
> changed
> >> the
> >> > > KIP
> >> > > > > to
> >> > > > > > > > > address
> >> > > > > > > > > > > your
> >> > > > > > > > > > > > > > > > > > > suggestions.
> >> > > > > > > > > > > > > > > > > > > Please find below my explanation.
> Here
> >> > is a
> >> > > > > link
> >> > > > > > to
> >> > > > > > > > KIP
> >> > > > > > > > > > > 584:
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > > > > > > > > > > > > > .
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > 1. '__data_version__' is the version
> >> of
> >> > the
> >> > > > > > > finalized
> >> > > > > > > > > > > feature
> >> > > > > > > > > > > > > > > > metadata
> >> > > > > > > > > > > > > > > > > > > (i.e. actual ZK node contents),
> while
> >> the
> >> > > > > > > > > > > > '__schema_version__'
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > > version of the schema of the data
> >> > persisted
> >> > > > in
> >> > > > > > ZK.
> >> > > > > > > > > These
> >> > > > > > > > > > > > serve
> >> > > > > > > > > > > > > > > > > different
> >> > > > > > > > > > > > > > > > > > > purposes. '__data_version__' is is
> >> useful
> >> > > > > mainly
> >> > > > > > to
> >> > > > > > > > > > clients
> >> > > > > > > > > > > > > > during
> >> > > > > > > > > > > > > > > > > reads,
> >> > > > > > > > > > > > > > > > > > > to differentiate between the 2
> >> versions
> >> > of
> >> > > > > > > eventually
> >> > > > > > > > > > > > > consistent
> >> > > > > > > > > > > > > > > > > > 'finalized
> >> > > > > > > > > > > > > > > > > > > features' metadata (i.e. larger
> >> metadata
> >> > > > > version
> >> > > > > > is
> >> > > > > > > > > more
> >> > > > > > > > > > > > > recent).
> >> > > > > > > > > > > > > > > > > > > '__schema_version__' provides an
> >> > additional
> >> > > > > > degree
> >> > > > > > > of
> >> > > > > > > > > > > > > > flexibility,
> >> > > > > > > > > > > > > > > > > where
> >> > > > > > > > > > > > > > > > > > if
> >> > > > > > > > > > > > > > > > > > > we decide to change the schema for
> >> > > > '/features'
> >> > > > > > node
> >> > > > > > > > in
> >> > > > > > > > > ZK
> >> > > > > > > > > > > (in
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > future),
> >> > > > > > > > > > > > > > > > > > > then we can manage broker roll outs
> >> > > suitably
> >> > > > > > (i.e.
> >> > > > > > > > > > > > > > > > > > > serialization/deserialization of the
> >> ZK
> >> > > data
> >> > > > > can
> >> > > > > > be
> >> > > > > > > > > > handled
> >> > > > > > > > > > > > > > > safely).
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Hi Kowshik,
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > If you're talking about a number that
> >> lets
> >> > > you
> >> > > > > know
> >> > > > > > > if
> >> > > > > > > > > data
> >> > > > > > > > > > > is
> >> > > > > > > > > > > > > more
> >> > > > > > > > > > > > > > > or
> >> > > > > > > > > > > > > > > > > > less recent, we would typically call
> >> that
> >> > an
> >> > > > > epoch,
> >> > > > > > > and
> >> > > > > > > > > > not a
> >> > > > > > > > > > > > > > > version.
> >> > > > > > > > > > > > > > > > > For
> >> > > > > > > > > > > > > > > > > > the ZK data structures, the word
> >> "version"
> >> > is
> >> > > > > > > typically
> >> > > > > > > > > > > > reserved
> >> > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > > > describing changes to the overall
> >> schema of
> >> > > the
> >> > > > > > data
> >> > > > > > > > that
> >> > > > > > > > > > is
> >> > > > > > > > > > > > > > written
> >> > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > > ZooKeeper.  We don't even really
> change
> >> the
> >> > > > > > "version"
> >> > > > > > > > of
> >> > > > > > > > > > > those
> >> > > > > > > > > > > > > > > schemas
> >> > > > > > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > > > much, since most changes are
> >> > > > > backwards-compatible.
> >> > > > > > > But
> >> > > > > > > > > we
> >> > > > > > > > > > do
> >> > > > > > > > > > > > > > include
> >> > > > > > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > > > version field just in case.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I don't think we really need an epoch
> >> here,
> >> > > > > though,
> >> > > > > > > > since
> >> > > > > > > > > > we
> >> > > > > > > > > > > > can
> >> > > > > > > > > > > > > > just
> >> > > > > > > > > > > > > > > > > look
> >> > > > > > > > > > > > > > > > > > at the broker epoch.  Whenever the
> >> broker
> >> > > > > > registers,
> >> > > > > > > > its
> >> > > > > > > > > > > epoch
> >> > > > > > > > > > > > > will
> >> > > > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > > > greater than the previous broker
> epoch.
> >> > And
> >> > > > the
> >> > > > > > > newly
> >> > > > > > > > > > > > registered
> >> > > > > > > > > > > > > > > data
> >> > > > > > > > > > > > > > > > > will
> >> > > > > > > > > > > > > > > > > > take priority.  This will be a lot
> >> simpler
> >> > > than
> >> > > > > > > adding
> >> > > > > > > > a
> >> > > > > > > > > > > > separate
> >> > > > > > > > > > > > > > > epoch
> >> > > > > > > > > > > > > > > > > > system, I think.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > 2. Regarding admin client needing
> min
> >> and
> >> > > max
> >> > > > > > > > > > information -
> >> > > > > > > > > > > > you
> >> > > > > > > > > > > > > > are
> >> > > > > > > > > > > > > > > > > > right!
> >> > > > > > > > > > > > > > > > > > > I've changed the KIP such that the
> >> Admin
> >> > > API
> >> > > > > also
> >> > > > > > > > > allows
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > user
> >> > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > read
> >> > > > > > > > > > > > > > > > > > > 'supported features' from a specific
> >> > > broker.
> >> > > > > > Please
> >> > > > > > > > > look
> >> > > > > > > > > > at
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > section
> >> > > > > > > > > > > > > > > > > > > "Admin API changes".
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Thanks.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > 3. Regarding the use of `long` vs
> >> `Long`
> >> > -
> >> > > it
> >> > > > > was
> >> > > > > > > not
> >> > > > > > > > > > > > > deliberate.
> >> > > > > > > > > > > > > > > > I've
> >> > > > > > > > > > > > > > > > > > > improved the KIP to just use `long`
> at
> >> > all
> >> > > > > > places.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Sounds good.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > 4. Regarding
> >> kafka.admin.FeatureCommand
> >> > > tool
> >> > > > -
> >> > > > > > you
> >> > > > > > > > are
> >> > > > > > > > > > > right!
> >> > > > > > > > > > > > > > I've
> >> > > > > > > > > > > > > > > > > > updated
> >> > > > > > > > > > > > > > > > > > > the KIP sketching the functionality
> >> > > provided
> >> > > > by
> >> > > > > > > this
> >> > > > > > > > > > tool,
> >> > > > > > > > > > > > with
> >> > > > > > > > > > > > > > > some
> >> > > > > > > > > > > > > > > > > > > examples. Please look at the section
> >> > > "Tooling
> >> > > > > > > support
> >> > > > > > > > > > > > > examples".
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Thank you!
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Thanks, Kowshik.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > cheers,
> >> > > > > > > > > > > > > > > > > > Colin
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > On Wed, Mar 25, 2020 at 11:31 PM
> Colin
> >> > > > McCabe <
> >> > > > > > > > > > > > > > cmcc...@apache.org>
> >> > > > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > Thanks, Kowshik, this looks good.
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > In the "Schema" section, do we
> >> really
> >> > > need
> >> > > > > both
> >> > > > > > > > > > > > > > > __schema_version__
> >> > > > > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > > > > > > __data_version__?  Can we just
> have
> >> a
> >> > > > single
> >> > > > > > > > version
> >> > > > > > > > > > > field
> >> > > > > > > > > > > > > > here?
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > Shouldn't the Admin(Client)
> function
> >> > have
> >> > > > > some
> >> > > > > > > way
> >> > > > > > > > to
> >> > > > > > > > > > get
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > min
> >> > > > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > > > > max
> >> > > > > > > > > > > > > > > > > > > > information that we're exposing as
> >> > > well?  I
> >> > > > > > guess
> >> > > > > > > > we
> >> > > > > > > > > > > could
> >> > > > > > > > > > > > > have
> >> > > > > > > > > > > > > > > > min,
> >> > > > > > > > > > > > > > > > > > max,
> >> > > > > > > > > > > > > > > > > > > > and current.  Unrelated: is the
> use
> >> of
> >> > > Long
> >> > > > > > > rather
> >> > > > > > > > > than
> >> > > > > > > > > > > > long
> >> > > > > > > > > > > > > > > > > deliberate
> >> > > > > > > > > > > > > > > > > > > > here?
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > It would be good to describe how
> the
> >> > > > command
> >> > > > > > line
> >> > > > > > > > > tool
> >> > > > > > > > > > > > > > > > > > > > kafka.admin.FeatureCommand will
> >> work.
> >> > > For
> >> > > > > > > example
> >> > > > > > > > > the
> >> > > > > > > > > > > > flags
> >> > > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > it
> >> > > > > > > > > > > > > > > > > > will
> >> > > > > > > > > > > > > > > > > > > > take and the output that it will
> >> > generate
> >> > > > to
> >> > > > > > > > STDOUT.
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > cheers,
> >> > > > > > > > > > > > > > > > > > > > Colin
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > On Tue, Mar 24, 2020, at 17:08,
> >> Kowshik
> >> > > > > > Prakasam
> >> > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > > > > > Hi all,
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > I've opened KIP-584
> >> > > > > > > > > > > > > > > <
> >> https://issues.apache.org/jira/browse/KIP-584>
> >> > <
> >> > > > > > > > > > > > > > > >
> >> https://issues.apache.org/jira/browse/KIP-584
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > which
> >> > > > > > > > > > > > > > > > > > > > > is intended to provide a
> >> versioning
> >> > > > scheme
> >> > > > > > for
> >> > > > > > > > > > > features.
> >> > > > > > > > > > > > > I'd
> >> > > > > > > > > > > > > > > like
> >> > > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > > use
> >> > > > > > > > > > > > > > > > > > > > > this thread to discuss the same.
> >> I'd
> >> > > > > > appreciate
> >> > > > > > > > any
> >> > > > > > > > > > > > > feedback
> >> > > > > > > > > > > > > > on
> >> > > > > > > > > > > > > > > > > this.
> >> > > > > > > > > > > > > > > > > > > > > Here
> >> > > > > > > > > > > > > > > > > > > > > is a link to KIP-584
> >> > > > > > > > > > > > > > > <
> >> https://issues.apache.org/jira/browse/KIP-584>:
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features
> >> > > > > > > > > > > > > > > > > > > > >  .
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > Thank you!
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > > > Cheers,
> >> > > > > > > > > > > > > > > > > > > > > Kowshik
> >> > > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Reply via email to