Hi Grant,

Thanks for the write-ups. I had some comments besides the debate of
metadata request principles:

1. One minor thing: "Note there is a special use case - automatic topic
creation for TopicMetadataRequest, to trigger it user should set
client_id=consumer and define only topic name:"

I think the current behavior is that as long as the config is turned on,
any MetadataRequest for specific topics that do not exist will trigger
creating it, no matter of the client ids?

2. One general comment: about the ideal v.s. intermediate blocking
behavior, my understanding is that:

a. for intermediate solution, which will be included in this KIP to be
fully implemented, we will require clients to send admin requests ONLY to
the controller, and users need to use the new version of MetadataRequest
polling on all brokers if they want to block on the whole cluster being
updated consistently.

b. for the ideal solution, admin requests can be sent to any brokers. But
we will choose to either 1) let the request to be blocked on ONLY
controller metadata updated, and hence users still need to use
MetadataRequest polling and blocking for the whole cluster being updated
consistently; or 2) let the request to be blocked on ALL brokers' metadata
updated, and hence no need to use MetadataRequest polling any more. We
cannot support both of these functionalities as they will be in different
admin requests protocol versions. Is that the case?

Personally I would prefer the second case since it is simpler to reason and
develop against for clients, if they ever like to specify "block until it
is finished". I.e. it is either "don't care I will just retry if follow up
request fails" or "please make sure I will never have unexpected transient
errors, that I would rather block for all". But I could be wrong
anticipating users' needs.

In addition, I feel that changing the behavior from "only controller" to
"any broker" over time could be confusing to people as well, since that
involves some error code mappings change, etc. So if we are going to do
"only controller" now, I'd rather we stick with it in the future or really
thinking through about how we would educate client developers to possibly
upgrade their code when we change it to "any broker".

Guozhang


On Sun, Apr 3, 2016 at 5:47 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Jun,
>
> Yes, that makes it a bit challenging to do this in a manner that is fully
> compatible. Having said that, most (all?) languages have Perl-derived
> regexes (including Java), so it seems like it should be possible to define
> a subset for the protocol. But perhaps that's a bigger change that requires
> its own KIP.
>
> Ismael
>
> On Mon, Apr 4, 2016 at 1:04 AM, Jun Rao <j...@confluent.io> wrote:
>
> > Ismael,
> >
> > The reason that we only apply the regex for topics on the client side is
> > that regex is java specific. So, for non-java clients, it would be a bit
> > weird for them to include java specific stuff in the wire protocol.
> >
> > Thanks,
> >
> > Jun
> >
> > On Sun, Apr 3, 2016 at 12:36 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Grant,
> > >
> > > One question that occurred to me is whether we want to take the chance
> to
> > > make it possible to pass a regex pattern for the desired topics in the
> > > metadata request. This would potentially improve the efficiency of
> > > `KafkaConsumer.subscribe` significantly for cases where there are large
> > > number of topics and partitions (we currently get the metadata for
> _all_
> > > topics and apply the regular expression on the client).
> > >
> > > Ismael
> > >
> > > On Fri, Apr 1, 2016 at 3:29 AM, Grant Henke <ghe...@cloudera.com>
> wrote:
> > >
> > > > I have update the wiki and patch based on the feedback on the
> metadata
> > > > changes. Please take a look and let me know if there are any concerns
> > or
> > > > issues.
> > > >
> > > >
> > > >    - Wiki:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> > > >    - PR: https://github.com/apache/kafka/pull/1095
> > > >
> > > > I will hold a vote on the metadata change soon if no major issues are
> > > > raised.
> > > >
> > > > On Thu, Mar 31, 2016 at 6:33 PM, Jason Gustafson <ja...@confluent.io
> >
> > > > wrote:
> > > >
> > > > > I also prefer B. In addition to being intuitive, it seems less
> > > > error-prone
> > > > > in the long term, though it might be a little annoying for clients
> > > > > maintaining backwards compatibility.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Thu, Mar 31, 2016 at 1:24 PM, Ismael Juma <ism...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > I prefer B, the fact that we version the protocol means that we
> can
> > > fix
> > > > > > mistakes instead of living with them forever. We should take
> > > advantage
> > > > of
> > > > > > that.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Mar 31, 2016 at 9:15 PM, Grant Henke <
> ghe...@cloudera.com>
> > > > > wrote:
> > > > > >
> > > > > > > Looking for some resolution on the "No Topics" change.
> > > > > > >
> > > > > > > I am thinking that using null in the protocol isn't that
> complex,
> > > and
> > > > > it
> > > > > > > avoids various edge cases with having multiple fields. That
> > leaves
> > > us
> > > > > > with
> > > > > > > 2 options:
> > > > > > >
> > > > > > >    - A: null = no topics, empty = all topics
> > > > > > >    - B: null = all topics, empty = no topics
> > > > > > >
> > > > > > > A is nice because it just adds new functionality, existing
> logic
> > > > > doesn't
> > > > > > > change
> > > > > > > B is nice because its more "intuitive", but has the drawback of
> > > > > changing
> > > > > > > what empty means from request v0
> > > > > > >
> > > > > > > I do not have a strong opinion on the approach taken, which
> makes
> > > me
> > > > > lean
> > > > > > > towards option A. Keep in mind at the user level, the apis in
> the
> > > > > various
> > > > > > > clients can map this however they like.
> > > > > > >
> > > > > > > Does anyone feel strongly about the choice?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 31, 2016 at 9:21 AM, Grant Henke <
> > ghe...@cloudera.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > I had a second look at the proposed changes to Metadata
> Request
> > > and
> > > > > > > >> Response and it seems to me that having a `controller_id`
> > field
> > > > > would
> > > > > > be
> > > > > > > >> more efficient for non-trivial cases than having a
> > > `is_controller`
> > > > > > field
> > > > > > > >>  for each broker (which would be false for all but 1 case).
> > > > > > > >
> > > > > > > >
> > > > > > > > I agree this is better. I will update it.
> > > > > > > >
> > > > > > > > Similar, but less clear is the best way to encode
> > > > > `marked_for_deletion`
> > > > > > > and
> > > > > > > >> `is_internal`. These will also be false for most topics
> (there
> > > is
> > > > > only
> > > > > > > one
> > > > > > > >> internal topic at the moment, for example), so it may make
> > sense
> > > > to
> > > > > > > have a
> > > > > > > >> topics_marked_for_deletion and internal_topics in the
> > response.
> > > > > > Because
> > > > > > > >> topics are identified by strings, it is not as clear-cut as
> > the
> > > > > > > >> controller_id case, but it still seems like it would be a
> win
> > > for
> > > > > when
> > > > > > > it
> > > > > > > >> matters most (when the number of topics is large).
> > > > > > > >>
> > > > > > > >
> > > > > > > > Thats an interesting idea. I can try making this change to
> see
> > > what
> > > > > it
> > > > > > > > would look like.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Grant
> > > > > > > >
> > > > > > > > On Thu, Mar 31, 2016 at 8:59 AM, Ismael Juma <
> > ism...@juma.me.uk>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Grant,
> > > > > > > >>
> > > > > > > >> I had a second look at the proposed changes to Metadata
> > Request
> > > > and
> > > > > > > >> Response and it seems to me that having a `controller_id`
> > field
> > > > > would
> > > > > > be
> > > > > > > >> more efficient for non-trivial cases than having a
> > > `is_controller`
> > > > > > field
> > > > > > > >>  for each broker (which would be false for all but 1 case).
> > > > > > > >>
> > > > > > > >> Similar, but less clear is the best way to encode
> > > > > > `marked_for_deletion`
> > > > > > > >> and
> > > > > > > >> `is_internal`. These will also be false for most topics
> (there
> > > is
> > > > > only
> > > > > > > one
> > > > > > > >> internal topic at the moment, for example), so it may make
> > sense
> > > > to
> > > > > > > have a
> > > > > > > >> topics_marked_for_deletion and internal_topics in the
> > response.
> > > > > > Because
> > > > > > > >> topics are identified by strings, it is not as clear-cut as
> > the
> > > > > > > >> controller_id case, but it still seems like it would be a
> win
> > > for
> > > > > when
> > > > > > > it
> > > > > > > >> matters most (when the number of topics is large).
> > > > > > > >
> > > > > > > >
> > > > > > > >> Ismael
> > > > > > > >>
> > > > > > > >> On Mon, Mar 14, 2016 at 10:07 PM, Grant Henke <
> > > > ghe...@cloudera.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > I have been updating the KIP-4 wiki page based on the last
> > KIP
> > > > > call
> > > > > > > and
> > > > > > > >> > wanted to get some review and discussion around the server
> > > side
> > > > > > > >> > implementation for admin requests. Both the "ideal"
> > > > functionality
> > > > > > and
> > > > > > > >> the
> > > > > > > >> > "intermediated" functionality. The updates are still in
> > > > progress,
> > > > > > but
> > > > > > > >> this
> > > > > > > >> > section is the most critical and will likely have the most
> > > > > > discussion.
> > > > > > > >> This
> > > > > > > >> > topic has had a few shifts in perspective and various
> > > > discussions
> > > > > on
> > > > > > > >> > synchronous vs asynchronous server support. The wiki
> > contains
> > > my
> > > > > > > current
> > > > > > > >> > perspective on the challenges and approach.
> > > > > > > >> >
> > > > > > > >> > If you have any thoughts or feedback on the "Server-side
> > Admin
> > > > > > Request
> > > > > > > >> > handlers" section here
> > > > > > > >> > <
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers
> > > > > > > >> > >.
> > > > > > > >> > Lets discuss them in this thread.
> > > > > > > >> >
> > > > > > > >> > For reference the last KIP discussion can be viewed here:
> > > > > > > >> > https://youtu.be/rFW0-zJqg5I?t=12m30s
> > > > > > > >> >
> > > > > > > >> > Thank you,
> > > > > > > >> > Grant
> > > > > > > >> > --
> > > > > > > >> > Grant Henke
> > > > > > > >> > Software Engineer | Cloudera
> > > > > > > >> > gr...@cloudera.com | twitter.com/gchenke |
> > > > > > linkedin.com/in/granthenke
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Grant Henke
> > > > > > > > Software Engineer | Cloudera
> > > > > > > > gr...@cloudera.com | twitter.com/gchenke |
> > > > > linkedin.com/in/granthenke
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Grant Henke
> > > > > > > Software Engineer | Cloudera
> > > > > > > gr...@cloudera.com | twitter.com/gchenke |
> > > > linkedin.com/in/granthenke
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Grant Henke
> > > > Software Engineer | Cloudera
> > > > gr...@cloudera.com | twitter.com/gchenke |
> linkedin.com/in/granthenke
> > > >
> > >
> >
>



-- 
-- Guozhang

Reply via email to