About the "only controller" to "any broker" over time, I think I buy all
these points. Since this is out of the scope of this KIP, and since as
Grant said, this would be a rather general feature, but not only restricted
to the added admin requests in this KIP, we can leave that discussion for
later.

Guozhang

On Mon, Apr 4, 2016 at 10:27 AM, Grant Henke <ghe...@cloudera.com> wrote:

> Thanks for the review Guozhang, see my responses below.
>
> 1. One minor thing: "Note there is a special use case - automatic topic
> > creation for TopicMetadataRequest, to trigger it user should set
> > client_id=consumer and define only topic name:"
>
>
> > I think the current behavior is that as long as the config is turned on,
> > any MetadataRequest for specific topics that do not exist will trigger
> > creating it, no matter of the client ids?
>
>
> This is left over from the original proposal. I am not changing any
> auto-create behavior. I will remove that and look for other cleanup needed.
>
> 2. One general comment: about the ideal v.s. intermediate blocking
> > behavior, my understanding is that:
> >
> > a. for intermediate solution, which will be included in this KIP to be
> > fully implemented, we will require clients to send admin requests ONLY to
> > the controller, and users need to use the new version of MetadataRequest
> > polling on all brokers if they want to block on the whole cluster being
> > updated consistently.
>
>
> This is correct. Admin *write* requests must go to *the controller* broker.
> *Read* requests can still go to *any* broker. I updated the wiki to make
> this more explicit. This would be handled by the Admin Client.
>
> b. for the ideal solution, admin requests can be sent to any brokers. But
> > we will choose to either 1) let the request to be blocked on ONLY
> > controller metadata updated, and hence users still need to use
> > MetadataRequest polling and blocking for the whole cluster being updated
> > consistently; or 2) let the request to be blocked on ALL brokers'
> metadata
> > updated, and hence no need to use MetadataRequest polling any more. We
> > cannot support both of these functionalities as they will be in different
> > admin requests protocol versions. Is that the case?
>
>
> Yes the ideal solution would move from local blocking to blocking until the
> entire cluster is consistent. This means you would not need to poll
> assuming there are no timeouts. However, you may want to poll later if
> there is a timeout. This behavior change would require a protocol version
> bump.
>
> In addition, I feel that changing the behavior from "only controller" to
> > "any broker" over time could be confusing to people as well, since that
> > involves some error code mappings change, etc. So if we are going to do
> > "only controller" now, I'd rather we stick with it in the future or
> really
> > thinking through about how we would educate client developers to possibly
> > upgrade their code when we change it to "any broker".
>
>
> Only controller vs any broker is a convenience feature. It has been
> suggested for other requests and therefore came up during this discussion
> of KIP-4. I would prefer to discuss message forwarding in a separate KIP
> (which I am happy to help drive after KIP-4). I have laid out some reasons
> in the wiki but will also elaborate here.
>
> I would prefer messages be required to be forwarded to directly to the
> controller because:
>
>    1. KIP-4 is already very large and this introduces more scope
>       - Broker to Broker message forwarding introduces new failure and
>       security scenarios that need to be designed and thought through.
>       2. Client routing messages is likely an optimization many full
>    featured clients would like to implement regardless
>    3. Routing messages to the correct broker is very common in clients
>    today and we already have retriable errors and patterns defined for
>    handling outdated routing
>       - Partition Leader Broker
>       - Group Coordinator Broker
>    4. Message forwarding can be handled generically for any request that
>    needs forwarding
>    5. If/when message forwarding is added in the future. Clients that
>    already route to the controller would not need to change. New clients,
> or
>    clients that do not want to route to the correct broker can send
> messages
>    to any broker.
>
> I am also okay with never supporting message forwarding if you think its
> too confusing to clients, though it appears to be a highly requested
> feature. Even before KIP-4. See KAFKA-1912
> <https://issues.apache.org/jira/browse/KAFKA-1912>.
>
> Thank you,
> Grant
>
>
> On Mon, Apr 4, 2016 at 12:38 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Grant,
> >
> > Thanks for the write-ups. I had some comments besides the debate of
> > metadata request principles:
> >
> > 1. One minor thing: "Note there is a special use case - automatic topic
> > creation for TopicMetadataRequest, to trigger it user should set
> > client_id=consumer and define only topic name:"
> >
> > I think the current behavior is that as long as the config is turned on,
> > any MetadataRequest for specific topics that do not exist will trigger
> > creating it, no matter of the client ids?
> >
> > 2. One general comment: about the ideal v.s. intermediate blocking
> > behavior, my understanding is that:
> >
> > a. for intermediate solution, which will be included in this KIP to be
> > fully implemented, we will require clients to send admin requests ONLY to
> > the controller, and users need to use the new version of MetadataRequest
> > polling on all brokers if they want to block on the whole cluster being
> > updated consistently.
> >
> > b. for the ideal solution, admin requests can be sent to any brokers. But
> > we will choose to either 1) let the request to be blocked on ONLY
> > controller metadata updated, and hence users still need to use
> > MetadataRequest polling and blocking for the whole cluster being updated
> > consistently; or 2) let the request to be blocked on ALL brokers'
> metadata
> > updated, and hence no need to use MetadataRequest polling any more. We
> > cannot support both of these functionalities as they will be in different
> > admin requests protocol versions. Is that the case?
> >
> > Personally I would prefer the second case since it is simpler to reason
> and
> > develop against for clients, if they ever like to specify "block until it
> > is finished". I.e. it is either "don't care I will just retry if follow
> up
> > request fails" or "please make sure I will never have unexpected
> transient
> > errors, that I would rather block for all". But I could be wrong
> > anticipating users' needs.
> >
> > In addition, I feel that changing the behavior from "only controller" to
> > "any broker" over time could be confusing to people as well, since that
> > involves some error code mappings change, etc. So if we are going to do
> > "only controller" now, I'd rather we stick with it in the future or
> really
> > thinking through about how we would educate client developers to possibly
> > upgrade their code when we change it to "any broker".
> >
> > Guozhang
> >
> >
> > On Sun, Apr 3, 2016 at 5:47 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Jun,
> > >
> > > Yes, that makes it a bit challenging to do this in a manner that is
> fully
> > > compatible. Having said that, most (all?) languages have Perl-derived
> > > regexes (including Java), so it seems like it should be possible to
> > define
> > > a subset for the protocol. But perhaps that's a bigger change that
> > requires
> > > its own KIP.
> > >
> > > Ismael
> > >
> > > On Mon, Apr 4, 2016 at 1:04 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Ismael,
> > > >
> > > > The reason that we only apply the regex for topics on the client side
> > is
> > > > that regex is java specific. So, for non-java clients, it would be a
> > bit
> > > > weird for them to include java specific stuff in the wire protocol.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Sun, Apr 3, 2016 at 12:36 AM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > >
> > > > > Hi Grant,
> > > > >
> > > > > One question that occurred to me is whether we want to take the
> > chance
> > > to
> > > > > make it possible to pass a regex pattern for the desired topics in
> > the
> > > > > metadata request. This would potentially improve the efficiency of
> > > > > `KafkaConsumer.subscribe` significantly for cases where there are
> > large
> > > > > number of topics and partitions (we currently get the metadata for
> > > _all_
> > > > > topics and apply the regular expression on the client).
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Apr 1, 2016 at 3:29 AM, Grant Henke <ghe...@cloudera.com>
> > > wrote:
> > > > >
> > > > > > I have update the wiki and patch based on the feedback on the
> > > metadata
> > > > > > changes. Please take a look and let me know if there are any
> > concerns
> > > > or
> > > > > > issues.
> > > > > >
> > > > > >
> > > > > >    - Wiki:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> > > > > >    - PR: https://github.com/apache/kafka/pull/1095
> > > > > >
> > > > > > I will hold a vote on the metadata change soon if no major issues
> > are
> > > > > > raised.
> > > > > >
> > > > > > On Thu, Mar 31, 2016 at 6:33 PM, Jason Gustafson <
> > ja...@confluent.io
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I also prefer B. In addition to being intuitive, it seems less
> > > > > > error-prone
> > > > > > > in the long term, though it might be a little annoying for
> > clients
> > > > > > > maintaining backwards compatibility.
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Thu, Mar 31, 2016 at 1:24 PM, Ismael Juma <
> ism...@juma.me.uk>
> > > > > wrote:
> > > > > > >
> > > > > > > > I prefer B, the fact that we version the protocol means that
> we
> > > can
> > > > > fix
> > > > > > > > mistakes instead of living with them forever. We should take
> > > > > advantage
> > > > > > of
> > > > > > > > that.
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Thu, Mar 31, 2016 at 9:15 PM, Grant Henke <
> > > ghe...@cloudera.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Looking for some resolution on the "No Topics" change.
> > > > > > > > >
> > > > > > > > > I am thinking that using null in the protocol isn't that
> > > complex,
> > > > > and
> > > > > > > it
> > > > > > > > > avoids various edge cases with having multiple fields. That
> > > > leaves
> > > > > us
> > > > > > > > with
> > > > > > > > > 2 options:
> > > > > > > > >
> > > > > > > > >    - A: null = no topics, empty = all topics
> > > > > > > > >    - B: null = all topics, empty = no topics
> > > > > > > > >
> > > > > > > > > A is nice because it just adds new functionality, existing
> > > logic
> > > > > > > doesn't
> > > > > > > > > change
> > > > > > > > > B is nice because its more "intuitive", but has the
> drawback
> > of
> > > > > > > changing
> > > > > > > > > what empty means from request v0
> > > > > > > > >
> > > > > > > > > I do not have a strong opinion on the approach taken, which
> > > makes
> > > > > me
> > > > > > > lean
> > > > > > > > > towards option A. Keep in mind at the user level, the apis
> in
> > > the
> > > > > > > various
> > > > > > > > > clients can map this however they like.
> > > > > > > > >
> > > > > > > > > Does anyone feel strongly about the choice?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Mar 31, 2016 at 9:21 AM, Grant Henke <
> > > > ghe...@cloudera.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I had a second look at the proposed changes to Metadata
> > > Request
> > > > > and
> > > > > > > > > >> Response and it seems to me that having a
> `controller_id`
> > > > field
> > > > > > > would
> > > > > > > > be
> > > > > > > > > >> more efficient for non-trivial cases than having a
> > > > > `is_controller`
> > > > > > > > field
> > > > > > > > > >>  for each broker (which would be false for all but 1
> > case).
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I agree this is better. I will update it.
> > > > > > > > > >
> > > > > > > > > > Similar, but less clear is the best way to encode
> > > > > > > `marked_for_deletion`
> > > > > > > > > and
> > > > > > > > > >> `is_internal`. These will also be false for most topics
> > > (there
> > > > > is
> > > > > > > only
> > > > > > > > > one
> > > > > > > > > >> internal topic at the moment, for example), so it may
> make
> > > > sense
> > > > > > to
> > > > > > > > > have a
> > > > > > > > > >> topics_marked_for_deletion and internal_topics in the
> > > > response.
> > > > > > > > Because
> > > > > > > > > >> topics are identified by strings, it is not as clear-cut
> > as
> > > > the
> > > > > > > > > >> controller_id case, but it still seems like it would be
> a
> > > win
> > > > > for
> > > > > > > when
> > > > > > > > > it
> > > > > > > > > >> matters most (when the number of topics is large).
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > Thats an interesting idea. I can try making this change
> to
> > > see
> > > > > what
> > > > > > > it
> > > > > > > > > > would look like.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Grant
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 31, 2016 at 8:59 AM, Ismael Juma <
> > > > ism...@juma.me.uk>
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Grant,
> > > > > > > > > >>
> > > > > > > > > >> I had a second look at the proposed changes to Metadata
> > > > Request
> > > > > > and
> > > > > > > > > >> Response and it seems to me that having a
> `controller_id`
> > > > field
> > > > > > > would
> > > > > > > > be
> > > > > > > > > >> more efficient for non-trivial cases than having a
> > > > > `is_controller`
> > > > > > > > field
> > > > > > > > > >>  for each broker (which would be false for all but 1
> > case).
> > > > > > > > > >>
> > > > > > > > > >> Similar, but less clear is the best way to encode
> > > > > > > > `marked_for_deletion`
> > > > > > > > > >> and
> > > > > > > > > >> `is_internal`. These will also be false for most topics
> > > (there
> > > > > is
> > > > > > > only
> > > > > > > > > one
> > > > > > > > > >> internal topic at the moment, for example), so it may
> make
> > > > sense
> > > > > > to
> > > > > > > > > have a
> > > > > > > > > >> topics_marked_for_deletion and internal_topics in the
> > > > response.
> > > > > > > > Because
> > > > > > > > > >> topics are identified by strings, it is not as clear-cut
> > as
> > > > the
> > > > > > > > > >> controller_id case, but it still seems like it would be
> a
> > > win
> > > > > for
> > > > > > > when
> > > > > > > > > it
> > > > > > > > > >> matters most (when the number of topics is large).
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> Ismael
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Mar 14, 2016 at 10:07 PM, Grant Henke <
> > > > > > ghe...@cloudera.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > I have been updating the KIP-4 wiki page based on the
> > last
> > > > KIP
> > > > > > > call
> > > > > > > > > and
> > > > > > > > > >> > wanted to get some review and discussion around the
> > server
> > > > > side
> > > > > > > > > >> > implementation for admin requests. Both the "ideal"
> > > > > > functionality
> > > > > > > > and
> > > > > > > > > >> the
> > > > > > > > > >> > "intermediated" functionality. The updates are still
> in
> > > > > > progress,
> > > > > > > > but
> > > > > > > > > >> this
> > > > > > > > > >> > section is the most critical and will likely have the
> > most
> > > > > > > > discussion.
> > > > > > > > > >> This
> > > > > > > > > >> > topic has had a few shifts in perspective and various
> > > > > > discussions
> > > > > > > on
> > > > > > > > > >> > synchronous vs asynchronous server support. The wiki
> > > > contains
> > > > > my
> > > > > > > > > current
> > > > > > > > > >> > perspective on the challenges and approach.
> > > > > > > > > >> >
> > > > > > > > > >> > If you have any thoughts or feedback on the
> "Server-side
> > > > Admin
> > > > > > > > Request
> > > > > > > > > >> > handlers" section here
> > > > > > > > > >> > <
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-2.Server-sideAdminRequesthandlers
> > > > > > > > > >> > >.
> > > > > > > > > >> > Lets discuss them in this thread.
> > > > > > > > > >> >
> > > > > > > > > >> > For reference the last KIP discussion can be viewed
> > here:
> > > > > > > > > >> > https://youtu.be/rFW0-zJqg5I?t=12m30s
> > > > > > > > > >> >
> > > > > > > > > >> > Thank you,
> > > > > > > > > >> > Grant
> > > > > > > > > >> > --
> > > > > > > > > >> > Grant Henke
> > > > > > > > > >> > Software Engineer | Cloudera
> > > > > > > > > >> > gr...@cloudera.com | twitter.com/gchenke |
> > > > > > > > linkedin.com/in/granthenke
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Grant Henke
> > > > > > > > > > Software Engineer | Cloudera
> > > > > > > > > > gr...@cloudera.com | twitter.com/gchenke |
> > > > > > > linkedin.com/in/granthenke
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Grant Henke
> > > > > > > > > Software Engineer | Cloudera
> > > > > > > > > gr...@cloudera.com | twitter.com/gchenke |
> > > > > > linkedin.com/in/granthenke
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Grant Henke
> > > > > > Software Engineer | Cloudera
> > > > > > gr...@cloudera.com | twitter.com/gchenke |
> > > linkedin.com/in/granthenke
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
-- Guozhang

Reply via email to