Re: [KIP-DISCUSSION] KIP-13 Quotas

Gwen Shapira Tue, 07 Apr 2015 09:56:13 -0700

Re (1):
We have no authorization story on the metrics collected by brokers, so I
assume that access to broker metrics means knowing exactly which topics
exist and their throughputs. (Prath and Don, correct me if I got it
wrong...)
Secure environments will strictly control access to this information, so I
am pretty sure the client developers will not have access to server metrics
at all.


Gwen

On Tue, Apr 7, 2015 at 7:41 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Totally. But is that the only use? What I wanted to flesh out was whether
> the goal was:
> 1. Expose throttling in the client metrics
> 2. Enable programmatic response (i.e. stop sending stuff or something like
> that)
>
> I think I kind of understand (1) but let's get specific on the metric we
> would be adding and what exactly you would expose  in a dashboard. For
> example if the goal is just monitoring do I really want a boolean flag for
> is_throttled or do I want to know how much I am being throttled (i.e.
> throttle_pct might indicate the percent of your request time that was due
> to throttling or something like that)? If I am 1% throttled that may be
> irrelevant but 99% throttled would be quite relevant? Not sure I agree,
> just throwing that out there...
>
> For (2) the prior discussion seemed to kind of allude to this but I can't
> really come up with a use case. Is there one?
>
> If it is just (1) I think the question is whether it really helps much to
> have the metric on the client vs the server. I suppose this is a bit
> environment specific. If you have a central metrics system it shouldn't
> make any difference, but if you don't I suppose it does.
>
> -Jay
>
> On Mon, Apr 6, 2015 at 7:57 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > Here's a wild guess:
> >
> > An app developer included a Kafka Producer in his app, and is not happy
> > with the throughput. He doesn't have visibility into the brokers since
> they
> > are owned by a different team. Obviously the first instinct of a
> developer
> > who knows that throttling exists is to blame throttling for any slowdown
> in
> > the app.
> > If he doesn't have a way to know from the responses whether or not his
> app
> > is throttled, he may end up calling Aditya at 4am asked "Hey, is my app
> > throttled?".
> >
> > I assume Aditya is trying to avoid this scenario.
> >
> > On Mon, Apr 6, 2015 at 7:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Hey Aditya,
> > >
> > > 2. I kind of buy it, but I really like to understand the details of the
> > use
> > > case before we make protocol changes. What changes are you proposing in
> > the
> > > clients for monitoring and how would that be used?
> > >
> > > -Jay
> > >
> > > On Mon, Apr 6, 2015 at 10:36 AM, Aditya Auradkar <
> > > aaurad...@linkedin.com.invalid> wrote:
> > >
> > > > Hi Jay,
> > > >
> > > > 2. At this time, the proposed response format changes are only for
> > > > monitoring/informing clients. As Jun mentioned, we get instance level
> > > > monitoring in this case since each instance that got throttled will
> > have
> > > a
> > > > metric confirming the same. Without client level monitoring for this,
> > > it's
> > > > hard for application developers to find if they are being throttled
> > since
> > > > they will also have to be aware of all the brokers in the cluster.
> This
> > > is
> > > > quite problematic for large clusters.
> > > >
> > > > It seems nice for app developers to not have to think about kafka
> > > internal
> > > > metrics and only focus on the metrics exposed on their instances.
> > > Analogous
> > > > to having client-sde request latency metrics. Basically, we want an
> > easy
> > > > way for clients to be aware if they are being throttled.
> > > >
> > > > 4. For purgatory v delay queue, I think we are on the same page. I
> feel
> > > it
> > > > is nicer to use the purgatory but I'm happy to use a DelayQueue if
> > there
> > > > are performance implications. I don't know enough about the current
> and
> > > > Yasuhiro's new implementation to be sure one way or the other.
> > > >
> > > > Stepping back, I think these two things are the only remaining point
> of
> > > > discussion within the current proposal. Any concerns if I started a
> > > voting
> > > > thread on the proposal after the KIP discussion tomorrow? (assuming
> we
> > > > reach consensus on these items)
> > > >
> > > > Thanks,
> > > > Aditya
> > > > ________________________________________
> > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > Sent: Saturday, April 04, 2015 1:36 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > >
> > > > Hey Aditya,
> > > >
> > > > 2. For the return flag I'm not terribly particular. If we want to add
> > it
> > > > let's fully think through how it will be used. The only concern I
> have
> > is
> > > > adding to the protocol without really thinking through the use cases.
> > So
> > > > let's work out the APIs we want to add to the Java consumer and
> > producer
> > > > and the use cases for how clients will make use of these. For my
> part I
> > > > actually don't see much use other than monitoring since it isn't an
> > error
> > > > condition to be at your quota. And if it is just monitoring I don't
> > see a
> > > > big enough difference between having the monitoring on the
> server-side
> > > > versus in the clients to justify putting it in the protocol. But I
> > think
> > > > you guys may have other use cases in mind of how a client would make
> > some
> > > > use of this? Let's work that out. I also don't feel strongly about
> > it--it
> > > > wouldn't be *bad* to have the monitoring available on the client,
> just
> > > > doesn't seem that much better.
> > > >
> > > > 4. For the purgatory vs delay queue I think is arguably nicer to
> reuse
> > > the
> > > > purgatory we just have to be ultra-conscious of efficiency. I think
> our
> > > > goal is to turn quotas on across the board, so at LinkedIn that would
> > > mean
> > > > potentially every request will need a small delay. I haven't worked
> out
> > > the
> > > > efficiency implications of this choice, so as long as we do that I'm
> > > happy.
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Apr 3, 2015 at 1:10 PM, Aditya Auradkar <
> > > > aaurad...@linkedin.com.invalid> wrote:
> > > >
> > > > > Some responses to Jay's points.
> > > > >
> > > > > 1. Using commas - Cool.
> > > > >
> > > > > 2. Adding return flag - I'm inclined to agree with Joel that this
> is
> > > good
> > > > > to have in the initial implementation.
> > > > >
> > > > > 3. Config - +1. I'll remove it from the KIP. We can discuss this in
> > > > > parallel.
> > > > >
> > > > > 4. Purgatory vs Delay queue - I feel that it is simpler to reuse
> the
> > > > > existing purgatories for both delayed produce and fetch requests.
> > IIUC,
> > > > all
> > > > > we need for quotas is a minWait parameter for DelayedOperation (or
> > > > > something equivalent) since there is already a max wait. The
> > completion
> > > > > criteria can check if minWait time has elapsed before declaring the
> > > > > operation complete. For this to impact performance, a significant
> > > number
> > > > of
> > > > > clients may need to exceed their quota at the same time and even
> then
> > > I'm
> > > > > not very clear on the scope of the impact. Two layers of delays
> might
> > > add
> > > > > complexity to the implementation which I'm hoping to avoid.
> > > > >
> > > > > Aditya
> > > > >
> > > > > ________________________________________
> > > > > From: Joel Koshy [jjkosh...@gmail.com]
> > > > > Sent: Friday, April 03, 2015 12:48 PM
> > > > > To: dev@kafka.apache.org
> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > >
> > > > > Aditya, thanks for the updated KIP and Jay/Jun thanks for the
> > > > > comments. Couple of comments in-line:
> > > > >
> > > > > > 2. I would advocate for adding the return flag when we next bump
> > the
> > > > > > request format version just to avoid proliferation. I agree this
> > is a
> > > > > good
> > > > > > thing to know about, but at the moment I don't think we have a
> very
> > > > well
> > > > > > flushed out idea of how the client would actually make use of
> this
> > > > info.
> > > > > I
> > > > >
> > > > > I'm somewhat inclined to having something appropriate off the bat -
> > > > > mainly because (i) clients really should know that they have been
> > > > > throttled (ii) a smart producer/consumer implementation would want
> to
> > > > > know how much to back off. So perhaps this and config-management
> > > > > should be moved to a separate discussion, but it would be good to
> > have
> > > > > this discussion going and incorporated into the first quota
> > > > > implementation.
> > > > >
> > > > > > 3. Config--I think we need to generalize the topic stuff so we
> can
> > > > > override
> > > > > > at multiple levels. We have topic and client, but I suspect
> "user"
> > > and
> > > > > > "broker" will also be important. I recommend we take config stuff
> > out
> > > > of
> > > > > > this KIP since we really need to fully think through a proposal
> > that
> > > > will
> > > > > > cover all these types of overrides.
> > > > >
> > > > > +1 - it is definitely orthogonal to the core quota implementation
> > > > > (although necessary for its operability). Having a config-related
> > > > > discussion in this KIP would only draw out the discussion and vote
> > > > > even if the core quota design looks good to everyone.
> > > > >
> > > > > So basically I think we can remove the portions on dynamic config
> as
> > > > > well as the response format but I really think we should close on
> > > > > those while the implementation is in progress and before quotas is
> > > > > officially released.
> > > > >
> > > > > > 4. Instead of using purgatories to implement the delay would it
> > make
> > > > more
> > > > > > sense to just use a delay queue? I think all the additional stuff
> > in
> > > > the
> > > > > > purgatory other than the delay queue doesn't make sense as the
> > quota
> > > > is a
> > > > > > hard N ms penalty with no chance of early eviction. If there is
> no
> > > perf
> > > > > > penalty for the full purgatory that may be fine (even good) to
> > reuse,
> > > > > but I
> > > > > > haven't looked into that.
> > > > >
> > > > > A simple delay queue sounds good - I think Aditya was also trying
> to
> > > > > avoid adding a new quota purgatory. i.e., it may be possible to use
> > > > > the existing purgatory instances to enforce quotas. That may be
> > > > > simpler, but would be incur a slight perf penalty if too many
> clients
> > > > > are being throttled.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Joel
> > > > >
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Fri, Apr 3, 2015 at 10:45 AM, Aditya Auradkar <
> > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > >
> > > > > >> Update, I added a proposal on doing dynamic client based
> > > configuration
> > > > > >> that can be used for quotas.
> > > > > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > >>
> > > > > >> Please take a look and let me know if there are any concerns.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Aditya
> > > > > >> ________________________________________
> > > > > >> From: Aditya Auradkar
> > > > > >> Sent: Friday, April 03, 2015 10:10 AM
> > > > > >> To: dev@kafka.apache.org
> > > > > >> Subject: RE: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >>
> > > > > >> Thanks Jun.
> > > > > >>
> > > > > >> Some thoughts:
> > > > > >>
> > > > > >> 10) I think it is better we throttle regardless of the
> > produce/fetch
> > > > > >> version. This is a nice feature where clients can tell if they
> are
> > > > being
> > > > > >> throttled or not. If we only throttle newer clients, then we
> have
> > > > > >> inconsistent behavior across clients in a multi-tenant cluster.
> > > Having
> > > > > >> quota metrics on the client side is also a nice incentive to
> > upgrade
> > > > > client
> > > > > >> versions.
> > > > > >>
> > > > > >> 11) I think we can call metric.record(fetchSize) before adding
> the
> > > > > >> delayedFetch request into the purgatory. This will give us the
> > > > estimated
> > > > > >> delay of the request up-front. The timeout on the DelayedFetch
> is
> > > the
> > > > > >> Max(maxWait, quotaDelay). The DelayedFetch completion criteria
> can
> > > > > change a
> > > > > >> little to accomodate quotas.
> > > > > >>
> > > > > >> - I agree the quota code should return the estimated delay time
> in
> > > > > >> QuotaViolationException.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Aditya
> > > > > >>
> > > > > >> ________________________________________
> > > > > >> From: Jun Rao [j...@confluent.io]
> > > > > >> Sent: Friday, April 03, 2015 9:16 AM
> > > > > >> To: dev@kafka.apache.org
> > > > > >> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >>
> > > > > >> Thanks for the update.
> > > > > >>
> > > > > >> 10. About whether to return a new field in the response to
> > indicate
> > > > > >> throttling. Earlier, the plan was to not change the response
> > format
> > > > and
> > > > > >> just have a metric on the broker to indicate whether a clientId
> is
> > > > > >> throttled or not. The issue is that we don't know whether a
> > > particular
> > > > > >> clientId instance is throttled or not (since there could be
> > multiple
> > > > > >> clients with the same clientId). Your proposal of adding an
> > > > isThrottled
> > > > > >> field in the response addresses and seems better. Then, do we
> just
> > > > > throttle
> > > > > >> the new version of produce/fetch request or both the old and the
> > new
> > > > > >> versions? Also, we probably still need a separate metric on the
> > > broker
> > > > > side
> > > > > >> to indicate whether a clientId is throttled or not.
> > > > > >>
> > > > > >> 11. Just to clarify. For fetch requests, when will
> > > > > metric.record(fetchSize)
> > > > > >> be called? Is it when we are ready to send the fetch response
> > (after
> > > > > >> minBytes and maxWait are satisfied)?
> > > > > >>
> > > > > >> As an implementation detail, it may be useful for the quota code
> > to
> > > > > return
> > > > > >> an estimated delay time (to bring the measurement within the
> > limit)
> > > in
> > > > > >> QuotaViolationException.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Wed, Apr 1, 2015 at 3:27 PM, Aditya Auradkar <
> > > > > >> aaurad...@linkedin.com.invalid> wrote:
> > > > > >>
> > > > > >> > Hey everyone,
> > > > > >> >
> > > > > >> > I've made changes to the KIP to capture our discussions over
> the
> > > > last
> > > > > >> > couple of weeks.
> > > > > >> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > >> >
> > > > > >> > I'll start a voting thread after people have had a chance to
> > > > > >> read/comment.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Aditya
> > > > > >> >
> > > > > >> > ________________________________________
> > > > > >> > From: Steven Wu [stevenz...@gmail.com]
> > > > > >> > Sent: Friday, March 20, 2015 9:14 AM
> > > > > >> > To: dev@kafka.apache.org
> > > > > >> > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >> >
> > > > > >> > +1 on Jun's suggestion of maintaining one set/style of metrics
> > at
> > > > > broker.
> > > > > >> > In Netflix, we have to convert the yammer metrics to servo
> > metrics
> > > > at
> > > > > >> > broker. it will be painful to know some metrics are in a
> > different
> > > > > style
> > > > > >> > and get to be handled differently.
> > > > > >> >
> > > > > >> > On Fri, Mar 20, 2015 at 8:17 AM, Jun Rao <j...@confluent.io>
> > > wrote:
> > > > > >> >
> > > > > >> > > Not so sure. People who use quota will definitely want to
> > > monitor
> > > > > the
> > > > > >> new
> > > > > >> > > metrics at the client id level. Then they will need to deal
> > with
> > > > > those
> > > > > >> > > metrics differently from the rest of the metrics. It would
> be
> > > > > better if
> > > > > >> > we
> > > > > >> > > can hide this complexity from the users.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jun
> > > > > >> > >
> > > > > >> > > On Thu, Mar 19, 2015 at 10:45 PM, Joel Koshy <
> > > jjkosh...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Actually thinking again - since these will be a few new
> > > metrics
> > > > at
> > > > > >> the
> > > > > >> > > > client id level (bytes in and bytes out to start with)
> maybe
> > > it
> > > > is
> > > > > >> fine
> > > > > >> > > to
> > > > > >> > > > have the two type of metrics coexist and we can migrate
> the
> > > > > existing
> > > > > >> > > > metrics in parallel.
> > > > > >> > > >
> > > > > >> > > > On Thursday, March 19, 2015, Joel Koshy <
> > jjkosh...@gmail.com>
> > > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > That is a valid concern but in that case I think it
> would
> > be
> > > > > better
> > > > > >> > to
> > > > > >> > > > > just migrate completely to the new metrics package
> first.
> > > > > >> > > > >
> > > > > >> > > > > On Thursday, March 19, 2015, Jun Rao <j...@confluent.io
> > > > > >> > > > > <javascript:_e(%7B%7D,'cvml','j...@confluent.io');>>
> > wrote:
> > > > > >> > > > >
> > > > > >> > > > >> Hmm, I was thinking a bit differently on the metrics
> > > stuff. I
> > > > > >> think
> > > > > >> > it
> > > > > >> > > > >> would be confusing to have some metrics defined in the
> > new
> > > > > metrics
> > > > > >> > > > package
> > > > > >> > > > >> while some others defined in Coda Hale. Those metrics
> > will
> > > > look
> > > > > >> > > > different
> > > > > >> > > > >> (e.g., rates in Coda Hale will have special attributes
> > such
> > > > as
> > > > > >> > > > >> 1-min-average). People may need different ways to
> export
> > > the
> > > > > >> metrics
> > > > > >> > > to
> > > > > >> > > > >> external systems such as Graphite. So, instead of using
> > the
> > > > new
> > > > > >> > > metrics
> > > > > >> > > > >> package on the broker, I was thinking that we can just
> > > > > implement a
> > > > > >> > > > >> QuotaMetrics that wraps the Coda Hale metrics. The
> > > > > implementation
> > > > > >> > can
> > > > > >> > > be
> > > > > >> > > > >> the same as what's in the new metrics package.
> > > > > >> > > > >>
> > > > > >> > > > >> Thanks,
> > > > > >> > > > >>
> > > > > >> > > > >> Jun
> > > > > >> > > > >>
> > > > > >> > > > >> On Thu, Mar 19, 2015 at 8:09 PM, Jay Kreps <
> > > > > jay.kr...@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > > > >>
> > > > > >> > > > >> > Yeah I was saying was that we are blocked on picking
> an
> > > > > approach
> > > > > >> > for
> > > > > >> > > > >> > metrics but not necessarily the full conversion.
> > Clearly
> > > if
> > > > > we
> > > > > >> > pick
> > > > > >> > > > the
> > > > > >> > > > >> new
> > > > > >> > > > >> > metrics package we would need to implement the two
> > > metrics
> > > > we
> > > > > >> want
> > > > > >> > > to
> > > > > >> > > > >> quota
> > > > > >> > > > >> > on. But the conversion of the remaining metrics can
> be
> > > done
> > > > > >> > > > >> asynchronously.
> > > > > >> > > > >> >
> > > > > >> > > > >> > -Jay
> > > > > >> > > > >> >
> > > > > >> > > > >> > On Thu, Mar 19, 2015 at 5:56 PM, Joel Koshy <
> > > > > >> jjkosh...@gmail.com>
> > > > > >> > > > >> wrote:
> > > > > >> > > > >> >
> > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't
> need
> > to
> > > > > block
> > > > > >> on
> > > > > >> > > the
> > > > > >> > > > >> > > > migration of the metrics package.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > Can you clarify the above? i.e., if we are going to
> > > quota
> > > > > on
> > > > > >> > > > something
> > > > > >> > > > >> > > then we would want to have migrated that metric
> over
> > > > > right? Or
> > > > > >> > do
> > > > > >> > > > you
> > > > > >> > > > >> > > mean we don't need to complete the migration of all
> > > > > metrics to
> > > > > >> > the
> > > > > >> > > > >> > > metrics package right?
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > I think most of us now feel that the delay + no
> error
> > > is
> > > > a
> > > > > >> good
> > > > > >> > > > >> > > approach, but it would be good to make sure
> everyone
> > is
> > > > on
> > > > > the
> > > > > >> > > same
> > > > > >> > > > >> > > page.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > As Aditya requested a couple of days ago I think we
> > > > should
> > > > > go
> > > > > >> > over
> > > > > >> > > > >> > > this at the next KIP hangout.
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > Joel
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > On Thu, Mar 19, 2015 at 09:24:09AM -0700, Jun Rao
> > > wrote:
> > > > > >> > > > >> > > > 1. Delay + no error seems reasonable to me.
> > However,
> > > I
> > > > do
> > > > > >> feel
> > > > > >> > > > that
> > > > > >> > > > >> we
> > > > > >> > > > >> > > need
> > > > > >> > > > >> > > > to give the client an indicator that it's being
> > > > > throttled,
> > > > > >> > > instead
> > > > > >> > > > >> of
> > > > > >> > > > >> > > doing
> > > > > >> > > > >> > > > this silently. For that, we probably need to
> evolve
> > > the
> > > > > >> > > > >> produce/fetch
> > > > > >> > > > >> > > > protocol to include an extra status field in the
> > > > > response.
> > > > > >> We
> > > > > >> > > > >> probably
> > > > > >> > > > >> > > need
> > > > > >> > > > >> > > > to think more about whether we just want to
> return
> > a
> > > > > simple
> > > > > >> > > status
> > > > > >> > > > >> code
> > > > > >> > > > >> > > > (e.g., 1 = throttled) or a value that indicates
> how
> > > > much
> > > > > is
> > > > > >> > > being
> > > > > >> > > > >> > > throttled.
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > 2. We probably need to improve the histogram
> > support
> > > in
> > > > > the
> > > > > >> > new
> > > > > >> > > > >> metrics
> > > > > >> > > > >> > > > package before we can use it more widely on the
> > > server
> > > > > side
> > > > > >> > > (left
> > > > > >> > > > a
> > > > > >> > > > >> > > comment
> > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't
> need
> > to
> > > > > block
> > > > > >> on
> > > > > >> > > the
> > > > > >> > > > >> > > > migration of the metrics package.
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > Thanks,
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > Jun
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > On Wed, Mar 18, 2015 at 4:02 PM, Aditya Auradkar
> <
> > > > > >> > > > >> > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > > Hey everyone,
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Thanks for the great discussion. There are
> > > currently
> > > > a
> > > > > few
> > > > > >> > > > points
> > > > > >> > > > >> on
> > > > > >> > > > >> > > this
> > > > > >> > > > >> > > > > KIP that need addressing and I want to make
> sure
> > we
> > > > > are on
> > > > > >> > the
> > > > > >> > > > >> same
> > > > > >> > > > >> > > page
> > > > > >> > > > >> > > > > about those.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > 1. Append and delay response vs delay and
> return
> > > > error
> > > > > >> > > > >> > > > > - I think we've discussed the pros and cons of
> > each
> > > > > >> approach
> > > > > >> > > but
> > > > > >> > > > >> > > haven't
> > > > > >> > > > >> > > > > chosen an approach yet. Where does everyone
> stand
> > > on
> > > > > this
> > > > > >> > > issue?
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > 2. Metrics Migration and usage in quotas
> > > > > >> > > > >> > > > > - The metrics library in clients has a notion
> of
> > > > quotas
> > > > > >> that
> > > > > >> > > we
> > > > > >> > > > >> > should
> > > > > >> > > > >> > > > > reuse. For that to happen, we need to migrate
> the
> > > > > server
> > > > > >> to
> > > > > >> > > the
> > > > > >> > > > >> new
> > > > > >> > > > >> > > metrics
> > > > > >> > > > >> > > > > package.
> > > > > >> > > > >> > > > > - Need more clarification on how to compute
> > > > throttling
> > > > > >> time
> > > > > >> > > and
> > > > > >> > > > >> > > windowing
> > > > > >> > > > >> > > > > for quotas.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > I'm going to start a new KIP to discuss metrics
> > > > > migration
> > > > > >> > > > >> separately.
> > > > > >> > > > >> > > That
> > > > > >> > > > >> > > > > will also contain a section on quotas.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > 3. Dynamic Configuration management - Being
> > > discussed
> > > > > in
> > > > > >> > > KIP-5.
> > > > > >> > > > >> > > Basically
> > > > > >> > > > >> > > > > we need something that will model default
> quotas
> > > and
> > > > > allow
> > > > > >> > > > >> per-client
> > > > > >> > > > >> > > > > overrides.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Is there something else that I'm missing?
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Thanks,
> > > > > >> > > > >> > > > > Aditya
> > > > > >> > > > >> > > > > ________________________________________
> > > > > >> > > > >> > > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > > >> > > > >> > > > > Sent: Wednesday, March 18, 2015 2:10 PM
> > > > > >> > > > >> > > > > To: dev@kafka.apache.org
> > > > > >> > > > >> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Hey Steven,
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > The current proposal is actually to enforce
> > quotas
> > > at
> > > > > the
> > > > > >> > > > >> > > > > client/application level, NOT the topic level.
> So
> > > if
> > > > > you
> > > > > >> > have
> > > > > >> > > a
> > > > > >> > > > >> > service
> > > > > >> > > > >> > > > > with a few dozen instances the quota is against
> > all
> > > > of
> > > > > >> those
> > > > > >> > > > >> > instances
> > > > > >> > > > >> > > > > added up across all their topics. So actually
> the
> > > > > effect
> > > > > >> > would
> > > > > >> > > > be
> > > > > >> > > > >> the
> > > > > >> > > > >> > > same
> > > > > >> > > > >> > > > > either way but throttling gives the producer
> the
> > > > > choice of
> > > > > >> > > > either
> > > > > >> > > > >> > > blocking
> > > > > >> > > > >> > > > > or dropping.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > -Jay
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > On Tue, Mar 17, 2015 at 10:08 AM, Steven Wu <
> > > > > >> > > > stevenz...@gmail.com
> > > > > >> > > > >> >
> > > > > >> > > > >> > > wrote:
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > > Jay,
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > let's say an app produces to 10 different
> > topics.
> > > > > one of
> > > > > >> > the
> > > > > >> > > > >> topic
> > > > > >> > > > >> > is
> > > > > >> > > > >> > > > > sent
> > > > > >> > > > >> > > > > > from a library. due to whatever
> condition/bug,
> > > this
> > > > > lib
> > > > > >> > > starts
> > > > > >> > > > >> to
> > > > > >> > > > >> > > send
> > > > > >> > > > >> > > > > > messages over the quota. if we go with the
> > > delayed
> > > > > >> > response
> > > > > >> > > > >> > > approach, it
> > > > > >> > > > >> > > > > > will cause the whole shared RecordAccumulator
> > > > buffer
> > > > > to
> > > > > >> be
> > > > > >> > > > >> filled
> > > > > >> > > > >> > up.
> > > > > >> > > > >> > > > > that
> > > > > >> > > > >> > > > > > will penalize other 9 topics who are within
> the
> > > > > quota.
> > > > > >> > that
> > > > > >> > > is
> > > > > >> > > > >> the
> > > > > >> > > > >> > > > > > unfairness point that Ewen and I were trying
> to
> > > > make.
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > if broker just drop the msg and return an
> > > > > error/status
> > > > > >> > code
> > > > > >> > > > >> > > indicates the
> > > > > >> > > > >> > > > > > drop and why. then producer can just move on
> > and
> > > > > accept
> > > > > >> > the
> > > > > >> > > > >> drop.
> > > > > >> > > > >> > > shared
> > > > > >> > > > >> > > > > > buffer won't be saturated and other 9 topics
> > > won't
> > > > be
> > > > > >> > > > penalized.
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > Thanks,
> > > > > >> > > > >> > > > > > Steven
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > On Tue, Mar 17, 2015 at 9:44 AM, Jay Kreps <
> > > > > >> > > > jay.kr...@gmail.com
> > > > > >> > > > >> >
> > > > > >> > > > >> > > wrote:
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > > Hey Steven,
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > It is true that hitting the quota will
> cause
> > > > > >> > back-pressure
> > > > > >> > > > on
> > > > > >> > > > >> the
> > > > > >> > > > >> > > > > > producer.
> > > > > >> > > > >> > > > > > > But the solution is simple, a producer that
> > > wants
> > > > > to
> > > > > >> > avoid
> > > > > >> > > > >> this
> > > > > >> > > > >> > > should
> > > > > >> > > > >> > > > > > stay
> > > > > >> > > > >> > > > > > > under its quota. In other words this is a
> > > > contract
> > > > > >> > between
> > > > > >> > > > the
> > > > > >> > > > >> > > cluster
> > > > > >> > > > >> > > > > > and
> > > > > >> > > > >> > > > > > > the client, with each side having something
> > to
> > > > > uphold.
> > > > > >> > > Quite
> > > > > >> > > > >> > > possibly
> > > > > >> > > > >> > > > > the
> > > > > >> > > > >> > > > > > > same thing will happen in the absence of a
> > > > quota, a
> > > > > >> > client
> > > > > >> > > > >> that
> > > > > >> > > > >> > > > > produces
> > > > > >> > > > >> > > > > > an
> > > > > >> > > > >> > > > > > > unexpected amount of load will hit the
> limits
> > > of
> > > > > the
> > > > > >> > > server
> > > > > >> > > > >> and
> > > > > >> > > > >> > > > > > experience
> > > > > >> > > > >> > > > > > > backpressure. Quotas just allow you to set
> > that
> > > > > same
> > > > > >> > limit
> > > > > >> > > > at
> > > > > >> > > > >> > > something
> > > > > >> > > > >> > > > > > > lower than 100% of all resources on the
> > server,
> > > > > which
> > > > > >> is
> > > > > >> > > > >> useful
> > > > > >> > > > >> > > for a
> > > > > >> > > > >> > > > > > > shared cluster.
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > -Jay
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > On Mon, Mar 16, 2015 at 11:34 PM, Steven
> Wu <
> > > > > >> > > > >> > stevenz...@gmail.com>
> > > > > >> > > > >> > > > > > wrote:
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > > wait. we create one kafka producer for
> each
> > > > > cluster.
> > > > > >> > > each
> > > > > >> > > > >> > > cluster can
> > > > > >> > > > >> > > > > > > have
> > > > > >> > > > >> > > > > > > > many topics. if producer buffer got
> filled
> > up
> > > > > due to
> > > > > >> > > > delayed
> > > > > >> > > > >> > > response
> > > > > >> > > > >> > > > > > for
> > > > > >> > > > >> > > > > > > > one throttled topic, won't that penalize
> > > other
> > > > > >> topics
> > > > > >> > > > >> unfairly?
> > > > > >> > > > >> > > it
> > > > > >> > > > >> > > > > > seems
> > > > > >> > > > >> > > > > > > to
> > > > > >> > > > >> > > > > > > > me that broker should just return error
> > > without
> > > > > >> delay.
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > sorry that I am chatting to myself :)
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > On Mon, Mar 16, 2015 at 11:29 PM, Steven
> > Wu <
> > > > > >> > > > >> > > stevenz...@gmail.com>
> > > > > >> > > > >> > > > > > > wrote:
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > > I think I can answer my own question.
> > > delayed
> > > > > >> > response
> > > > > >> > > > >> will
> > > > > >> > > > >> > > cause
> > > > > >> > > > >> > > > > the
> > > > > >> > > > >> > > > > > > > > producer buffer to be full, which then
> > > result
> > > > > in
> > > > > >> > > either
> > > > > >> > > > >> > thread
> > > > > >> > > > >> > > > > > blocking
> > > > > >> > > > >> > > > > > > > or
> > > > > >> > > > >> > > > > > > > > message drop.
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > > On Mon, Mar 16, 2015 at 11:24 PM,
> Steven
> > > Wu <
> > > > > >> > > > >> > > stevenz...@gmail.com>
> > > > > >> > > > >> > > > > > > > wrote:
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > >> please correct me if I am missing sth
> > > here.
> > > > I
> > > > > am
> > > > > >> > not
> > > > > >> > > > >> > > understanding
> > > > > >> > > > >> > > > > > how
> > > > > >> > > > >> > > > > > > > >> would throttle work without
> > > > > cooperation/back-off
> > > > > >> > from
> > > > > >> > > > >> > > producer.
> > > > > >> > > > >> > > > > new
> > > > > >> > > > >> > > > > > > Java
> > > > > >> > > > >> > > > > > > > >> producer supports non-blocking API.
> why
> > > > would
> > > > > >> > delayed
> > > > > >> > > > >> > > response be
> > > > > >> > > > >> > > > > > able
> > > > > >> > > > >> > > > > > > > to
> > > > > >> > > > >> > > > > > > > >> slow down producer? producer will
> > continue
> > > > to
> > > > > >> fire
> > > > > >> > > > async
> > > > > >> > > > >> > > sends.
> > > > > >> > > > >> > > > > > > > >>
> > > > > >> > > > >> > > > > > > > >> On Mon, Mar 16, 2015 at 10:58 PM,
> > Guozhang
> > > > > Wang <
> > > > > >> > > > >> > > > > wangg...@gmail.com
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > > >> wrote:
> > > > > >> > > > >> > > > > > > > >>
> > > > > >> > > > >> > > > > > > > >>> I think we are really discussing two
> > > > separate
> > > > > >> > issues
> > > > > >> > > > >> here:
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> 1. Whether we should a)
> > > > > >> > > > >> > > > > append-then-block-then-returnOKButThrottled
> > > > > >> > > > >> > > > > > > or
> > > > > >> > > > >> > > > > > > > b)
> > > > > >> > > > >> > > > > > > > >>> block-then-returnFailDuetoThrottled
> for
> > > > quota
> > > > > >> > > actions
> > > > > >> > > > on
> > > > > >> > > > >> > > produce
> > > > > >> > > > >> > > > > > > > >>> requests.
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> Both these approaches assume some
> kind
> > of
> > > > > >> > > > >> well-behaveness
> > > > > >> > > > >> > of
> > > > > >> > > > >> > > the
> > > > > >> > > > >> > > > > > > > clients:
> > > > > >> > > > >> > > > > > > > >>> option a) assumes the client sets an
> > > proper
> > > > > >> > timeout
> > > > > >> > > > >> value
> > > > > >> > > > >> > > while
> > > > > >> > > > >> > > > > can
> > > > > >> > > > >> > > > > > > > just
> > > > > >> > > > >> > > > > > > > >>> ignore "OKButThrottled" response,
> while
> > > > > option
> > > > > >> b)
> > > > > >> > > > >> assumes
> > > > > >> > > > >> > the
> > > > > >> > > > >> > > > > > client
> > > > > >> > > > >> > > > > > > > >>> handles the "FailDuetoThrottled"
> > > > > appropriately.
> > > > > >> > For
> > > > > >> > > > any
> > > > > >> > > > >> > > malicious
> > > > > >> > > > >> > > > > > > > clients
> > > > > >> > > > >> > > > > > > > >>> that, for example, just keep retrying
> > > > either
> > > > > >> > > > >> intentionally
> > > > > >> > > > >> > or
> > > > > >> > > > >> > > > > not,
> > > > > >> > > > >> > > > > > > > >>> neither
> > > > > >> > > > >> > > > > > > > >>> of these approaches are actually
> > > effective.
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> 2. For "OKButThrottled" and
> > > > > "FailDuetoThrottled"
> > > > > >> > > > >> responses,
> > > > > >> > > > >> > > shall
> > > > > >> > > > >> > > > > > we
> > > > > >> > > > >> > > > > > > > >>> encode
> > > > > >> > > > >> > > > > > > > >>> them as error codes or augment the
> > > protocol
> > > > > to
> > > > > >> > use a
> > > > > >> > > > >> > separate
> > > > > >> > > > >> > > > > field
> > > > > >> > > > >> > > > > > > > >>> indicating "status codes".
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> Today we have already incorporated
> some
> > > > > status
> > > > > >> > code
> > > > > >> > > as
> > > > > >> > > > >> > error
> > > > > >> > > > >> > > > > codes
> > > > > >> > > > >> > > > > > in
> > > > > >> > > > >> > > > > > > > the
> > > > > >> > > > >> > > > > > > > >>> responses, e.g. ReplicaNotAvailable
> in
> > > > > >> > > > MetadataResponse,
> > > > > >> > > > >> > the
> > > > > >> > > > >> > > pros
> > > > > >> > > > >> > > > > > of
> > > > > >> > > > >> > > > > > > > this
> > > > > >> > > > >> > > > > > > > >>> is of course using a single field for
> > > > > response
> > > > > >> > > status
> > > > > >> > > > >> like
> > > > > >> > > > >> > > the
> > > > > >> > > > >> > > > > HTTP
> > > > > >> > > > >> > > > > > > > >>> status
> > > > > >> > > > >> > > > > > > > >>> codes, while the cons is that it
> > requires
> > > > > >> clients
> > > > > >> > to
> > > > > >> > > > >> handle
> > > > > >> > > > >> > > the
> > > > > >> > > > >> > > > > > error
> > > > > >> > > > >> > > > > > > > >>> codes
> > > > > >> > > > >> > > > > > > > >>> carefully.
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> I think maybe we can actually extend
> > the
> > > > > >> > single-code
> > > > > >> > > > >> > > approach to
> > > > > >> > > > >> > > > > > > > overcome
> > > > > >> > > > >> > > > > > > > >>> its drawbacks, that is, wrap the
> error
> > > > codes
> > > > > >> > > semantics
> > > > > >> > > > >> to
> > > > > >> > > > >> > the
> > > > > >> > > > >> > > > > users
> > > > > >> > > > >> > > > > > > so
> > > > > >> > > > >> > > > > > > > >>> that
> > > > > >> > > > >> > > > > > > > >>> users do not need to handle the codes
> > > > > >> one-by-one.
> > > > > >> > > More
> > > > > >> > > > >> > > > > concretely,
> > > > > >> > > > >> > > > > > > > >>> following Jay's example the client
> > could
> > > > > write
> > > > > >> > sth.
> > > > > >> > > > like
> > > > > >> > > > >> > > this:
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>>   if(error.isOK())
> > > > > >> > > > >> > > > > > > > >>>      // status code is good or the
> code
> > > can
> > > > > be
> > > > > >> > > simply
> > > > > >> > > > >> > > ignored for
> > > > > >> > > > >> > > > > > > this
> > > > > >> > > > >> > > > > > > > >>> request type, process the request
> > > > > >> > > > >> > > > > > > > >>>   else if(error.needsRetry())
> > > > > >> > > > >> > > > > > > > >>>      // throttled, transient error,
> > etc:
> > > > > retry
> > > > > >> > > > >> > > > > > > > >>>   else if(error.isFatal())
> > > > > >> > > > >> > > > > > > > >>>      // non-retriable errors, etc:
> > > notify /
> > > > > >> > > terminate
> > > > > >> > > > /
> > > > > >> > > > >> > other
> > > > > >> > > > >> > > > > > > handling
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> Only when the clients really want to
> > > > handle,
> > > > > for
> > > > > >> > > > example
> > > > > >> > > > >> > > > > > > > >>> FailDuetoThrottled
> > > > > >> > > > >> > > > > > > > >>> status code specifically, it needs
> to:
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>>   if(error.isOK())
> > > > > >> > > > >> > > > > > > > >>>      // status code is good or the
> code
> > > can
> > > > > be
> > > > > >> > > simply
> > > > > >> > > > >> > > ignored for
> > > > > >> > > > >> > > > > > > this
> > > > > >> > > > >> > > > > > > > >>> request type, process the request
> > > > > >> > > > >> > > > > > > > >>>   else if(error ==
> FailDuetoThrottled )
> > > > > >> > > > >> > > > > > > > >>>      // throttled: log it
> > > > > >> > > > >> > > > > > > > >>>   else if(error.needsRetry())
> > > > > >> > > > >> > > > > > > > >>>      // transient error, etc: retry
> > > > > >> > > > >> > > > > > > > >>>   else if(error.isFatal())
> > > > > >> > > > >> > > > > > > > >>>      // non-retriable errors, etc:
> > > notify /
> > > > > >> > > terminate
> > > > > >> > > > /
> > > > > >> > > > >> > other
> > > > > >> > > > >> > > > > > > handling
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> And for implementation we can
> probably
> > > > group
> > > > > the
> > > > > >> > > codes
> > > > > >> > > > >> > > > > accordingly
> > > > > >> > > > >> > > > > > > like
> > > > > >> > > > >> > > > > > > > >>> HTTP status code such that we can do:
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> boolean Error.isOK() {
> > > > > >> > > > >> > > > > > > > >>>   return code < 300 && code >= 200;
> > > > > >> > > > >> > > > > > > > >>> }
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> Guozhang
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> On Mon, Mar 16, 2015 at 10:24 PM,
> Ewen
> > > > > >> > > > Cheslack-Postava
> > > > > >> > > > >> <
> > > > > >> > > > >> > > > > > > > >>> e...@confluent.io>
> > > > > >> > > > >> > > > > > > > >>> wrote:
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> > Agreed that trying to shoehorn
> > > non-error
> > > > > codes
> > > > > >> > > into
> > > > > >> > > > >> the
> > > > > >> > > > >> > > error
> > > > > >> > > > >> > > > > > field
> > > > > >> > > > >> > > > > > > > is
> > > > > >> > > > >> > > > > > > > >>> a
> > > > > >> > > > >> > > > > > > > >>> > bad idea. It makes it *way* too
> easy
> > to
> > > > > write
> > > > > >> > code
> > > > > >> > > > >> that
> > > > > >> > > > >> > > looks
> > > > > >> > > > >> > > > > > (and
> > > > > >> > > > >> > > > > > > > >>> should
> > > > > >> > > > >> > > > > > > > >>> > be) correct but is actually
> > incorrect.
> > > If
> > > > > >> > > > necessary, I
> > > > > >> > > > >> > > think
> > > > > >> > > > >> > > > > it's
> > > > > >> > > > >> > > > > > > > much
> > > > > >> > > > >> > > > > > > > >>> > better to to spend a couple of
> extra
> > > > bytes
> > > > > to
> > > > > >> > > encode
> > > > > >> > > > >> that
> > > > > >> > > > >> > > > > > > information
> > > > > >> > > > >> > > > > > > > >>> > separately (a "status" or "warning"
> > > > > section of
> > > > > >> > the
> > > > > >> > > > >> > > response).
> > > > > >> > > > >> > > > > An
> > > > > >> > > > >> > > > > > > > >>> indication
> > > > > >> > > > >> > > > > > > > >>> > that throttling is occurring is
> > > something
> > > > > I'd
> > > > > >> > > expect
> > > > > >> > > > >> to
> > > > > >> > > > >> > be
> > > > > >> > > > >> > > > > > > indicated
> > > > > >> > > > >> > > > > > > > >>> by a
> > > > > >> > > > >> > > > > > > > >>> > bit flag in the response rather
> than
> > as
> > > > an
> > > > > >> error
> > > > > >> > > > code.
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> > Gwen - I think an error code makes
> > > sense
> > > > > when
> > > > > >> > the
> > > > > >> > > > >> request
> > > > > >> > > > >> > > > > > actually
> > > > > >> > > > >> > > > > > > > >>> failed.
> > > > > >> > > > >> > > > > > > > >>> > Option B, which Jun was advocating,
> > > would
> > > > > have
> > > > > >> > > > >> appended
> > > > > >> > > > >> > the
> > > > > >> > > > >> > > > > > > messages
> > > > > >> > > > >> > > > > > > > >>> > successfully. If the rate-limiting
> > case
> > > > > you're
> > > > > >> > > > talking
> > > > > >> > > > >> > > about
> > > > > >> > > > >> > > > > had
> > > > > >> > > > >> > > > > > > > >>> > successfully committed the
> messages,
> > I
> > > > > would
> > > > > >> say
> > > > > >> > > > >> that's
> > > > > >> > > > >> > > also a
> > > > > >> > > > >> > > > > > bad
> > > > > >> > > > >> > > > > > > > use
> > > > > >> > > > >> > > > > > > > >>> of
> > > > > >> > > > >> > > > > > > > >>> > error codes.
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> > On Mon, Mar 16, 2015 at 10:16 PM,
> > Gwen
> > > > > >> Shapira <
> > > > > >> > > > >> > > > > > > > gshap...@cloudera.com>
> > > > > >> > > > >> > > > > > > > >>> > wrote:
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> > > We discussed an error code for
> > > > > rate-limiting
> > > > > >> > > > (which
> > > > > >> > > > >> I
> > > > > >> > > > >> > > think
> > > > > >> > > > >> > > > > > made
> > > > > >> > > > >> > > > > > > > >>> > > sense), isn't it a similar case?
> > > > > >> > > > >> > > > > > > > >>> > >
> > > > > >> > > > >> > > > > > > > >>> > > On Mon, Mar 16, 2015 at 10:10 PM,
> > Jay
> > > > > Kreps
> > > > > >> <
> > > > > >> > > > >> > > > > > jay.kr...@gmail.com
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > >>> wrote:
> > > > > >> > > > >> > > > > > > > >>> > > > My concern is that as soon as
> you
> > > > start
> > > > > >> > > encoding
> > > > > >> > > > >> > > non-error
> > > > > >> > > > >> > > > > > > > response
> > > > > >> > > > >> > > > > > > > >>> > > > information into error codes
> the
> > > next
> > > > > >> > question
> > > > > >> > > > is
> > > > > >> > > > >> > what
> > > > > >> > > > >> > > to
> > > > > >> > > > >> > > > > do
> > > > > >> > > > >> > > > > > if
> > > > > >> > > > >> > > > > > > > two
> > > > > >> > > > >> > > > > > > > >>> > such
> > > > > >> > > > >> > > > > > > > >>> > > > codes apply (i.e. you have a
> > > replica
> > > > > down
> > > > > >> > and
> > > > > >> > > > the
> > > > > >> > > > >> > > response
> > > > > >> > > > >> > > > > is
> > > > > >> > > > >> > > > > > > > >>> > quota'd). I
> > > > > >> > > > >> > > > > > > > >>> > > > think I am trying to argue that
> > > error
> > > > > >> should
> > > > > >> > > > mean
> > > > > >> > > > >> > "why
> > > > > >> > > > >> > > we
> > > > > >> > > > >> > > > > > > failed
> > > > > >> > > > >> > > > > > > > >>> your
> > > > > >> > > > >> > > > > > > > >>> > > > request", for which there will
> > > really
> > > > > only
> > > > > >> > be
> > > > > >> > > > one
> > > > > >> > > > >> > > reason,
> > > > > >> > > > >> > > > > and
> > > > > >> > > > >> > > > > > > any
> > > > > >> > > > >> > > > > > > > >>> other
> > > > > >> > > > >> > > > > > > > >>> > > > useful information we want to
> > send
> > > > > back is
> > > > > >> > > just
> > > > > >> > > > >> > another
> > > > > >> > > > >> > > > > field
> > > > > >> > > > >> > > > > > > in
> > > > > >> > > > >> > > > > > > > >>> the
> > > > > >> > > > >> > > > > > > > >>> > > > response.
> > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > >> > > > >> > > > > > > > >>> > > > -Jay
> > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > >> > > > >> > > > > > > > >>> > > > On Mon, Mar 16, 2015 at 9:51
> PM,
> > > Gwen
> > > > > >> > Shapira
> > > > > >> > > <
> > > > > >> > > > >> > > > > > > > >>> gshap...@cloudera.com>
> > > > > >> > > > >> > > > > > > > >>> > > wrote:
> > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > >> > > > >> > > > > > > > >>> > > >> I think its not too late to
> > > reserve
> > > > a
> > > > > set
> > > > > >> > of
> > > > > >> > > > >> error
> > > > > >> > > > >> > > codes
> > > > > >> > > > >> > > > > > > > >>> (200-299?)
> > > > > >> > > > >> > > > > > > > >>> > > >> for "non-error" codes.
> > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > >> > > > >> > > > > > > > >>> > > >> It won't be backward
> compatible
> > > > (i.e.
> > > > > >> > clients
> > > > > >> > > > >> that
> > > > > >> > > > >> > > > > currently
> > > > > >> > > > >> > > > > > > do
> > > > > >> > > > >> > > > > > > > >>> "else
> > > > > >> > > > >> > > > > > > > >>> > > >> throw" will throw on
> > non-errors),
> > > > but
> > > > > >> > perhaps
> > > > > >> > > > its
> > > > > >> > > > >> > > > > > worthwhile.
> > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > >> > > > >> > > > > > > > >>> > > >> On Mon, Mar 16, 2015 at 9:42
> PM,
> > > Jay
> > > > > >> Kreps
> > > > > >> > <
> > > > > >> > > > >> > > > > > > jay.kr...@gmail.com
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > >>> > wrote:
> > > > > >> > > > >> > > > > > > > >>> > > >> > Hey Jun,
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > I'd really really really
> like
> > to
> > > > > avoid
> > > > > >> > > that.
> > > > > >> > > > >> > Having
> > > > > >> > > > >> > > just
> > > > > >> > > > >> > > > > > > > spent a
> > > > > >> > > > >> > > > > > > > >>> > > bunch of
> > > > > >> > > > >> > > > > > > > >>> > > >> > time on the clients, using
> the
> > > > error
> > > > > >> > codes
> > > > > >> > > to
> > > > > >> > > > >> > encode
> > > > > >> > > > >> > > > > other
> > > > > >> > > > >> > > > > > > > >>> > information
> > > > > >> > > > >> > > > > > > > >>> > > >> > about the response is super
> > > > > dangerous.
> > > > > >> > The
> > > > > >> > > > >> error
> > > > > >> > > > >> > > > > handling
> > > > > >> > > > >> > > > > > is
> > > > > >> > > > >> > > > > > > > >>> one of
> > > > > >> > > > >> > > > > > > > >>> > > the
> > > > > >> > > > >> > > > > > > > >>> > > >> > hardest parts of the client
> > > > > (Guozhang
> > > > > >> > chime
> > > > > >> > > > in
> > > > > >> > > > >> > > here).
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > Generally the error handling
> > > looks
> > > > > like
> > > > > >> > > > >> > > > > > > > >>> > > >> >   if(error == none)
> > > > > >> > > > >> > > > > > > > >>> > > >> >      // good, process the
> > > request
> > > > > >> > > > >> > > > > > > > >>> > > >> >   else if(error ==
> > > KNOWN_ERROR_1)
> > > > > >> > > > >> > > > > > > > >>> > > >> >      // handle known error 1
> > > > > >> > > > >> > > > > > > > >>> > > >> >   else if(error ==
> > > KNOWN_ERROR_2)
> > > > > >> > > > >> > > > > > > > >>> > > >> >      // handle known error 2
> > > > > >> > > > >> > > > > > > > >>> > > >> >   else
> > > > > >> > > > >> > > > > > > > >>> > > >> >      throw
> > > > > >> > > Errors.forCode(error).exception();
> > > > > >> > > > >> //
> > > > > >> > > > >> > or
> > > > > >> > > > >> > > some
> > > > > >> > > > >> > > > > > > other
> > > > > >> > > > >> > > > > > > > >>> > default
> > > > > >> > > > >> > > > > > > > >>> > > >> > behavior
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > This works because we have a
> > > > > convention
> > > > > >> > > that
> > > > > >> > > > >> and
> > > > > >> > > > >> > > error
> > > > > >> > > > >> > > > > is
> > > > > >> > > > >> > > > > > > > >>> something
> > > > > >> > > > >> > > > > > > > >>> > > that
> > > > > >> > > > >> > > > > > > > >>> > > >> > prevented your getting the
> > > > response
> > > > > so
> > > > > >> > the
> > > > > >> > > > >> default
> > > > > >> > > > >> > > > > > handling
> > > > > >> > > > >> > > > > > > > >>> case is
> > > > > >> > > > >> > > > > > > > >>> > > sane
> > > > > >> > > > >> > > > > > > > >>> > > >> > and forward compatible. It
> is
> > > > > tempting
> > > > > >> to
> > > > > >> > > use
> > > > > >> > > > >> the
> > > > > >> > > > >> > > error
> > > > > >> > > > >> > > > > > code
> > > > > >> > > > >> > > > > > > > to
> > > > > >> > > > >> > > > > > > > >>> > convey
> > > > > >> > > > >> > > > > > > > >>> > > >> > information in the success
> > case.
> > > > For
> > > > > >> > > example
> > > > > >> > > > we
> > > > > >> > > > >> > > could
> > > > > >> > > > >> > > > > use
> > > > > >> > > > >> > > > > > > > error
> > > > > >> > > > >> > > > > > > > >>> > codes
> > > > > >> > > > >> > > > > > > > >>> > > to
> > > > > >> > > > >> > > > > > > > >>> > > >> > encode whether quotas were
> > > > enforced,
> > > > > >> > > whether
> > > > > >> > > > >> the
> > > > > >> > > > >> > > request
> > > > > >> > > > >> > > > > > was
> > > > > >> > > > >> > > > > > > > >>> served
> > > > > >> > > > >> > > > > > > > >>> > > out
> > > > > >> > > > >> > > > > > > > >>> > > >> of
> > > > > >> > > > >> > > > > > > > >>> > > >> > cache, whether the stock
> > market
> > > is
> > > > > up
> > > > > >> > > today,
> > > > > >> > > > or
> > > > > >> > > > >> > > > > whatever.
> > > > > >> > > > >> > > > > > > The
> > > > > >> > > > >> > > > > > > > >>> > problem
> > > > > >> > > > >> > > > > > > > >>> > > is
> > > > > >> > > > >> > > > > > > > >>> > > >> > that since these are not
> > errors
> > > as
> > > > > far
> > > > > >> as
> > > > > >> > > the
> > > > > >> > > > >> > > client is
> > > > > >> > > > >> > > > > > > > >>> concerned it
> > > > > >> > > > >> > > > > > > > >>> > > >> should
> > > > > >> > > > >> > > > > > > > >>> > > >> > not throw an exception but
> > > process
> > > > > the
> > > > > >> > > > >> response,
> > > > > >> > > > >> > > but now
> > > > > >> > > > >> > > > > > we
> > > > > >> > > > >> > > > > > > > >>> created
> > > > > >> > > > >> > > > > > > > >>> > an
> > > > > >> > > > >> > > > > > > > >>> > > >> > explicit requirement that
> that
> > > > > error be
> > > > > >> > > > handled
> > > > > >> > > > >> > > > > explicitly
> > > > > >> > > > >> > > > > > > > >>> since it
> > > > > >> > > > >> > > > > > > > >>> > is
> > > > > >> > > > >> > > > > > > > >>> > > >> > different. I really think
> that
> > > > this
> > > > > >> kind
> > > > > >> > of
> > > > > >> > > > >> > > information
> > > > > >> > > > >> > > > > is
> > > > > >> > > > >> > > > > > > not
> > > > > >> > > > >> > > > > > > > >>> an
> > > > > >> > > > >> > > > > > > > >>> > > error,
> > > > > >> > > > >> > > > > > > > >>> > > >> it
> > > > > >> > > > >> > > > > > > > >>> > > >> > is just information, and if
> we
> > > > want
> > > > > it
> > > > > >> in
> > > > > >> > > the
> > > > > >> > > > >> > > response
> > > > > >> > > > >> > > > > we
> > > > > >> > > > >> > > > > > > > >>> should do
> > > > > >> > > > >> > > > > > > > >>> > > the
> > > > > >> > > > >> > > > > > > > >>> > > >> > right thing and add a new
> > field
> > > to
> > > > > the
> > > > > >> > > > >> response.
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > I think you saw the Samza
> bug
> > > that
> > > > > was
> > > > > >> > > > >> literally
> > > > > >> > > > >> > an
> > > > > >> > > > >> > > > > > example
> > > > > >> > > > >> > > > > > > of
> > > > > >> > > > >> > > > > > > > >>> this
> > > > > >> > > > >> > > > > > > > >>> > > >> > happening and leading to an
> > > > infinite
> > > > > >> > retry
> > > > > >> > > > >> loop.
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > Further more I really want
> to
> > > > > emphasize
> > > > > >> > > that
> > > > > >> > > > >> > hitting
> > > > > >> > > > >> > > > > your
> > > > > >> > > > >> > > > > > > > quota
> > > > > >> > > > >> > > > > > > > >>> in
> > > > > >> > > > >> > > > > > > > >>> > the
> > > > > >> > > > >> > > > > > > > >>> > > >> > design that Adi has proposed
> > is
> > > > > >> actually
> > > > > >> > > not
> > > > > >> > > > an
> > > > > >> > > > >> > > error
> > > > > >> > > > >> > > > > > > > condition
> > > > > >> > > > >> > > > > > > > >>> at
> > > > > >> > > > >> > > > > > > > >>> > > all.
> > > > > >> > > > >> > > > > > > > >>> > > >> It
> > > > > >> > > > >> > > > > > > > >>> > > >> > is totally reasonable in any
> > > > > bootstrap
> > > > > >> > > > >> situation
> > > > > >> > > > >> > to
> > > > > >> > > > >> > > > > > > > >>> intentionally
> > > > > >> > > > >> > > > > > > > >>> > > want to
> > > > > >> > > > >> > > > > > > > >>> > > >> > run at the limit the system
> > > > imposes
> > > > > on
> > > > > >> > you.
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > -Jay
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> > On Mon, Mar 16, 2015 at 4:27
> > PM,
> > > > Jun
> > > > > >> Rao
> > > > > >> > <
> > > > > >> > > > >> > > > > > j...@confluent.io>
> > > > > >> > > > >> > > > > > > > >>> wrote:
> > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >> It's probably useful for a
> > > client
> > > > > to
> > > > > >> > know
> > > > > >> > > > >> whether
> > > > > >> > > > >> > > its
> > > > > >> > > > >> > > > > > > > requests
> > > > > >> > > > >> > > > > > > > >>> are
> > > > > >> > > > >> > > > > > > > >>> > > >> >> throttled or not (e.g., for
> > > > > monitoring
> > > > > >> > and
> > > > > >> > > > >> > > alerting).
> > > > > >> > > > >> > > > > > From
> > > > > >> > > > >> > > > > > > > that
> > > > > >> > > > >> > > > > > > > >>> > > >> >> perspective, option B
> (delay
> > > the
> > > > > >> > requests
> > > > > >> > > > and
> > > > > >> > > > >> > > return an
> > > > > >> > > > >> > > > > > > > error)
> > > > > >> > > > >> > > > > > > > >>> > seems
> > > > > >> > > > >> > > > > > > > >>> > > >> >> better.
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > >> > > > >> > > > > > > > >>> > > >> >> Thanks,
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > >> > > > >> > > > > > > > >>> > > >> >> Jun
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > >> > > > >> > > > > > > > >>> > > >> >> On Wed, Mar 4, 2015 at 3:51
> > PM,
> > > > > Aditya
> > > > > >> > > > >> Auradkar <
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > aaurad...@linkedin.com.invalid
> > > >
> > > > > >> wrote:
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > >> > > > >> > > > > > > > >>> > > >> >> > Posted a KIP for quotas
> in
> > > > kafka.
> > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >>
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >> > Appreciate any feedback.
> > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >> > Aditya
> > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > >> > > > >> > > > > > > > >>> > >
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>> > --
> > > > > >> > > > >> > > > > > > > >>> > Thanks,
> > > > > >> > > > >> > > > > > > > >>> > Ewen
> > > > > >> > > > >> > > > > > > > >>> >
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>> --
> > > > > >> > > > >> > > > > > > > >>> -- Guozhang
> > > > > >> > > > >> > > > > > > > >>>
> > > > > >> > > > >> > > > > > > > >>
> > > > > >> > > > >> > > > > > > > >>
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > >
> > > > > >> > > > >> > >
> > > > > >> > > > >> >
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > Sent from Gmail Mobile
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > Sent from Gmail Mobile
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Re: [KIP-DISCUSSION] KIP-13 Quotas

Reply via email to