Thanks for the detailed review. I've addressed your comments.

Regarding the rejected alternatives: we rejected per-partition distribution 
because we chose client-based quotas, which have no notion of partitions. I've 
explained this in a bit more detail in that section.

Aditya

________________________________________
From: Joel Koshy [jjkosh...@gmail.com]
Sent: Wednesday, April 08, 2015 6:30 AM
To: dev@kafka.apache.org
Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas

Thanks for updating the wiki. Looks great overall. Just a couple
more comments:

Client status code:
- v0 requests -> current version (0) of those requests.
- Fetch response has a throttled flag instead of throttle time -  I
  think you intended the latter.
- Can you make it clear that the quota status is a new field
  called throttleTimeMs (or equivalent)? It would help if some of
  that is moved (or repeated) in the compatibility/migration plan.
- So you would need to upgrade brokers first, then the clients.
  While upgrading the brokers (via a rolling bounce) the brokers
  cannot start using the latest fetch-request version immediately
  (for replica fetches). Since there will be older brokers in the mix
  those brokers would not be able to read v1 fetch requests. So all
  the brokers should be upgraded before switching to the latest
  fetch request version. This is similar to what Gwen proposed in
  KIP-2/KAFKA-1809 and I think we will need to use the
  inter-broker protocol version config.
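
To make the upgrade ordering concrete, here is a minimal sketch of gating
the replica fetch request version on the inter-broker protocol version.
Every name in it (ReplicaFetchVersionSketch, InterBrokerProtocol,
replicaFetchVersion) is hypothetical and used only for illustration; none
of them is an existing Kafka class or config value.

```java
public class ReplicaFetchVersionSketch {

    // Hypothetical stand-in for the inter-broker protocol version config.
    public enum InterBrokerProtocol { BEFORE_QUOTAS, WITH_QUOTAS }

    // v0 = current fetch request; v1 = the proposed version whose response
    // carries throttleTimeMs. A broker only sends v1 replica fetches once the
    // config says every broker in the cluster can read them.
    public static int replicaFetchVersion(InterBrokerProtocol configured) {
        return configured == InterBrokerProtocol.WITH_QUOTAS ? 1 : 0;
    }

    public static void main(String[] args) {
        // First rolling bounce: new code, config still pinned to the old protocol.
        System.out.println(replicaFetchVersion(InterBrokerProtocol.BEFORE_QUOTAS));
        // Second rolling bounce: all brokers upgraded, config switched over.
        System.out.println(replicaFetchVersion(InterBrokerProtocol.WITH_QUOTAS));
    }
}
```

The point of the sketch is only that the version switch is driven by
config rather than by the code version, so it takes two rolling bounces.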

Rejected alternatives-quota-distribution.B: notes that this is the
most elegant model, but does not explain why it was rejected. I
think this was because we would then need some sort of gossip
between brokers since partitions are across the cluster. Can you
confirm?

Thanks,

Joel

On Wed, Apr 08, 2015 at 05:45:34AM +0000, Aditya Auradkar wrote:
> Hey everyone,
>
> Following up after today's hangout. After discussing the client side metrics 
> piece internally, we've incorporated that section into the KIP.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
>
> Since there appears to be sufficient consensus, I'm going to start a voting 
> thread.
>
> Thanks,
> Aditya
> ________________________________________
> From: Gwen Shapira [gshap...@cloudera.com]
> Sent: Tuesday, April 07, 2015 11:31 AM
> To: Sriharsha Chintalapani
> Cc: dev@kafka.apache.org
> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
>
> Yeah, I was not suggesting adding auth to metrics - I think this needlessly
> complicates everything.
> But we need to assume that client developers will not have access to the
> broker metrics (because in a secure environment they probably won't).
>
> Gwen
>
> On Tue, Apr 7, 2015 at 11:20 AM, Sriharsha Chintalapani <ka...@harsha.io>
> wrote:
>
> > Having auth on top of metrics is going to be a lot more difficult. How are
> > we going to restrict metrics reporters that run as part of the Kafka server?
> > They will have access to all the metrics and can publish to Ganglia,
> > etc. I look at the metrics as read-only info. As you said, metrics for
> > all the topics can be visible, but what non-secure actions could be taken
> > based on metrics alone? This can probably be part of the KIP-11
> > discussion.
> > Having said that, it would be great if the throttling details could be
> > exposed as part of the response to the client. Instead of looking at
> > metrics, the client can depend on the response to slow down if it's being
> > throttled. This allows clients to be self-reliant based on the
> > response.
> >
> > --
> > Harsha
> >
> >
> > On April 7, 2015 at 9:55:41 AM, Gwen Shapira (gshap...@cloudera.com)
> > wrote:
> >
> > Re (1):
> > We have no authorization story on the metrics collected by brokers, so I
> > assume that access to broker metrics means knowing exactly which topics
> > exist and their throughputs. (Prath and Don, correct me if I got it
> > wrong...)
> > Secure environments will strictly control access to this information, so I
> > am pretty sure the client developers will not have access to server metrics
> > at all.
> >
> > Gwen
> >
> > On Tue, Apr 7, 2015 at 7:41 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Totally. But is that the only use? What I wanted to flesh out was whether
> > > the goal was:
> > > 1. Expose throttling in the client metrics
> > > 2. Enable programmatic response (i.e. stop sending stuff or something like
> > > that)
> > >
> > > I think I kind of understand (1) but let's get specific on the metric we
> > > would be adding and what exactly you would expose in a dashboard. For
> > > example if the goal is just monitoring do I really want a boolean flag for
> > > is_throttled or do I want to know how much I am being throttled (i.e.
> > > throttle_pct might indicate the percent of your request time that was due
> > > to throttling or something like that)? If I am 1% throttled that may be
> > > irrelevant but 99% throttled would be quite relevant? Not sure I agree,
> > > just throwing that out there...
> > >
> > > For (2) the prior discussion seemed to kind of allude to this but I can't
> > > really come up with a use case. Is there one?
> > >
> > > If it is just (1) I think the question is whether it really helps much to
> > > have the metric on the client vs the server. I suppose this is a bit
> > > environment specific. If you have a central metrics system it shouldn't
> > > make any difference, but if you don't I suppose it does.
> > >
> > > -Jay
> > >
> > > On Mon, Apr 6, 2015 at 7:57 PM, Gwen Shapira <gshap...@cloudera.com>
> > > wrote:
> > >
> > > > Here's a wild guess:
> > > >
> > > > An app developer included a Kafka Producer in his app, and is not happy
> > > > with the throughput. He doesn't have visibility into the brokers since they
> > > > are owned by a different team. Obviously the first instinct of a developer
> > > > who knows that throttling exists is to blame throttling for any slowdown in
> > > > the app.
> > > > If he doesn't have a way to know from the responses whether or not his app
> > > > is throttled, he may end up calling Aditya at 4am asking "Hey, is my app
> > > > throttled?".
> > > >
> > > > I assume Aditya is trying to avoid this scenario.
> > > >
> > > > On Mon, Apr 6, 2015 at 7:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > >
> > > > > Hey Aditya,
> > > > >
> > > > > 2. I kind of buy it, but I really like to understand the details of the
> > > > > use case before we make protocol changes. What changes are you proposing in
> > > > > the clients for monitoring and how would that be used?
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Mon, Apr 6, 2015 at 10:36 AM, Aditya Auradkar <
> > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > Hi Jay,
> > > > > >
> > > > > > 2. At this time, the proposed response format changes are only for
> > > > > > monitoring/informing clients. As Jun mentioned, we get instance level
> > > > > > monitoring in this case since each instance that got throttled will have a
> > > > > > metric confirming the same. Without client level monitoring for this, it's
> > > > > > hard for application developers to find if they are being throttled since
> > > > > > they will also have to be aware of all the brokers in the cluster. This is
> > > > > > quite problematic for large clusters.
> > > > > >
> > > > > > It seems nice for app developers to not have to think about kafka internal
> > > > > > metrics and only focus on the metrics exposed on their instances. Analogous
> > > > > > to having client-side request latency metrics. Basically, we want an easy
> > > > > > way for clients to be aware if they are being throttled.
> > > > > >
> > > > > > 4. For purgatory v delay queue, I think we are on the same page. I feel it
> > > > > > is nicer to use the purgatory but I'm happy to use a DelayQueue if there
> > > > > > are performance implications. I don't know enough about the current and
> > > > > > Yasuhiro's new implementation to be sure one way or the other.
> > > > > >
> > > > > > Stepping back, I think these two things are the only remaining points of
> > > > > > discussion within the current proposal. Any concerns if I started a voting
> > > > > > thread on the proposal after the KIP discussion tomorrow? (assuming we
> > > > > > reach consensus on these items)
> > > > > >
> > > > > > Thanks,
> > > > > > Aditya
> > > > > > ________________________________________
> > > > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > > > Sent: Saturday, April 04, 2015 1:36 PM
> > > > > > To: dev@kafka.apache.org
> > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > >
> > > > > > Hey Aditya,
> > > > > >
> > > > > > 2. For the return flag I'm not terribly particular. If we want to add it
> > > > > > let's fully think through how it will be used. The only concern I have is
> > > > > > adding to the protocol without really thinking through the use cases. So
> > > > > > let's work out the APIs we want to add to the Java consumer and producer
> > > > > > and the use cases for how clients will make use of these. For my part I
> > > > > > actually don't see much use other than monitoring since it isn't an error
> > > > > > condition to be at your quota. And if it is just monitoring I don't see a
> > > > > > big enough difference between having the monitoring on the server-side
> > > > > > versus in the clients to justify putting it in the protocol. But I think
> > > > > > you guys may have other use cases in mind of how a client would make some
> > > > > > use of this? Let's work that out. I also don't feel strongly about it--it
> > > > > > wouldn't be *bad* to have the monitoring available on the client, just
> > > > > > doesn't seem that much better.
> > > > > >
> > > > > > 4. For the purgatory vs delay queue I think it is arguably nicer to reuse
> > > > > > the purgatory; we just have to be ultra-conscious of efficiency. I think our
> > > > > > goal is to turn quotas on across the board, so at LinkedIn that would mean
> > > > > > potentially every request will need a small delay. I haven't worked out the
> > > > > > efficiency implications of this choice, so as long as we do that I'm happy.
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > > On Fri, Apr 3, 2015 at 1:10 PM, Aditya Auradkar <
> > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > >
> > > > > > > Some responses to Jay's points.
> > > > > > >
> > > > > > > 1. Using commas - Cool.
> > > > > > >
> > > > > > > 2. Adding return flag - I'm inclined to agree with Joel that this is good
> > > > > > > to have in the initial implementation.
> > > > > > >
> > > > > > > 3. Config - +1. I'll remove it from the KIP. We can discuss this in
> > > > > > > parallel.
> > > > > > >
> > > > > > > 4. Purgatory vs Delay queue - I feel that it is simpler to reuse the
> > > > > > > existing purgatories for both delayed produce and fetch requests. IIUC, all
> > > > > > > we need for quotas is a minWait parameter for DelayedOperation (or
> > > > > > > something equivalent) since there is already a max wait. The completion
> > > > > > > criteria can check if minWait time has elapsed before declaring the
> > > > > > > operation complete. For this to impact performance, a significant number of
> > > > > > > clients may need to exceed their quota at the same time and even then I'm
> > > > > > > not very clear on the scope of the impact. Two layers of delays might add
> > > > > > > complexity to the implementation which I'm hoping to avoid.
> > > > > > >
> > > > > > > Aditya
> > > > > > >
> > > > > > > ________________________________________
> > > > > > > From: Joel Koshy [jjkosh...@gmail.com]
> > > > > > > Sent: Friday, April 03, 2015 12:48 PM
> > > > > > > To: dev@kafka.apache.org
> > > > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > >
> > > > > > > Aditya, thanks for the updated KIP and Jay/Jun thanks for the
> > > > > > > comments. Couple of comments in-line:
> > > > > > >
> > > > > > > > 2. I would advocate for adding the return flag when we next bump the
> > > > > > > > request format version just to avoid proliferation. I agree this is a good
> > > > > > > > thing to know about, but at the moment I don't think we have a very well
> > > > > > > > fleshed out idea of how the client would actually make use of this info. I
> > > > > > >
> > > > > > > I'm somewhat inclined to having something appropriate off the bat -
> > > > > > > mainly because (i) clients really should know that they have been
> > > > > > > throttled (ii) a smart producer/consumer implementation would want to
> > > > > > > know how much to back off. So perhaps this and config-management
> > > > > > > should be moved to a separate discussion, but it would be good to have
> > > > > > > this discussion going and incorporated into the first quota
> > > > > > > implementation.
> > > > > > >
> > > > > > > > 3. Config--I think we need to generalize the topic stuff so we can override
> > > > > > > > at multiple levels. We have topic and client, but I suspect "user" and
> > > > > > > > "broker" will also be important. I recommend we take config stuff out of
> > > > > > > > this KIP since we really need to fully think through a proposal that will
> > > > > > > > cover all these types of overrides.
> > > > > > >
> > > > > > > +1 - it is definitely orthogonal to the core quota implementation
> > > > > > > (although necessary for its operability). Having a config-related
> > > > > > > discussion in this KIP would only draw out the discussion and vote
> > > > > > > even if the core quota design looks good to everyone.
> > > > > > >
> > > > > > > So basically I think we can remove the portions on dynamic config as
> > > > > > > well as the response format but I really think we should close on
> > > > > > > those while the implementation is in progress and before quotas is
> > > > > > > officially released.
> > > > > > >
> > > > > > > > 4. Instead of using purgatories to implement the delay would it make more
> > > > > > > > sense to just use a delay queue? I think all the additional stuff in the
> > > > > > > > purgatory other than the delay queue doesn't make sense as the quota is a
> > > > > > > > hard N ms penalty with no chance of early eviction. If there is no perf
> > > > > > > > penalty for the full purgatory that may be fine (even good) to reuse, but I
> > > > > > > > haven't looked into that.
> > > > > > >
> > > > > > > A simple delay queue sounds good - I think Aditya was also trying to
> > > > > > > avoid adding a new quota purgatory. i.e., it may be possible to use
> > > > > > > the existing purgatory instances to enforce quotas. That may be
> > > > > > > simpler, but would incur a slight perf penalty if too many clients
> > > > > > > are being throttled.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Joel
> > > > > > >
> > > > > > > >
> > > > > > > > -Jay
> > > > > > > >
> > > > > > > > On Fri, Apr 3, 2015 at 10:45 AM, Aditya Auradkar <
> > > > > > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > > > >
> > > > > > > >> Update, I added a proposal on doing dynamic client based configuration
> > > > > > > >> that can be used for quotas.
> > > > > > > >>
> > > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > > >>
> > > > > > > >> Please take a look and let me know if there are any concerns.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Aditya
> > > > > > > >> ________________________________________
> > > > > > > >> From: Aditya Auradkar
> > > > > > > >> Sent: Friday, April 03, 2015 10:10 AM
> > > > > > > >> To: dev@kafka.apache.org
> > > > > > > >> Subject: RE: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > >>
> > > > > > > >> Thanks Jun.
> > > > > > > >>
> > > > > > > >> Some thoughts:
> > > > > > > >>
> > > > > > > >> 10) I think it is better we throttle regardless of the produce/fetch
> > > > > > > >> version. This is a nice feature where clients can tell if they are being
> > > > > > > >> throttled or not. If we only throttle newer clients, then we have
> > > > > > > >> inconsistent behavior across clients in a multi-tenant cluster. Having
> > > > > > > >> quota metrics on the client side is also a nice incentive to upgrade client
> > > > > > > >> versions.
> > > > > > > >>
> > > > > > > >> 11) I think we can call metric.record(fetchSize) before adding the
> > > > > > > >> delayedFetch request into the purgatory. This will give us the estimated
> > > > > > > >> delay of the request up-front. The timeout on the DelayedFetch is the
> > > > > > > >> Max(maxWait, quotaDelay). The DelayedFetch completion criteria can change a
> > > > > > > >> little to accommodate quotas.
> > > > > > > >>
> > > > > > > >> - I agree the quota code should return the estimated delay time in
> > > > > > > >> QuotaViolationException.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Aditya
> > > > > > > >>
> > > > > > > >> ________________________________________
> > > > > > > >> From: Jun Rao [j...@confluent.io]
> > > > > > > >> Sent: Friday, April 03, 2015 9:16 AM
> > > > > > > >> To: dev@kafka.apache.org
> > > > > > > >> Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > >>
> > > > > > > >> Thanks for the update.
> > > > > > > >>
> > > > > > > >> 10. About whether to return a new field in the response to indicate
> > > > > > > >> throttling. Earlier, the plan was to not change the response format and
> > > > > > > >> just have a metric on the broker to indicate whether a clientId is
> > > > > > > >> throttled or not. The issue is that we don't know whether a particular
> > > > > > > >> clientId instance is throttled or not (since there could be multiple
> > > > > > > >> clients with the same clientId). Your proposal of adding an isThrottled
> > > > > > > >> field in the response addresses this and seems better. Then, do we just
> > > > > > > >> throttle the new version of produce/fetch request or both the old and the
> > > > > > > >> new versions? Also, we probably still need a separate metric on the broker
> > > > > > > >> side to indicate whether a clientId is throttled or not.
> > > > > > > >>
> > > > > > > >> 11. Just to clarify. For fetch requests, when will metric.record(fetchSize)
> > > > > > > >> be called? Is it when we are ready to send the fetch response (after
> > > > > > > >> minBytes and maxWait are satisfied)?
> > > > > > > >>
> > > > > > > >> As an implementation detail, it may be useful for the quota code to return
> > > > > > > >> an estimated delay time (to bring the measurement within the limit) in
> > > > > > > >> QuotaViolationException.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jun
> > > > > > > >>
> > > > > > > >> On Wed, Apr 1, 2015 at 3:27 PM, Aditya Auradkar <
> > > > > > > >> aaurad...@linkedin.com.invalid> wrote:
> > > > > > > >>
> > > > > > > >> > Hey everyone,
> > > > > > > >> >
> > > > > > > >> > I've made changes to the KIP to capture our discussions over the last
> > > > > > > >> > couple of weeks.
> > > > > > > >> >
> > > > > > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > > >> >
> > > > > > > >> > I'll start a voting thread after people have had a chance to
> > > > > > > >> > read/comment.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Aditya
> > > > > > > >> >
> > > > > > > >> > ________________________________________
> > > > > > > >> > From: Steven Wu [stevenz...@gmail.com]
> > > > > > > >> > Sent: Friday, March 20, 2015 9:14 AM
> > > > > > > >> > To: dev@kafka.apache.org
> > > > > > > >> > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > >> >
> > > > > > > >> > +1 on Jun's suggestion of maintaining one set/style of metrics at broker.
> > > > > > > >> > In Netflix, we have to convert the yammer metrics to servo metrics at
> > > > > > > >> > broker. It will be painful to know some metrics are in a different style
> > > > > > > >> > and have to be handled differently.
> > > > > > > >> >
> > > > > > > >> > On Fri, Mar 20, 2015 at 8:17 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >> >
> > > > > > > >> > > Not so sure. People who use quota will definitely want to monitor the new
> > > > > > > >> > > metrics at the client id level. Then they will need to deal with those
> > > > > > > >> > > metrics differently from the rest of the metrics. It would be better if we
> > > > > > > >> > > can hide this complexity from the users.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > >
> > > > > > > >> > > Jun
> > > > > > > >> > >
> > > > > > > >> > > On Thu, Mar 19, 2015 at 10:45 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Actually thinking again - since these will be a few new metrics at the
> > > > > > > >> > > > client id level (bytes in and bytes out to start with) maybe it is fine to
> > > > > > > >> > > > have the two types of metrics coexist and we can migrate the existing
> > > > > > > >> > > > metrics in parallel.
> > > > > > > >> > > >
> > > > > > > >> > > > On Thursday, March 19, 2015, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > That is a valid concern but in that case I think it would be better to
> > > > > > > >> > > > > just migrate completely to the new metrics package first.
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Thursday, March 19, 2015, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > >> Hmm, I was thinking a bit differently on the metrics stuff. I think it
> > > > > > > >> > > > >> would be confusing to have some metrics defined in the new metrics package
> > > > > > > >> > > > >> while some others defined in Coda Hale. Those metrics will look different
> > > > > > > >> > > > >> (e.g., rates in Coda Hale will have special attributes such as
> > > > > > > >> > > > >> 1-min-average). People may need different ways to export the metrics to
> > > > > > > >> > > > >> external systems such as Graphite. So, instead of using the new metrics
> > > > > > > >> > > > >> package on the broker, I was thinking that we can just implement a
> > > > > > > >> > > > >> QuotaMetrics that wraps the Coda Hale metrics. The implementation can be
> > > > > > > >> > > > >> the same as what's in the new metrics package.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> Thanks,
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> Jun
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> On Thu, Mar 19, 2015 at 8:09 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> > Yeah, what I was saying was that we are blocked on picking an approach for
> > > > > > > >> > > > >> > metrics but not necessarily the full conversion. Clearly if we pick the new
> > > > > > > >> > > > >> > metrics package we would need to implement the two metrics we want to quota
> > > > > > > >> > > > >> > on. But the conversion of the remaining metrics can be done
> > > > > > > >> > > > >> > asynchronously.
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > -Jay
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > On Thu, Mar 19, 2015 at 5:56 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > >> > > > >> >
> > > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need to block on the
> > > > > > > >> > > > >> > > > migration of the metrics package.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > Can you clarify the above? i.e., if we are going to quota on something
> > > > > > > >> > > > >> > > then we would want to have migrated that metric over right? Or do you
> > > > > > > >> > > > >> > > mean we don't need to complete the migration of all metrics to the
> > > > > > > >> > > > >> > > metrics package right?
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > I think most of us now feel that the delay + no error is a good
> > > > > > > >> > > > >> > > approach, but it would be good to make sure everyone is on the same
> > > > > > > >> > > > >> > > page.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > As Aditya requested a couple of days ago I think we should go over
> > > > > > > >> > > > >> > > this at the next KIP hangout.
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > Joel
> > > > > > > >> > > > >> > >
> > > > > > > >> > > > >> > > On Thu, Mar 19, 2015 at 09:24:09AM -0700, Jun Rao wrote:
> > > > > > > >> > > > >> > > > 1. Delay + no error seems reasonable to me. However, I do feel that we need
> > > > > > > >> > > > >> > > > to give the client an indicator that it's being throttled, instead of doing
> > > > > > > >> > > > >> > > > this silently. For that, we probably need to evolve the produce/fetch
> > > > > > > >> > > > >> > > > protocol to include an extra status field in the response. We probably need
> > > > > > > >> > > > >> > > > to think more about whether we just want to return a simple status code
> > > > > > > >> > > > >> > > > (e.g., 1 = throttled) or a value that indicates how much is being throttled.
> > > > > > > >> > > > >> > > >
> > > > > > > >> > > > >> > > > 2. We probably need to improve the histogram support in the new metrics
> > > > > > > >> > > > >> > > > package before we can use it more widely on the server side (left a comment
> > > > > > > >> > > > >> > > > in KAFKA-1930). I agree that this KIP doesn't need to block on the
> > > > > > > >> > > > >> > > > migration of the metrics package.
> > > > > > > >> > > > >> > > >
> > > > > > > >> > > > >> > > > Thanks,
> > > > > > > >> > > > >> > > >
> > > > > > > >> > > > >> > > > Jun
> > > > > > > >> > > > >> > > >
> > > > > > > >> > > > >> > > > On Wed, Mar 18, 2015 at 4:02 PM, Aditya Auradkar <
> > > > > > > >> > > > >> > > > aaurad...@linkedin.com.invalid> wrote:
> > > > > > > >> > > > >> > > >
> > > > > > > >> > > > >> > > > > Hey everyone,
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > Thanks for the great discussion. There are currently a few points on this
> > > > > > > >> > > > >> > > > > KIP that need addressing and I want to make sure we are on the same page
> > > > > > > >> > > > >> > > > > about those.
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > 1. Append and delay response vs delay and return error
> > > > > > > >> > > > >> > > > > - I think we've discussed the pros and cons of each approach but haven't
> > > > > > > >> > > > >> > > > > chosen an approach yet. Where does everyone stand on this issue?
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > 2. Metrics Migration and usage in quotas
> > > > > > > >> > > > >> > > > > - The metrics library in clients has a notion of quotas that we should
> > > > > > > >> > > > >> > > > > reuse. For that to happen, we need to migrate the server to the new metrics
> > > > > > > >> > > > >> > > > > package.
> > > > > > > >> > > > >> > > > > - Need more clarification on how to compute throttling time and windowing
> > > > > > > >> > > > >> > > > > for quotas.
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > I'm going to start a new KIP to discuss metrics migration separately. That
> > > > > > > >> > > > >> > > > > will also contain a section on quotas.
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > 3. Dynamic Configuration management - Being discussed in KIP-5. Basically
> > > > > > > >> > > > >> > > > > we need something that will model default quotas and allow per-client
> > > > > > > >> > > > >> > > > > overrides.
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > Is there something else that I'm missing?
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > Thanks,
> > > > > > > >> > > > >> > > > > Aditya
> > > > > > > >> > > > >> > > > > ________________________________________
> > > > > > > >> > > > >> > > > > From: Jay Kreps [jay.kr...@gmail.com]
> > > > > > > >> > > > >> > > > > Sent: Wednesday, March 18, 2015 2:10 PM
> > > > > > > >> > > > >> > > > > To: dev@kafka.apache.org
> > > > > > > >> > > > >> > > > > Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > Hey Steven,
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > The current proposal is actually to enforce quotas at the
> > > > > > > >> > > > >> > > > > client/application level, NOT the topic level. So if you have a service
> > > > > > > >> > > > >> > > > > with a few dozen instances the quota is against all of those instances
> > > > > > > >> > > > >> > > > > added up across all their topics. So actually the effect would be the same
> > > > > > > >> > > > >> > > > > either way but throttling gives the producer the choice of either blocking
> > > > > > > >> > > > >> > > > > or dropping.
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > -Jay
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > On Tue, Mar 17, 2015 at 10:08 AM, Steven Wu <stevenz...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > >
> > > > > > > >> > > > >> > > > > > Jay,
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > > let's say an app produces to 10 different topics. one of the topics is sent
> > > > > > > >> > > > >> > > > > > from a library. due to whatever condition/bug, this lib starts to send
> > > > > > > >> > > > >> > > > > > messages over the quota. if we go with the delayed response approach, it
> > > > > > > >> > > > >> > > > > > will cause the whole shared RecordAccumulator buffer to be filled up. that
> > > > > > > >> > > > >> > > > > > will penalize the other 9 topics that are within the quota. that is the
> > > > > > > >> > > > >> > > > > > unfairness point that Ewen and I were trying to make.
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > > if the broker just drops the msg and returns an error/status code indicating
> > > > > > > >> > > > >> > > > > > the drop and why, then the producer can just move on and accept the drop. the
> > > > > > > >> > > > >> > > > > > shared buffer won't be saturated and the other 9 topics won't be penalized.
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > > Thanks,
> > > > > > > >> > > > >> > > > > > Steven
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > > On Tue, Mar 17, 2015 at 9:44 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > >
> > > > > > > >> > > > >> > > > > > > Hey Steven,
> > > > > > > >> > > > >> > > > > > >
> > > > > > > >> > > > >> > > > > > > It is true that hitting the quota will cause back-pressure on the producer.
> > > > > > > >> > > > >> > > > > > > But the solution is simple, a producer that wants to avoid this should stay
> > > > > > > >> > > > >> > > > > > > under its quota. In other words this is a contract between the cluster and
> > > > > > > >> > > > >> > > > > > > the client, with each side having something to uphold. Quite possibly the
> > > > > > > >> > > > >> > > > > > > same thing will happen in the absence of a quota, a client that produces an
> > > > > > > >> > > > >> > > > > > > unexpected amount of load will hit the limits of the server and experience
> > > > > > > >> > > > >> > > > > > > backpressure. Quotas just allow you to set that same limit at something
> > > > > > > >> > > > >> > > > > > > lower than 100% of all resources on the server, which is useful for a
> > > > > > > >> > > > >> > > > > > > shared cluster.
> > > > > > > >> > > > >> > > > > > >
> > > > > > > >> > > > >> > > > > > > -Jay
> > > > > > > >> > > > >> > > > > > >
> > > > > > > >> > > > >> > > > > > > On Mon, Mar 16, 2015 at 11:34 PM, Steven Wu <stevenz...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > >
> > > > > > > >> > > > >> > > > > > > > wait. we create one kafka producer for each cluster. each cluster can have
> > > > > > > >> > > > >> > > > > > > > many topics. if producer buffer got filled up due to delayed response for
> > > > > > > >> > > > >> > > > > > > > one throttled topic, won't that penalize other topics unfairly? it seems to
> > > > > > > >> > > > >> > > > > > > > me that broker should just return error without delay.
> > > > > > > >> > > > >> > > > > > > >
> > > > > > > >> > > > >> > > > > > > > sorry that I am chatting to myself :)
> > > > > > > >> > > > >> > > > > > > >
> > > > > > > >> > > > >> > > > > > > > On Mon, Mar 16, 2015 at 11:29 PM, Steven Wu <stevenz...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > > >
> > > > > > > >> > > > >> > > > > > > > > I think I can answer my own question. delayed response will cause the
> > > > > > > >> > > > >> > > > > > > > > producer buffer to be full, which then results in either thread blocking or
> > > > > > > >> > > > >> > > > > > > > > message drop.
> > > > > > > >> > > > >> > > > > > > > >
> > > > > > > >> > > > >> > > > > > > > > On Mon, Mar 16, 2015 at 11:24 PM, Steven Wu <stevenz...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >
> > > > > > > >> > > > >> > > > > > > > >> please correct me if I am missing sth here. I am not understanding how
> > > > > > > >> > > > >> > > > > > > > >> would throttle work without cooperation/back-off from producer. new Java
> > > > > > > >> > > > >> > > > > > > > >> producer supports non-blocking API. why would delayed response be able to
> > > > > > > >> > > > >> > > > > > > > >> slow down producer? producer will continue to fire async sends.
> > > > > > > >> > > > >> > > > > > > > >>
> > > > > > > >> > > > >> > > > > > > > >> On Mon, Mar 16, 2015 at 10:58 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >>
> > > > > > > >> > > > >> > > > > > > > >>> I think we are really discussing two separate issues here:
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> 1. Whether we should a) append-then-block-then-returnOKButThrottled or b)
> > > > > > > >> > > > >> > > > > > > > >>> block-then-returnFailDuetoThrottled for quota actions on produce
> > > > > > > >> > > > >> > > > > > > > >>> requests.
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> Both these approaches assume some kind of well-behavedness of the clients:
> > > > > > > >> > > > >> > > > > > > > >>> option a) assumes the client sets a proper timeout value while it can just
> > > > > > > >> > > > >> > > > > > > > >>> ignore the "OKButThrottled" response, while option b) assumes the client
> > > > > > > >> > > > >> > > > > > > > >>> handles the "FailDuetoThrottled" appropriately. For any malicious clients
> > > > > > > >> > > > >> > > > > > > > >>> that, for example, just keep retrying either intentionally or not, neither
> > > > > > > >> > > > >> > > > > > > > >>> of these approaches is actually effective.
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> 2. For "OKButThrottled" and "FailDuetoThrottled" responses, shall we encode
> > > > > > > >> > > > >> > > > > > > > >>> them as error codes or augment the protocol to use a separate field
> > > > > > > >> > > > >> > > > > > > > >>> indicating "status codes".
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> Today we have already incorporated some status codes as error codes in the
> > > > > > > >> > > > >> > > > > > > > >>> responses, e.g. ReplicaNotAvailable in MetadataResponse. The pro of this
> > > > > > > >> > > > >> > > > > > > > >>> is of course using a single field for response status like the HTTP status
> > > > > > > >> > > > >> > > > > > > > >>> codes, while the con is that it requires clients to handle the error codes
> > > > > > > >> > > > >> > > > > > > > >>> carefully.
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> I think maybe we can actually extend the single-code approach to overcome
> > > > > > > >> > > > >> > > > > > > > >>> its drawbacks, that is, wrap the error codes semantics to the users so that
> > > > > > > >> > > > >> > > > > > > > >>> users do not need to handle the codes one-by-one. More concretely,
> > > > > > > >> > > > >> > > > > > > > >>> following Jay's example the client could write sth. like this:
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> if(error.isOK())
> > > > > > > >> > > > >> > > > > > > > >>>    // status code is good or the code can be simply ignored for this
> > > > > > > >> > > > >> > > > > > > > >>>    // request type, process the request
> > > > > > > >> > > > >> > > > > > > > >>> else if(error.needsRetry())
> > > > > > > >> > > > >> > > > > > > > >>>    // throttled, transient error, etc: retry
> > > > > > > >> > > > >> > > > > > > > >>> else if(error.isFatal())
> > > > > > > >> > > > >> > > > > > > > >>>    // non-retriable errors, etc: notify / terminate / other handling
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> Only when the client really wants to handle, for example, the
> > > > > > > >> > > > >> > > > > > > > >>> FailDuetoThrottled status code specifically does it need to write:
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> if(error.isOK())
> > > > > > > >> > > > >> > > > > > > > >>>    // status code is good or the code can be simply ignored for this
> > > > > > > >> > > > >> > > > > > > > >>>    // request type, process the request
> > > > > > > >> > > > >> > > > > > > > >>> else if(error == FailDuetoThrottled)
> > > > > > > >> > > > >> > > > > > > > >>>    // throttled: log it
> > > > > > > >> > > > >> > > > > > > > >>> else if(error.needsRetry())
> > > > > > > >> > > > >> > > > > > > > >>>    // transient error, etc: retry
> > > > > > > >> > > > >> > > > > > > > >>> else if(error.isFatal())
> > > > > > > >> > > > >> > > > > > > > >>>    // non-retriable errors, etc: notify / terminate / other handling
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> -----------------
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> And for implementation we can probably group the codes accordingly like
> > > > > > > >> > > > >> > > > > > > > >>> HTTP status code such that we can do:
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> boolean Error.isOK() {
> > > > > > > >> > > > >> > > > > > > > >>>    return code < 300 && code >= 200;
> > > > > > > >> > > > >> > > > > > > > >>> }
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> Guozhang
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> On Mon, Mar 16, 2015 at 10:24 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> > > > > > > >> > > > >> > > > > > > > >>>
> > > > > > > >> > > > >> > > > > > > > >>> > Agreed that trying to shoehorn non-error codes into the error field is a
> > > > > > > >> > > > >> > > > > > > > >>> > bad idea. It makes it *way* too easy to write code that looks (and should
> > > > > > > >> > > > >> > > > > > > > >>> > be) correct but is actually incorrect. If necessary, I think it's much
> > > > > > > >> > > > >> > > > > > > > >>> > better to spend a couple of extra bytes to encode that information
> > > > > > > >> > > > >> > > > > > > > >>> > separately (a "status" or "warning" section of the response). An indication
> > > > > > > >> > > > >> > > > > > > > >>> > that throttling is occurring is something I'd expect to be indicated by a
> > > > > > > >> > > > >> > > > > > > > >>> > bit flag in the response rather than as an error code.
> > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > >> > > > >> > > > > > > > >>> > Gwen - I think an error code makes sense when the request actually failed.
> > > > > > > >> > > > >> > > > > > > > >>> > Option B, which Jun was advocating, would have appended the messages
> > > > > > > >> > > > >> > > > > > > > >>> > successfully. If the rate-limiting case you're talking about had
> > > > > > > >> > > > >> > > > > > > > >>> > successfully committed the messages, I would say that's also a bad use of
> > > > > > > >> > > > >> > > > > > > > >>> > error codes.
> > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > >> > > > >> > > > > > > > >>> > On Mon, Mar 16, 2015 at 10:16 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> >
> > > > > > > >> > > > >> > > > > > > > >>> > > We discussed an error code for rate-limiting (which I think made
> > > > > > > >> > > > >> > > > > > > > >>> > > sense), isn't it a similar case?
> > > > > > > >> > > > >> > > > > > > > >>> > >
> > > > > > > >> > > > >> > > > > > > > >>> > > On Mon, Mar 16, 2015 at 10:10 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> > > > My concern is that as soon as you start encoding non-error response
> > > > > > > >> > > > >> > > > > > > > >>> > > > information into error codes the next question is what to do if two such
> > > > > > > >> > > > >> > > > > > > > >>> > > > codes apply (i.e. you have a replica down and the response is quota'd). I
> > > > > > > >> > > > >> > > > > > > > >>> > > > think I am trying to argue that error should mean "why we failed your
> > > > > > > >> > > > >> > > > > > > > >>> > > > request", for which there will really only be one reason, and any other
> > > > > > > >> > > > >> > > > > > > > >>> > > > useful information we want to send back is just another field in the
> > > > > > > >> > > > >> > > > > > > > >>> > > > response.
> > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > >> > > > >> > > > > > > > >>> > > > -Jay
> > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > >> > > > >> > > > > > > > >>> > > > On Mon, Mar 16, 2015 at 9:51 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> > > >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> I think it's not too late to reserve a set of error codes (200-299?)
> > > > > > > >> > > > >> > > > > > > > >>> > > >> for "non-error" codes.
> > > > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> It won't be backward compatible (i.e. clients that currently do "else
> > > > > > > >> > > > >> > > > > > > > >>> > > >> throw" will throw on non-errors), but perhaps it's worthwhile.
> > > > > > > >> > > > >> > > > > > > > >>> > > >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> On Mon, Mar 16, 2015 at 9:42 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > Hey Jun,
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > I'd really really really like to avoid that. Having just spent a bunch of
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > time on the clients, using the error codes to encode other information
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > about the response is super dangerous. The error handling is one of the
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > hardest parts of the client (Guozhang chime in here).
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > Generally the error handling looks like
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >   if(error == none)
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >      // good, process the request
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >   else if(error == KNOWN_ERROR_1)
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >      // handle known error 1
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >   else if(error == KNOWN_ERROR_2)
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >      // handle known error 2
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >   else
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >      throw Errors.forCode(error).exception(); // or some other default behavior
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > This works because we have a convention that an error is something that
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > prevented your getting the response so the default handling case is sane
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > and forward compatible. It is tempting to use the error code to convey
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > information in the success case. For example we could use error codes to
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > encode whether quotas were enforced, whether the request was served out of
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > cache, whether the stock market is up today, or whatever. The problem is
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > that since these are not errors as far as the client is concerned it should
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > not throw an exception but process the response, but now we created an
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > explicit requirement that that error be handled explicitly since it is
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > different. I really think that this kind of information is not an error, it
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > is just information, and if we want it in the response we should do the
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > right thing and add a new field to the response.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > I think you saw the Samza bug that was literally an example of this
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > happening and leading to an infinite retry loop.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > Furthermore, I really want to emphasize that hitting your quota in the
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > design that Adi has proposed is actually not an error condition at
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > all. It is totally reasonable in any bootstrap situation to
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > intentionally want to run at the limit the system imposes on you.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > -Jay
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> > On Mon, Mar 16, 2015 at 4:27 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> It's probably useful for a client to know whether its requests are
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> throttled or not (e.g., for monitoring and alerting). From that
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> perspective, option B (delay the requests and return an error) seems
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> better.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> Thanks,
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> Jun
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >>
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> > Posted a KIP for quotas in kafka.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> > Appreciate any feedback.
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> >
> > > > > > > >> > > > >> > > > > > > > >>> > > >> >> > Aditya
