@jun: I wasn't concerned about tying up a request processing thread, but
IIUC the code does still read the entire request out, which might add up to
a non-negligible amount of memory.

On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Rajini,
>
> The current KIP says that the maximum delay will be reduced to the window size
> if it is larger than the window size. I have a concern with this:
>
> 1) This essentially means that the user is allowed to exceed their quota
> over a long period of time. Can you provide an upper bound on this
> deviation?
>
> 2) What is the motivation for capping the maximum delay at the window size?
> I am wondering if there is a better alternative to address the problem.
>
> 3) It means that the existing metric-related config will have a more
> direct impact on the mechanism of this io-thread-unit-based quota. This
> may be an important change depending on the answer to 1) above. We probably
> need to document this more explicitly.
>
> Dong
>
>
> On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Jun,
> >
> > Yeah you are right. I thought it wasn't because at LinkedIn it would be
> > too much pressure on inGraph to expose those per-clientId metrics, so we
> > ended up printing them periodically to a local log. Never mind if it is
> > not a general problem.
> >
> > Hey Rajini,
> >
> > - I agree with Jay that we probably don't want to add a new field for
> > every quota to ProduceResponse or FetchResponse. Is there any use-case for
> > having separate throttle-time fields for byte-rate-quota and
> > io-thread-unit-quota? You probably need to document this as an interface
> > change if you plan to add a new field to any request.
> >
> > - I don't think IOThread belongs in quotaType. The existing quota types
> > (i.e. Produce/Fetch/LeaderReplication/FollowerReplication) identify the
> > type of request that is throttled, not the quota mechanism that is
> > applied.
> >
> > - If a request is throttled due to this io-thread-unit-based quota, is
> the
> > existing queue-size metric in ClientQuotaManager incremented?
> >
> > - In the interest of providing a guideline for admins to decide the
> > io-thread-unit-based quota and for users to understand its impact on their
> > traffic, would it be useful to have a metric that shows the overall
> > byte-rate per io-thread-unit? Can we also show this as a per-clientId
> > metric?
> >
> > Thanks,
> > Dong
> >
> >
> > On Thu, Feb 23, 2017 at 9:25 AM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Ismael,
> >>
> >> For #3, typically, an admin won't configure more io threads than CPU
> >> cores,
> >> but it's possible for an admin to start with fewer io threads than cores
> >> and grow that later on.
> >>
> >> Hi, Dong,
> >>
> >> I think the throttleTime sensor on the broker tells the admin whether a
> >> user/clientId is throttled or not.
> >>
> >> Hi, Radi,
> >>
> >> The reasoning for delaying the throttled requests on the broker instead
> >> of returning an error immediately is that the latter has no way to
> >> prevent the client from retrying immediately, which will make things
> >> worse. The delaying logic is based on a delay queue. A separate
> >> expiration thread just waits on the next request to expire. So, it
> >> doesn't tie up a request handler thread.
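> >>
> >> (For illustration, a minimal sketch of that delay-queue pattern using
> >> plain JDK classes; the names here are made up and this is not the actual
> >> broker code:)
> >>
> >>   import java.util.concurrent.{DelayQueue, Delayed, TimeUnit}
> >>
> >>   // A response held back until its throttle delay expires.
> >>   class ThrottledResponse(expireMs: Long, send: () => Unit) extends Delayed {
> >>     def getDelay(unit: TimeUnit): Long =
> >>       unit.convert(expireMs - System.currentTimeMillis(), TimeUnit.MILLISECONDS)
> >>     def compareTo(other: Delayed): Int =
> >>       java.lang.Long.compare(getDelay(TimeUnit.MILLISECONDS),
> >>                              other.getDelay(TimeUnit.MILLISECONDS))
> >>     def complete(): Unit = send()
> >>   }
> >>
> >>   val delayQueue = new DelayQueue[ThrottledResponse]()
> >>
> >>   // One expiration thread blocks on the next entry to expire, so no
> >>   // request handler thread is held while a response is delayed.
> >>   val expirer = new Thread(new Runnable {
> >>     def run(): Unit = while (true) delayQueue.take().complete()
> >>   }, "throttled-response-expirer")
> >>   expirer.setDaemon(true)
> >>   expirer.start()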
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >>
> >> > Hi Jay,
> >> >
> >> > Regarding 1, I definitely like the simplicity of keeping a single
> >> throttle
> >> > time field in the response. The downside is that the client metrics
> >> will be
> >> > more coarse grained.
> >> >
> >> > Regarding 3, we have `leader.imbalance.per.broker.percentage` and
> >> > `log.cleaner.min.cleanable.ratio`.
> >> >
> >> > Ismael
> >> >
> >> > On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps <j...@confluent.io> wrote:
> >> >
> >> > > A few minor comments:
> >> > >
> >> > >    1. Isn't it the case that the throttling time response field should
> >> > >    have the total time your request was throttled, irrespective of the
> >> > >    quotas that caused that? Limiting it to the byte rate quota doesn't
> >> > >    make sense, but I also don't think we want to end up adding new
> >> > >    fields in the response for every single thing we quota, right?
> >> > >    2. I don't think we should make this quota specifically about io
> >> > >    threads. Once we introduce these quotas people set them and expect
> >> > >    them to be enforced (and if they aren't it may cause an outage). As
> >> > >    a result they are a bit more sensitive than normal configs, I think.
> >> > >    The current thread pools seem like something of an implementation
> >> > >    detail and not the level the user-facing quotas should be involved
> >> > >    with. I think it might be better to make this a general request-time
> >> > >    throttle with no mention in the naming about I/O threads and simply
> >> > >    acknowledge the current limitation (which we may someday fix) in the
> >> > >    docs that this covers only the time after the request is read off
> >> > >    the network.
> >> > >    3. As such I think the right interface to the user would be
> >> > >    something like percent_request_time and be in {0,...,100} or
> >> > >    request_time_ratio and be in {0.0,...,1.0} (I think "ratio" is the
> >> > >    terminology we used if the scale is between 0 and 1 in the other
> >> > >    metrics, right?)
> >> > >
> >> > > -Jay
> >> > >
> >> > > On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <
> >> rajinisiva...@gmail.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > Guozhang/Dong,
> >> > > >
> >> > > > Thank you for the feedback.
> >> > > >
> >> > > > Guozhang : I have updated the section on co-existence of byte rate
> >> and
> >> > > > request time quotas.
> >> > > >
> >> > > > Dong: I hadn't added much detail to the metrics and sensors since
> >> they
> >> > > are
> >> > > > going to be very similar to the existing metrics and sensors. To
> >> avoid
> >> > > > confusion, I have now added more detail. All metrics are in the
> >> group
> >> > > > "quotaType" and all sensors have names starting with "quotaType"
> >> (where
> >> > > > quotaType is Produce/Fetch/LeaderReplication/
> >> > > > FollowerReplication/*IOThread*).
> >> > > > So there will be no reuse of existing metrics/sensors. The new
> ones
> >> for
> >> > > > request processing time based throttling will be completely
> >> independent
> >> > > of
> >> > > > existing metrics/sensors, but will be consistent in format.
> >> > > >
> >> > > > The existing throttle_time_ms field in produce/fetch responses
> will
> >> not
> >> > > be
> >> > > > impacted by this KIP. That will continue to return byte-rate based
> >> > > > throttling times. In addition, a new field
> request_throttle_time_ms
> >> > will
> >> > > be
> >> > > > added to return request quota based throttling times. These will
> be
> >> > > exposed
> >> > > > as new metrics on the client-side.
> >> > > >
> >> > > > Since all metrics and sensors are different for each type of quota,
> >> > > > I believe there are already sufficient metrics to monitor throttling
> >> > > > on both the client and broker side for each type of throttling.
> >> > > >
> >> > > > Regards,
> >> > > >
> >> > > > Rajini
> >> > > >
> >> > > >
> >> > > > On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Hey Rajini,
> >> > > > >
> >> > > > > I think it makes a lot of sense to use io_thread_units as the
> >> > > > > metric to quota users' traffic here. LGTM overall. I have some
> >> > > > > questions regarding sensors.
> >> > > > >
> >> > > > > - Can you be more specific in the KIP about what sensors will be
> >> > > > > added? For example, it would be useful to specify the name and
> >> > > > > attributes of these new sensors.
> >> > > > >
> >> > > > > - We currently have throttle-time and queue-size for byte-rate
> >> based
> >> > > > quota.
> >> > > > > Are you going to have separate throttle-time and queue-size for
> >> > > requests
> >> > > > > throttled by io_thread_unit-based quota, or will they share the
> >> same
> >> > > > > sensor?
> >> > > > >
> >> > > > > - Does the throttle-time in the ProduceResponse and FetchResponse
> >> > > > > contain time due to the io_thread_unit-based quota?
> >> > > > >
> >> > > > > - Currently the kafka server doesn't provide any log or metrics
> >> > > > > that tell whether any given clientId (or user) is throttled. This
> >> > > > > is not too bad because we can still check the client-side byte-rate
> >> > > > > metric to validate whether a given client is throttled. But with
> >> > > > > this io_thread_unit, there will be no way to validate whether a
> >> > > > > given client is slow because it has exceeded its io_thread_unit
> >> > > > > limit. It is necessary for users to be able to know this
> >> > > > > information to figure out whether they have reached their quota
> >> > > > > limit. How about we add a log4j log on the server side to
> >> > > > > periodically print the (client_id, byte-rate-throttle-time,
> >> > > > > io-thread-unit-throttle-time) so that the kafka administrator can
> >> > > > > figure out which users have reached their limit and act
> >> > > > > accordingly?
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Dong
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <
> >> wangg...@gmail.com>
> >> > > > wrote:
> >> > > > >
> >> > > > > > Made a pass over the doc, overall LGTM except a minor comment
> on
> >> > the
> >> > > > > > throttling implementation:
> >> > > > > >
> >> > > > > > Stated as "Request processing time throttling will be applied on
> >> > > > > > top if necessary." I thought that it meant the request processing
> >> > > > > > time throttling is applied first, but on reading further I found
> >> > > > > > it actually meant to apply the produce / fetch byte rate
> >> > > > > > throttling first.
> >> > > > > >
> >> > > > > > Also, the last sentence "The remaining delay if any is applied
> >> > > > > > to the response." is a bit confusing to me. Maybe reword it a
> >> > > > > > bit?
> >> > > > > >
> >> > > > > >
> >> > > > > > Guozhang
> >> > > > > >
> >> > > > > >
> >> > > > > > On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io>
> >> wrote:
> >> > > > > >
> >> > > > > > > Hi, Rajini,
> >> > > > > > >
> >> > > > > > > Thanks for the updated KIP. The latest proposal looks good
> to
> >> me.
> >> > > > > > >
> >> > > > > > > Jun
> >> > > > > > >
> >> > > > > > > On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram <
> >> > > > > rajinisiva...@gmail.com
> >> > > > > > >
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Jun/Roger,
> >> > > > > > > >
> >> > > > > > > > Thank you for the feedback.
> >> > > > > > > >
> >> > > > > > > > 1. I have updated the KIP to use absolute units instead of a
> >> > > > > > > > percentage. The property is called *io_thread_units* to align
> >> > > > > > > > with the thread count property *num.io.threads*. When we
> >> > > > > > > > implement network thread utilization quotas, we can add
> >> > > > > > > > another property *network_thread_units*.
> >> > > > > > > >
> >> > > > > > > > 2. ControlledShutdown is already listed under the exempt
> >> > > > > > > > requests. Jun, did you mean a different request that needs to
> >> > > > > > > > be added? The four requests currently exempt in the KIP are
> >> > > > > > > > StopReplica, ControlledShutdown, LeaderAndIsr and
> >> > > > > > > > UpdateMetadata. These are controlled using the ClusterAction
> >> > > > > > > > ACL, so it is easy to exclude them and only throttle if
> >> > > > > > > > unauthorized. I wasn't sure if there are other requests used
> >> > > > > > > > only for inter-broker communication that needed to be
> >> > > > > > > > excluded.
> >> > > > > > > >
> >> > > > > > > > > > 3. I was thinking the smallest change would be to replace
> >> > > > > > > > > > all references to *requestChannel.sendResponse()* with a
> >> > > > > > > > > > local method *sendResponseMaybeThrottle()* that does the
> >> > > > > > > > > > throttling, if any, and then sends the response. If we
> >> > > > > > > > > > throttle first in *KafkaApis.handle()*, the time spent
> >> > > > > > > > > > within the method handling the request will not be
> >> > > > > > > > > > recorded or used in throttling. We can look into this
> >> > > > > > > > > > again when the PR is ready for review.
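> >> > > > > > > > > >
> >> > > > > > > > > > To make the shape of that change concrete, a rough sketch
> >> > > > > > > > > > (the traits below are simplified placeholders, not the
> >> > > > > > > > > > real broker classes):
> >> > > > > > > > > >
> >> > > > > > > > > >   // Simplified stand-ins for the real broker components.
> >> > > > > > > > > >   trait RequestChannel { def sendResponse(response: AnyRef): Unit }
> >> > > > > > > > > >   trait RequestQuotas {
> >> > > > > > > > > >     // Records handling time, returns the throttle delay in ms (0 if none).
> >> > > > > > > > > >     def recordAndMaybeThrottle(clientId: String, timeMs: Double): Long
> >> > > > > > > > > >   }
> >> > > > > > > > > >   trait ThrottledResponses { def add(response: AnyRef, delayMs: Long): Unit }
> >> > > > > > > > > >
> >> > > > > > > > > >   class ApiHandlerSketch(channel: RequestChannel,
> >> > > > > > > > > >                          quotas: RequestQuotas,
> >> > > > > > > > > >                          throttled: ThrottledResponses) {
> >> > > > > > > > > >     // Called wherever we previously called channel.sendResponse()
> >> > > > > > > > > >     // directly, so the time spent inside the handler method is included.
> >> > > > > > > > > >     def sendResponseMaybeThrottle(clientId: String, startNanos: Long,
> >> > > > > > > > > >                                   response: AnyRef): Unit = {
> >> > > > > > > > > >       val handleMs = (System.nanoTime() - startNanos) / 1000000.0
> >> > > > > > > > > >       val delayMs = quotas.recordAndMaybeThrottle(clientId, handleMs)
> >> > > > > > > > > >       if (delayMs <= 0) channel.sendResponse(response)
> >> > > > > > > > > >       else throttled.add(response, delayMs)
> >> > > > > > > > > >     }
> >> > > > > > > > > >   }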
> >> > > > > > > >
> >> > > > > > > > Regards,
> >> > > > > > > >
> >> > > > > > > > Rajini
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover <
> >> > > > > roger.hoo...@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Great to see this KIP and the excellent discussion.
> >> > > > > > > > >
> >> > > > > > > > > To me, Jun's suggestion makes sense.  If my application
> is
> >> > > > > allocated
> >> > > > > > 1
> >> > > > > > > > > request handler unit, then it's as if I have a Kafka
> >> broker
> >> > > with
> >> > > > a
> >> > > > > > > single
> >> > > > > > > > > request handler thread dedicated to me.  That's the
> most I
> >> > can
> >> > > > use,
> >> > > > > > at
> >> > > > > > > > > least.  That allocation doesn't change even if an admin
> >> later
> >> > > > > > increases
> >> > > > > > > > the
> >> > > > > > > > > size of the request thread pool on the broker.  It's
> >> similar
> >> > to
> >> > > > the
> >> > > > > > CPU
> >> > > > > > > > > abstraction that VMs and containers get from hypervisors
> >> or
> >> > OS
> >> > > > > > > > schedulers.
> >> > > > > > > > > While different client access patterns can use wildly
> >> > different
> >> > > > > > amounts
> >> > > > > > > > of
> >> > > > > > > > > request thread resources per request, a given
> application
> >> > will
> >> > > > > > > generally
> >> > > > > > > > > have a stable access pattern and can figure out
> >> empirically
> >> > how
> >> > > > > many
> >> > > > > > > > > "request thread units" it needs to meet it's
> >> > throughput/latency
> >> > > > > > goals.
> >> > > > > > > > >
> >> > > > > > > > > Cheers,
> >> > > > > > > > >
> >> > > > > > > > > Roger
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <
> >> j...@confluent.io>
> >> > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi, Rajini,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for the updated KIP. A few more comments.
> >> > > > > > > > > >
> >> > > > > > > > > > 1. A concern of request_time_percent is that it's not
> an
> >> > > > absolute
> >> > > > > > > > value.
> >> > > > > > > > > > Let's say you give a user a 10% limit. If the admin
> >> doubles
> >> > > the
> >> > > > > > > number
> >> > > > > > > > of
> >> > > > > > > > > > request handler threads, that user now actually has
> >> twice
> >> > the
> >> > > > > > > absolute
> >> > > > > > > > > > capacity. This may confuse people a bit. So, perhaps
> >> > setting
> >> > > > the
> >> > > > > > > quota
> >> > > > > > > > > > based on an absolute request thread unit is better.
> >> > > > > > > > > >
> >> > > > > > > > > > 2. ControlledShutdownRequest is also an inter-broker
> >> > request
> >> > > > and
> >> > > > > > > needs
> >> > > > > > > > to
> >> > > > > > > > > > be excluded from throttling.
> >> > > > > > > > > >
> >> > > > > > > > > > 3. Implementation wise, I am wondering if it's simpler
> >> to
> >> > > apply
> >> > > > > the
> >> > > > > > > > > request
> >> > > > > > > > > > time throttling first in KafkaApis.handle().
> Otherwise,
> >> we
> >> > > will
> >> > > > > > need
> >> > > > > > > to
> >> > > > > > > > > add
> >> > > > > > > > > > the throttling logic in each type of request.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks,
> >> > > > > > > > > >
> >> > > > > > > > > > Jun
> >> > > > > > > > > >
> >> > > > > > > > > > On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <
> >> > > > > > > > rajinisiva...@gmail.com
> >> > > > > > > > > >
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Jun,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Thank you for the review.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I have reverted to the original KIP that throttles
> >> based
> >> > on
> >> > > > > > request
> >> > > > > > > > > > handler
> >> > > > > > > > > > > utilization. At the moment, it uses percentage, but
> I
> >> am
> >> > > > happy
> >> > > > > to
> >> > > > > > > > > change
> >> > > > > > > > > > to
> >> > > > > > > > > > > a fraction (out of 1 instead of 100) if required. I
> >> have
> >> > > > added
> >> > > > > > the
> >> > > > > > > > > > examples
> >> > > > > > > > > > > from this discussion to the KIP. Also added a
> "Future
> >> > Work"
> >> > > > > > section
> >> > > > > > > > to
> >> > > > > > > > > > > address network thread utilization. The
> configuration
> >> is
> >> > > > named
> >> > > > > > > > > > > "request_time_percent" with the expectation that it
> >> can
> >> > > also
> >> > > > be
> >> > > > > > > used
> >> > > > > > > > as
> >> > > > > > > > > > the
> >> > > > > > > > > > > limit for network thread utilization when that is
> >> > > > implemented,
> >> > > > > so
> >> > > > > > > > that
> >> > > > > > > > > > > users have to set only one config for the two and
> not
> >> > have
> >> > > to
> >> > > > > > worry
> >> > > > > > > > > about
> >> > > > > > > > > > > the internal distribution of the work between the
> two
> >> > > thread
> >> > > > > > pools
> >> > > > > > > in
> >> > > > > > > > > > > Kafka.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Regards,
> >> > > > > > > > > > >
> >> > > > > > > > > > > Rajini
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao <
> >> > > j...@confluent.io>
> >> > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Hi, Rajini,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for the proposal.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The benefit of using the request processing time
> >> over
> >> > the
> >> > > > > > request
> >> > > > > > > > > rate
> >> > > > > > > > > > is
> >> > > > > > > > > > > > exactly what people have said. I will just expand
> >> that
> >> > a
> >> > > > bit.
> >> > > > > > > > > Consider
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > following case. The producer sends a produce
> request
> >> > > with a
> >> > > > > > 10MB
> >> > > > > > > > > > message
> >> > > > > > > > > > > > but compressed to 100KB with gzip. The
> >> decompression of
> >> > > the
> >> > > > > > > message
> >> > > > > > > > > on
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > broker could take 10-15 seconds, during which
> time,
> >> a
> >> > > > request
> >> > > > > > > > handler
> >> > > > > > > > > > > > thread is completely blocked. In this case,
> neither
> >> the
> >> > > > > byte-in
> >> > > > > > > > quota
> >> > > > > > > > > > nor
> >> > > > > > > > > > > > the request rate quota may be effective in
> >> protecting
> >> > the
> >> > > > > > broker.
> >> > > > > > > > > > > Consider
> >> > > > > > > > > > > > another case. A consumer group starts with 10
> >> instances
> >> > > and
> >> > > > > > later
> >> > > > > > > > on
> >> > > > > > > > > > > > switches to 20 instances. The request rate will
> >> likely
> >> > > > > double,
> >> > > > > > > but
> >> > > > > > > > > the
> >> > > > > > > > > > > > actual load on the broker may not double since
> >> each
> >> > > fetch
> >> > > > > > > request
> >> > > > > > > > > > only
> >> > > > > > > > > > > > contains half of the partitions. Request rate
> quota
> >> may
> >> > > not
> >> > > > > be
> >> > > > > > > easy
> >> > > > > > > > > to
> >> > > > > > > > > > > > configure in this case.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > What we really want is to be able to prevent a
> >> client
> >> > > from
> >> > > > > > using
> >> > > > > > > > too
> >> > > > > > > > > > much
> >> > > > > > > > > > > > of the server side resources. In this particular
> >> KIP,
> >> > > this
> >> > > > > > > resource
> >> > > > > > > > > is
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > capacity of the request handler threads. I agree
> >> that
> >> > it
> >> > > > may
> >> > > > > > not
> >> > > > > > > be
> >> > > > > > > > > > > > intuitive for the users to determine how to set
> the
> >> > right
> >> > > > > > limit.
> >> > > > > > > > > > However,
> >> > > > > > > > > > > > this is not completely new and has been done in
> the
> >> > > > container
> >> > > > > > > world
> >> > > > > > > > > > > > already. For example, Linux cgroup (
> >> > > > > > > > > > > > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
> >> > > > > > > > > > > > ) has the
> >> > concept
> >> > > of
> >> > > > > > > > > > > > cpu.cfs_quota_us,
> >> > > > > > > > > > > > which specifies the total amount of time in
> >> > microseconds
> >> > > > for
> >> > > > > > > which
> >> > > > > > > > > all
> >> > > > > > > > > > > > tasks in a cgroup can run during a one second
> >> period.
> >> > We
> >> > > > can
> >> > > > > > > > > > potentially
> >> > > > > > > > > > > > model the request handler threads in a similar
> way.
> >> For
> >> > > > > > example,
> >> > > > > > > > each
> >> > > > > > > > > > > > request handler thread can be 1 request handler
> unit
> >> > and
> >> > > > the
> >> > > > > > > admin
> >> > > > > > > > > can
> >> > > > > > > > > > > > configure a limit on how many units (say 0.01) a
> >> client
> >> > > can
> >> > > > > > have.
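> >> > > > > > > > > > > >
> >> > > > > > > > > > > > To make the arithmetic concrete (an illustrative
> >> > > > > > > > > > > > sketch, not something the KIP specifies): with one
> >> > > > > > > > > > > > unit per request handler thread, a client's budget is
> >> > > > > > > > > > > > simply units * window, independent of the pool size.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >   // Handler time a client may use per quota window for a given
> >> > > > > > > > > > > >   // unit allocation; 1.0 unit = one fully-occupied handler thread.
> >> > > > > > > > > > > >   def allowedHandlerTimeMs(units: Double, windowMs: Long): Double =
> >> > > > > > > > > > > >     units * windowMs
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >   // e.g. 0.01 units over a 1 second window = 10 ms of request
> >> > > > > > > > > > > >   // handler time per second for that client.
> >> > > > > > > > > > > >   allowedHandlerTimeMs(0.01, 1000L)   // 10.0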
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Regarding not throttling the internal broker to
> >> broker
> >> > > > > > requests.
> >> > > > > > > We
> >> > > > > > > > > > could
> >> > > > > > > > > > > > do that. Alternatively, we could just let the
> admin
> >> > > > > configure a
> >> > > > > > > > high
> >> > > > > > > > > > > limit
> >> > > > > > > > > > > > for the kafka user (it may not be able to do that
> >> > easily
> >> > > > > based
> >> > > > > > on
> >> > > > > > > > > > > clientId
> >> > > > > > > > > > > > though).
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Ideally we want to be able to protect the
> >> utilization
> >> > of
> >> > > > the
> >> > > > > > > > network
> >> > > > > > > > > > > thread
> >> > > > > > > > > > > > pool too. The difficulty is mostly what Rajini
> said:
> >> (1)
> >> > > The
> >> > > > > > > > mechanism
> >> > > > > > > > > > for
> >> > > > > > > > > > > > throttling the requests is through Purgatory and
> we
> >> > will
> >> > > > have
> >> > > > > > to
> >> > > > > > > > > think
> >> > > > > > > > > > > > through how to integrate that into the network
> >> layer.
> >> > > (2)
> >> > > > In
> >> > > > > > the
> >> > > > > > > > > > network
> >> > > > > > > > > > > > layer, currently we know the user, but not the
> >> clientId
> >> > > of
> >> > > > > the
> >> > > > > > > > > request.
> >> > > > > > > > > > > So,
> >> > > > > > > > > > > > it's a bit tricky to throttle based on clientId
> >> there.
> >> > > > Plus,
> >> > > > > > the
> >> > > > > > > > > > byteOut
> >> > > > > > > > > > > > quota can already protect the network thread
> >> > utilization
> >> > > > for
> >> > > > > > > fetch
> >> > > > > > > > > > > > requests. So, if we can't figure out this part
> right
> >> > now,
> >> > > > > just
> >> > > > > > > > > focusing
> >> > > > > > > > > > > on
> >> > > > > > > > > > > > the request handling threads for this KIP is
> still a
> >> > > useful
> >> > > > > > > > feature.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Jun
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram <
> >> > > > > > > > > > rajinisiva...@gmail.com
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > Thank you all for the feedback.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Jay: I have removed exemption for consumer
> >> heartbeat
> >> > > etc.
> >> > > > > > Agree
> >> > > > > > > > > that
> >> > > > > > > > > > > > > protecting the cluster is more important than
> >> > > protecting
> >> > > > > > > > individual
> >> > > > > > > > > > > apps.
> >> > > > > > > > > > > > > Have retained the exemption for
> >> > > StopReplica/LeaderAndIsr
> >> > > > > > etc,
> >> > > > > > > > > these
> >> > > > > > > > > > > are
> >> > > > > > > > > > > > > throttled only if authorization fails (so can't
> be
> >> > used
> >> > > > for
> >> > > > > > DoS
> >> > > > > > > > > > attacks
> >> > > > > > > > > > > > in
> >> > > > > > > > > > > > > a secure cluster, but allows inter-broker
> >> requests to
> >> > > > > > complete
> >> > > > > > > > > > without
> >> > > > > > > > > > > > > delays).
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > I will wait another day to see if there is any
> >> > > objection
> >> > > > to
> >> > > > > > > > quotas
> >> > > > > > > > > > > based
> >> > > > > > > > > > > > on
> >> > > > > > > > > > > > > request processing time (as opposed to request
> >> rate)
> >> > > and
> >> > > > if
> >> > > > > > > there
> >> > > > > > > > > are
> >> > > > > > > > > > > no
> >> > > > > > > > > > > > > objections, I will revert to the original
> proposal
> >> > with
> >> > > > > some
> >> > > > > > > > > changes.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > The original proposal was only including the
> time
> >> > used
> >> > > by
> >> > > > > the
> >> > > > > > > > > request
> >> > > > > > > > > > > > > handler threads (that made calculation easy). I
> >> think
> >> > > the
> >> > > > > > > > > suggestion
> >> > > > > > > > > > is
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > include the time spent in the network threads as
> >> well
> >> > > > since
> >> > > > > > > that
> >> > > > > > > > > may
> >> > > > > > > > > > be
> >> > > > > > > > > > > > > significant. As Jay pointed out, it is more
> >> > complicated
> >> > > > to
> >> > > > > > > > > calculate
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > total available CPU time and convert to a ratio
> >> > > > > > > > > > > > > when there are *m* I/O threads and *n* network
> >> > > > > > > > > > > > > threads. ThreadMXBean#getThreadCPUTime()
> >> > > > > may
> >> > > > > > > > give
> >> > > > > > > > > us
> >> > > > > > > > > > > > what
> >> > > > > > > > > > > > > we want, but it can be very expensive on some
> >> > > platforms.
> >> > > > As
> >> > > > > > > > Becket
> >> > > > > > > > > > and
> >> > > > > > > > > > > > > Guozhang have pointed out, we do have several
> time
> >> > > > > > measurements
> >> > > > > > > > > > already
> >> > > > > > > > > > > > for
> >> > > > > > > > > > > > > generating metrics that we could use, though we
> >> might
> >> > > > want
> >> > > > > to
> >> > > > > > > > > switch
> >> > > > > > > > > > to
> >> > > > > > > > > > > > > nanoTime() instead of currentTimeMillis() since
> >> some
> >> > of
> >> > > > the
> >> > > > > > > > values
> >> > > > > > > > > > for
> >> > > > > > > > > > > > > small requests may be < 1ms. But rather than add
> >> up
> >> > the
> >> > > > > time
> >> > > > > > > > spent
> >> > > > > > > > > in
> >> > > > > > > > > > > I/O
> >> > > > > > > > > > > > > thread and network thread, wouldn't it be better
> >> to
> >> > > > convert
> >> > > > > > the
> >> > > > > > > > > time
> >> > > > > > > > > > > > spent
> >> > > > > > > > > > > > > on each thread into a separate ratio? UserA has
> a
> >> > > request
> >> > > > > > quota
> >> > > > > > > > of
> >> > > > > > > > > > 5%.
> >> > > > > > > > > > > > Can
> >> > > > > > > > > > > > > we take that to mean that UserA can use 5% of
> the
> >> > time
> >> > > on
> >> > > > > > > network
> >> > > > > > > > > > > threads
> >> > > > > > > > > > > > > and 5% of the time on I/O threads? If either is
> >> > > exceeded,
> >> > > > > the
> >> > > > > > > > > > response
> >> > > > > > > > > > > is
> >> > > > > > > > > > > > > throttled - it would mean maintaining two sets
> of
> >> > > metrics
> >> > > > > for
> >> > > > > > > the
> >> > > > > > > > > two
> >> > > > > > > > > > > > > durations, but would result in more meaningful
> >> > ratios.
> >> > > We
> >> > > > > > could
> >> > > > > > > > > > define
> >> > > > > > > > > > > > two
> >> > > > > > > > > > > > > quota limits (UserA has 5% of request threads
> and
> >> 10%
> >> > > of
> >> > > > > > > network
> >> > > > > > > > > > > > threads),
> >> > > > > > > > > > > > > but that seems unnecessary and harder to explain
> >> to
> >> > > > users.
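> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > (A small sketch of what I mean by separate ratios;
> >> > > > > > > > > > > > > illustrative only, the names are made up:)
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >   // Time recorded for a user on one pool, as a fraction of that
> >> > > > > > > > > > > > >   // pool's total capacity over the quota window.
> >> > > > > > > > > > > > >   def poolUsageRatio(recordedMs: Double, poolThreads: Int,
> >> > > > > > > > > > > > >                      windowMs: Long): Double =
> >> > > > > > > > > > > > >     recordedMs / (poolThreads.toDouble * windowMs)
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >   // UserA with a 5% quota is throttled if either pool's ratio
> >> > > > > > > > > > > > >   // exceeds 0.05; two sets of metrics but a single config.
> >> > > > > > > > > > > > >   def shouldThrottle(ioRatio: Double, networkRatio: Double,
> >> > > > > > > > > > > > >                      quota: Double): Boolean =
> >> > > > > > > > > > > > >     ioRatio > quota || networkRatio > quota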
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Back to why and how quotas are applied to
> network
> >> > > thread
> >> > > > > > > > > utilization:
> >> > > > > > > > > > > > > a) In the case of fetch,  the time spent in the
> >> > network
> >> > > > > > thread
> >> > > > > > > > may
> >> > > > > > > > > be
> >> > > > > > > > > > > > > significant and I can see the need to include
> >> this.
> >> > Are
> >> > > > > there
> >> > > > > > > > other
> >> > > > > > > > > > > > > requests where the network thread utilization is
> >> > > > > significant?
> >> > > > > > > In
> >> > > > > > > > > the
> >> > > > > > > > > > > case
> >> > > > > > > > > > > > > of fetch, request handler thread utilization
> would
> >> > > > throttle
> >> > > > > > > > clients
> >> > > > > > > > > > > with
> >> > > > > > > > > > > > > high request rate, low data volume and fetch
> byte
> >> > rate
> >> > > > > quota
> >> > > > > > > will
> >> > > > > > > > > > > > throttle
> >> > > > > > > > > > > > > clients with high data volume. Network thread
> >> > > utilization
> >> > > > > is
> >> > > > > > > > > perhaps
> >> > > > > > > > > > > > > proportional to the data volume. I am wondering
> >> if we
> >> > > > even
> >> > > > > > need
> >> > > > > > > > to
> >> > > > > > > > > > > > throttle
> >> > > > > > > > > > > > > based on network thread utilization or whether
> the
> >> > data
> >> > > > > > volume
> >> > > > > > > > > quota
> >> > > > > > > > > > > > covers
> >> > > > > > > > > > > > > this case.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > b) At the moment, we record and check for quota
> >> > > violation
> >> > > > > at
> >> > > > > > > the
> >> > > > > > > > > same
> >> > > > > > > > > > > > time.
> >> > > > > > > > > > > > > If a quota is violated, the response is delayed.
> >> > Using
> >> > > > > Jay's
> >> > > > > > > > > example
> >> > > > > > > > > > of
> >> > > > > > > > > > > > > disk reads for fetches happening in the network
> >> > thread,
> >> > > > We
> >> > > > > > > can't
> >> > > > > > > > > > record
> >> > > > > > > > > > > > and
> >> > > > > > > > > > > > > delay a response after the disk reads. We could
> >> > record
> >> > > > the
> >> > > > > > time
> >> > > > > > > > > spent
> >> > > > > > > > > > > on
> >> > > > > > > > > > > > > the network thread when the response is complete
> >> and
> >> > > > > > introduce
> >> > > > > > > a
> >> > > > > > > > > > delay
> >> > > > > > > > > > > > for
> >> > > > > > > > > > > > > handling a subsequent request (separate out
> >> recording
> >> > > and
> >> > > > > > quota
> >> > > > > > > > > > > violation
> >> > > > > > > > > > > > > handling in the case of network thread
> overload).
> >> > Does
> >> > > > that
> >> > > > > > > make
> >> > > > > > > > > > sense?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Regards,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Rajini
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <
> >> > > > > > > > becket....@gmail.com>
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hey Jay,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Yeah, I agree that enforcing the CPU time is a
> >> > little
> >> > > > > > > tricky. I
> >> > > > > > > > > am
> >> > > > > > > > > > > > > thinking
> >> > > > > > > > > > > > > > that maybe we can use the existing request
> >> > > statistics.
> >> > > > > They
> >> > > > > > > are
> >> > > > > > > > > > > already
> >> > > > > > > > > > > > > > very detailed so we can probably see the
> >> > approximate
> >> > > > CPU
> >> > > > > > time
> >> > > > > > > > > from
> >> > > > > > > > > > > it,
> >> > > > > > > > > > > > > e.g.
> >> > > > > > > > > > > > > > something like (total_time -
> >> > > > request/response_queue_time
> >> > > > > -
> >> > > > > > > > > > > > remote_time).
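> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > (To spell that approximation out, a sketch where the
> >> > > > > > > > > > > > > > per-request timings are taken as plain inputs rather
> >> > > > > > > > > > > > > > than pulled from the real request statistics:)
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >   // Approximate broker-side processing time for one request from
> >> > > > > > > > > > > > > >   // the statistics we already collect: subtract the time spent
> >> > > > > > > > > > > > > >   // waiting in the queues and waiting on other brokers.
> >> > > > > > > > > > > > > >   def approxProcessingMs(totalMs: Double,
> >> > > > > > > > > > > > > >                          requestQueueMs: Double,
> >> > > > > > > > > > > > > >                          responseQueueMs: Double,
> >> > > > > > > > > > > > > >                          remoteMs: Double): Double =
> >> > > > > > > > > > > > > >     totalMs - requestQueueMs - responseQueueMs - remoteMs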
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I agree with Guozhang that when a user is
> >> throttled
> >> > > it
> >> > > > is
> >> > > > > > > > likely
> >> > > > > > > > > > that
> >> > > > > > > > > > > > we
> >> > > > > > > > > > > > > > need to see if anything has gone wrong first,
> >> and
> >> > if
> >> > > > the
> >> > > > > > > users
> >> > > > > > > > > are
> >> > > > > > > > > > > well
> >> > > > > > > > > > > > > > behaving and just need more resources, we will
> >> have
> >> > > to
> >> > > > > bump
> >> > > > > > > up
> >> > > > > > > > > the
> >> > > > > > > > > > > > quota
> >> > > > > > > > > > > > > > for them. It is true that pre-allocating CPU
> >> time
> >> > > quota
> >> > > > > > > > precisely
> >> > > > > > > > > > for
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > users is difficult. So in practice it would
> >> > probably
> >> > > be
> >> > > > > > more
> >> > > > > > > > like
> >> > > > > > > > > > > first
> >> > > > > > > > > > > > > set
> >> > > > > > > > > > > > > > a relative high protective CPU time quota for
> >> > > everyone
> >> > > > > and
> >> > > > > > > > > increase
> >> > > > > > > > > > > > that
> >> > > > > > > > > > > > > > for some individual clients on demand.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 5:48 PM, Guozhang
> Wang <
> >> > > > > > > > > wangg...@gmail.com
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > This is a great proposal, glad to see it
> >> > happening.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I am inclined to the CPU throttling, or more
> >> > > > > specifically
> >> > > > > > > > > > > processing
> >> > > > > > > > > > > > > time
> >> > > > > > > > > > > > > > > ratio instead of the request rate throttling
> >> as
> >> > > well.
> >> > > > > > > Becket
> >> > > > > > > > > has
> >> > > > > > > > > > > very
> >> > > > > > > > > > > > > > well
> >> > > > > > > > > > > > > > > summed up my rationales above, and one thing to
> >> add
> >> > > here
> >> > > > > is
> >> > > > > > > that
> >> > > > > > > > > the
> >> > > > > > > > > > > > > former
> >> > > > > > > > > > > > > > > has good support for both "protecting
> >> against
> >> > > rogue
> >> > > > > > > > clients"
> >> > > > > > > > > as
> >> > > > > > > > > > > > well
> >> > > > > > > > > > > > > as
> >> > > > > > > > > > > > > > > "utilizing a cluster for multi-tenancy
> usage":
> >> > when
> >> > > > > > > thinking
> >> > > > > > > > > > about
> >> > > > > > > > > > > > how
> >> > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > explain this to the end users, I find it
> >> actually
> >> > > > more
> >> > > > > > > > natural
> >> > > > > > > > > > than
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > request rate since as mentioned above,
> >> different
> >> > > > > requests
> >> > > > > > > > will
> >> > > > > > > > > > have
> >> > > > > > > > > > > > > quite
> >> > > > > > > > > > > > > > > different "cost", and Kafka today already
> have
> >> > > > various
> >> > > > > > > > request
> >> > > > > > > > > > > types
> >> > > > > > > > > > > > > > > (produce, fetch, admin, metadata, etc),
> >> because
> >> > of
> >> > > > that
> >> > > > > > the
> >> > > > > > > > > > request
> >> > > > > > > > > > > > > rate
> >> > > > > > > > > > > > > > > throttling may not be as effective unless it
> >> is
> >> > set
> >> > > > > very
> >> > > > > > > > > > > > > conservatively.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Regarding user reactions when they are
> >> > > throttled,
> >> > > > I
> >> > > > > > > think
> >> > > > > > > > it
> >> > > > > > > > > > may
> >> > > > > > > > > > > > > > differ
> >> > > > > > > > > > > > > > > case-by-case, and need to be discovered /
> >> guided
> >> > by
> >> > > > > > looking
> >> > > > > > > > at
> >> > > > > > > > > > > > relative
> >> > > > > > > > > > > > > > > metrics. So in other words users would not
> >> expect
> >> > > to
> >> > > > > get
> >> > > > > > > > > > additional
> >> > > > > > > > > > > > > > > information by simply being told "hey, you
> are
> >> > > > > > throttled",
> >> > > > > > > > > which
> >> > > > > > > > > > is
> >> > > > > > > > > > > > all
> >> > > > > > > > > > > > > > > what throttling does; they need to take a
> >> > follow-up
> >> > > > > step
> >> > > > > > > and
> >> > > > > > > > > see
> >> > > > > > > > > > > > "hmm,
> >> > > > > > > > > > > > > > I'm
> >> > > > > > > > > > > > > > > throttled probably because of ..", which is
> by
> >> > > > looking
> >> > > > > at
> >> > > > > > > > other
> >> > > > > > > > > > > > metric
> >> > > > > > > > > > > > > > > values: e.g. whether I'm bombarding the
> >> brokers
> >> > > with
> >> > > > > > > metadata
> >> > > > > > > > > > > > request,
> >> > > > > > > > > > > > > > > which are usually cheap to handle but I'm
> >> sending
> >> > > > > > thousands
> >> > > > > > > > per
> >> > > > > > > > > > > > second;
> >> > > > > > > > > > > > > > or
> >> > > > > > > > > > > > > > > is it because I'm catching up and hence
> >> sending
> >> > > very
> >> > > > > > heavy
> >> > > > > > > > > > fetching
> >> > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > with large min.bytes, etc.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Regarding the implementation, as once
> >> > discussed
> >> > > > with
> >> > > > > > > Jun,
> >> > > > > > > > > this
> >> > > > > > > > > > > > seems
> >> > > > > > > > > > > > > > not
> >> > > > > > > > > > > > > > > very difficult since today we are already
> >> > > collecting
> >> > > > > the
> >> > > > > > > > > "thread
> >> > > > > > > > > > > pool
> >> > > > > > > > > > > > > > > utilization" metrics, which is a single
> >> > percentage
> >> > > > > > > > > > > > "aggregateIdleMeter"
> >> > > > > > > > > > > > > > > value; but we are already effectively
> >> aggregating
> >> > > it
> >> > > > > for
> >> > > > > > > each
> >> > > > > > > > > > > > request
> >> > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > KafkaRequestHandler, and we can just extend
> >> it by
> >> > > > > > recording
> >> > > > > > > > the
> >> > > > > > > > > > > > source
> >> > > > > > > > > > > > > > > client id when handling them and aggregating
> >> by
> >> > > > > clientId
> >> > > > > > as
> >> > > > > > > > > well
> >> > > > > > > > > > as
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > total aggregate.
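> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > (A toy sketch of that per-clientId aggregation, kept
> >> > > > > > > > > > > > > > > separate from the real KafkaRequestHandler code; the
> >> > > > > > > > > > > > > > > class name is made up:)
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >   import java.util.concurrent.ConcurrentHashMap
> >> > > > > > > > > > > > > > >   import java.util.concurrent.atomic.LongAdder
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >   // Alongside the existing aggregate idle meter, also
> >> > > > > > > > > > > > > > >   // accumulate request handler time per clientId.
> >> > > > > > > > > > > > > > >   class HandlerTimeByClient {
> >> > > > > > > > > > > > > > >     private val nanosByClient = new ConcurrentHashMap[String, LongAdder]()
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >     def record(clientId: String, elapsedNanos: Long): Unit =
> >> > > > > > > > > > > > > > >       nanosByClient.computeIfAbsent(clientId, _ => new LongAdder()).add(elapsedNanos)
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >     def totalMs(clientId: String): Double = {
> >> > > > > > > > > > > > > > >       val adder = nanosByClient.get(clientId)
> >> > > > > > > > > > > > > > >       if (adder == null) 0.0 else adder.sum() / 1000000.0
> >> > > > > > > > > > > > > > >     }
> >> > > > > > > > > > > > > > >   }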
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Guozhang
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <
> >> > > > > > > j...@confluent.io
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hey Becket/Rajini,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > When I thought about it more deeply I came
> >> > around
> >> > > > to
> >> > > > > > the
> >> > > > > > > > > > "percent
> >> > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > processing time" metric too. It seems a
> lot
> >> > > closer
> >> > > > to
> >> > > > > > the
> >> > > > > > > > > thing
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > > > > actually
> >> > > > > > > > > > > > > > > > care about and need to protect. I also
> think
> >> > this
> >> > > > > would
> >> > > > > > > be
> >> > > > > > > > a
> >> > > > > > > > > > very
> >> > > > > > > > > > > > > > useful
> >> > > > > > > > > > > > > > > > metric even in the absence of throttling
> >> just
> >> > to
> >> > > > > debug
> >> > > > > > > > whose
> >> > > > > > > > > > > using
> >> > > > > > > > > > > > > > > > capacity.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Two problems to consider:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >    1. I agree that for the user it is
> >> > > > understandable
> >> > > > > > what
> >> > > > > > > > > led
> >> > > > > > > > > > to
> >> > > > > > > > > > > > > their
> >> > > > > > > > > > > > > > > >    being throttled, but it is a bit hard
> to
> >> > > figure
> >> > > > > out
> >> > > > > > > the
> >> > > > > > > > > safe
> >> > > > > > > > > > > > range
> >> > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > >    them. i.e. if I have a new app that
> will
> >> > send
> >> > > > 200
> >> > > > > > > > > > > messages/sec I
> >> > > > > > > > > > > > > can
> >> > > > > > > > > > > > > > > >    probably reason that I'll be under the
> >> > > > throttling
> >> > > > > > > limit
> >> > > > > > > > of
> >> > > > > > > > > > 300
> >> > > > > > > > > > > > > > > req/sec.
> >> > > > > > > > > > > > > > > >    However if I need to be under a 10% CPU
> >> > > > resources
> >> > > > > > > limit
> >> > > > > > > > it
> >> > > > > > > > > > may
> >> > > > > > > > > > > > be
> >> > > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > bit
> >> > > > > > > > > > > > > > > >    harder for me to know a priori if I
> will
> >> or
> >> > > > won't.
> >> > > > > > > > > > > > > > > >    2. Calculating the available CPU time
> is
> >> a
> >> > bit
> >> > > > > > > difficult
> >> > > > > > > > > > since
> >> > > > > > > > > > > > > there
> >> > > > > > > > > > > > > > > are
> >> > > > > > > > > > > > > > > >    actually two thread pools--the I/O
> >> threads
> >> > and
> >> > > > the
> >> > > > > > > > network
> >> > > > > > > > > > > > > threads.
> >> > > > > > > > > > > > > > I
> >> > > > > > > > > > > > > > > > think
> >> > > > > > > > > > > > > > > >    it might be workable to count just the
> >> I/O
> >> > > > thread
> >> > > > > > time
> >> > > > > > > > as
> >> > > > > > > > > in
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > proposal,
> >> > > > > > > > > > > > > > > >    but the network thread work is actually
> >> > > > > non-trivial
> >> > > > > > > > (e.g.
> >> > > > > > > > > > all
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > disk
> >> > > > > > > > > > > > > > > >    reads for fetches happen in that
> >> thread). If
> >> > > you
> >> > > > > > count
> >> > > > > > > > > both
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > network
> >> > > > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > >    I/O threads it can skew things a bit.
> >> E.g.
> >> > say
> >> > > > you
> >> > > > > > > have
> >> > > > > > > > 50
> >> > > > > > > > > > > > network
> >> > > > > > > > > > > > > > > > threads,
> >> > > > > > > > > > > > > > > >    10 I/O threads, and 8 cores, what is
> the
> >> > > > available
> >> > > > > > cpu
> >> > > > > > > > > time
> >> > > > > > > > > > > > > > available
> >> > > > > > > > > > > > > > > > in a
> >> > > > > > > > > > > > > > > >    second? I suppose this is a problem
> >> whenever
> >> > > you
> >> > > > > > have
> >> > > > > > > a
> >> > > > > > > > > > > > bottleneck
> >> > > > > > > > > > > > > > > > between
> >> > > > > > > > > > > > > > > >    I/O and network threads or if you end
> up
> >> > > > > > significantly
> >> > > > > > > > > > > > > > > over-provisioning
> >> > > > > > > > > > > > > > > >    one pool (both of which are hard to
> >> avoid).
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > An alternative for CPU throttling would be
> >> to
> >> > use
> >> > > > > this
> >> > > > > > > api:
> >> > > > > > > > > > > > > > > > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > That would let you track actual CPU usage
> >> > across
> >> > > > the
> >> > > > > > > > network,
> >> > > > > > > > > > I/O
> >> > > > > > > > > > > > > > > threads,
> >> > > > > > > > > > > > > > > > and purgatory threads and look at it as a
> >> > > > percentage
> >> > > > > of
> >> > > > > > > > total
> >> > > > > > > > > > > > cores.
> >> > > > > > > > > > > > > I
> >> > > > > > > > > > > > > > > > think this fixes many problems in the
> >> > reliability
> >> > > > of
> >> > > > > > the
> >> > > > > > > > > > metric.
> >> > > > > > > > > > > Its
> >> > > > > > > > > > > > > > > > meaning is slightly different as it is
> just
> >> CPU
> >> > > > (you
> >> > > > > > > don't
> >> > > > > > > > > get
> >> > > > > > > > > > > > > charged
> >> > > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > time blocking on I/O) but that may be okay
> >> > > because
> >> > > > we
> >> > > > > > > > already
> >> > > > > > > > > > > have
> >> > > > > > > > > > > > a
> >> > > > > > > > > > > > > > > > throttle on I/O. The downside is I think
> it
> >> is
> >> > > > > possible
> >> > > > > > > > this
> >> > > > > > > > > > api
> >> > > > > > > > > > > > can
> >> > > > > > > > > > > > > be
> >> > > > > > > > > > > > > > > > disabled or isn't always available and it
> >> may
> >> > > also
> >> > > > be
> >> > > > > > > > > expensive
> >> > > > > > > > > > > > (also
> >> > > > > > > > > > > > > > > I've
> >> > > > > > > > > > > > > > > > never used it so not sure if it really
> works
> >> > the
> >> > > > way
> >> > > > > I
> >> > > > > > > > > think).
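> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > (For reference, a small self-contained sketch of
> >> > > > > > > > > > > > > > > > reading that per-thread CPU time with just the JDK
> >> > > > > > > > > > > > > > > > API; no broker code, and the helper name is made up:)
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >   import java.lang.management.ManagementFactory
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >   val threadMXBean = ManagementFactory.getThreadMXBean
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >   // CPU nanos consumed by a thread, or 0 when the JVM/platform
> >> > > > > > > > > > > > > > > >   // doesn't support it, it is disabled, or the thread is gone
> >> > > > > > > > > > > > > > > >   // (the JDK call returns -1 in those last two cases).
> >> > > > > > > > > > > > > > > >   def cpuNanos(threadId: Long): Long =
> >> > > > > > > > > > > > > > > >     if (threadMXBean.isThreadCpuTimeSupported &&
> >> > > > > > > > > > > > > > > >         threadMXBean.isThreadCpuTimeEnabled)
> >> > > > > > > > > > > > > > > >       math.max(0L, threadMXBean.getThreadCpuTime(threadId))
> >> > > > > > > > > > > > > > > >     else 0L
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >   // Usage ratio over an interval, against total cores:
> >> > > > > > > > > > > > > > > >   //   sum of per-thread cpuNanos deltas / (intervalNanos * numCores)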
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > -Jay
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 3:17 PM, Becket
> Qin
> >> <
> >> > > > > > > > > > > becket....@gmail.com>
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > If the purpose of the KIP is only to
> >> protect
> >> > > the
> >> > > > > > > cluster
> >> > > > > > > > > from
> >> > > > > > > > > > > > being
> >> > > > > > > > > > > > > > > > > overwhelmed by crazy clients and is not
> >> > > intended
> >> > > > to
> >> > > > > > > > address
> >> > > > > > > > > > > > > resource
> >> > > > > > > > > > > > > > > > > allocation problem among the clients, I
> am
> >> > > > > wondering
> >> > > > > > if
> >> > > > > > > > > using
> >> > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > handling time quota (CPU time quota) is
> a
> >> > > better
> >> > > > > > > option.
> >> > > > > > > > > Here
> >> > > > > > > > > > > are
> >> > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > reasons:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 1. request handling time quota has
> better
> >> > > > > protection.
> >> > > > > > > Say
> >> > > > > > > > > we
> >> > > > > > > > > > > have
> >> > > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > rate quota and set that to some value
> like
> >> > 100
> >> > > > > > > > > requests/sec,
> >> > > > > > > > > > it
> >> > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > possible
> >> > > > > > > > > > > > > > > > > that some of the requests are very
> >> expensive
> >> > > > > actually
> >> > > > > > > > take
> >> > > > > > > > > a
> >> > > > > > > > > > > lot
> >> > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > time
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > handle. In that case a few clients may
> >> still
> >> > > > > occupy a
> >> > > > > > > lot
> >> > > > > > > > > of
> >> > > > > > > > > > > CPU
> >> > > > > > > > > > > > > time
> >> > > > > > > > > > > > > > > > even
> >> > > > > > > > > > > > > > > > > the request rate is low. Arguably we can
> >> > > > carefully
> >> > > > > > set
> >> > > > > > > > > > request
> >> > > > > > > > > > > > rate
> >> > > > > > > > > > > > > > > quota
> >> > > > > > > > > > > > > > > > > for each request and client id
> >> combination,
> >> > but
> >> > > > it
> >> > > > > > > could
> >> > > > > > > > > > still
> >> > > > > > > > > > > be
> >> > > > > > > > > > > > > > > tricky
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > get it right for everyone.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > If we use the request time handling
> >> quota, we
> >> > > can
> >> > > > > > > simply
> >> > > > > > > > > say
> >> > > > > > > > > > no
> >> > > > > > > > > > > > > > clients
> >> > > > > > > > > > > > > > > > can
> >> > > > > > > > > > > > > > > > > take up to more than 30% of the total
> >> request
> >> > > > > > handling
> >> > > > > > > > > > capacity
> >> > > > > > > > > > > > > > > (measured
> >> > > > > > > > > > > > > > > > > by time), regardless of the difference
> >> among
> >> > > > > > different
> >> > > > > > > > > > requests
> >> > > > > > > > > > > > or
> >> > > > > > > > > > > > > > what
> >> > > > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > > the client doing. In this case maybe we
> >> can
> >> > > quota
> >> > > > > all
> >> > > > > > > the
> >> > > > > > > > > > > > requests
> >> > > > > > > > > > > > > if
> >> > > > > > > > > > > > > > > we
> >> > > > > > > > > > > > > > > > > want to.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > 2. The main benefit of using request
> rate
> >> > limit
> >> > > > is
> >> > > > > > that
> >> > > > > > > > it
> >> > > > > > > > > > > seems
> >> > > > > > > > > > > > > more
> >> > > > > > > > > > > > > > > > > intuitive. It is true that it is
> probably
> >> > > easier
> >> > > > to
> >> > > > > > > > explain
> >> > > > > > > > > > to
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > > user
> >> > > > > > > > > > > > > > > > > what that means. However, in
> practice
> >> it
> >> > > > looks
> >> > > > > > the
> >> > > > > > > > > impact
> >> > > > > > > > > > > of
> >> > > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > rate quota is not more quantifiable than
> >> the
> >> > > > > request
> >> > > > > > > > > handling
> >> > > > > > > > > > > > time
> >> > > > > > > > > > > > > > > quota.
> >> > > > > > > > > > > > > > > > > Unlike the byte rate quota, it is still
> >> > > difficult
> >> > > > > to
> >> > > > > > > > give a
> >> > > > > > > > > > > > number
> >> > > > > > > > > > > > > > > about
> >> > > > > > > > > > > > > > > > > the impact on throughput or latency when a
> >> > request
> >> > > > rate
> >> > > > > > > quota
> >> > > > > > > > > is
> >> > > > > > > > > > > hit.
> >> > > > > > > > > > > > > So
> >> > > > > > > > > > > > > > it
> >> > > > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > > not better than the request handling
> time
> >> > > quota.
> >> > > > In
> >> > > > > > > fact
> >> > > > > > > > I
> >> > > > > > > > > > feel
> >> > > > > > > > > > > > it
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > > clearer to tell user that "you are
> limited
> >> > > > because
> >> > > > > > you
> >> > > > > > > > have
> >> > > > > > > > > > > taken
> >> > > > > > > > > > > > > 30%
> >> > > > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > > > the CPU time on the broker" than
> otherwise
> >> > > > > something
> >> > > > > > > like
> >> > > > > > > > > > "your
> >> > > > > > > > > > > > > > request
> >> > > > > > > > > > > > > > > > > rate quota on metadata request has
> >> been reached".
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay
> >> Kreps <
> >> > > > > > > > > j...@confluent.io
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I think this proposal makes a lot of
> >> sense
> >> > > > > > > (especially
> >> > > > > > > > > now
> >> > > > > > > > > > > that
> >> > > > > > > > > > > > > it
> >> > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > > > > oriented around request rate) and
> fills
> >> the
> >> > > > > biggest
> >> > > > > > > > > > remaining
> >> > > > > > > > > > > > gap
> >> > > > > > > > > > > > > > in
> >> > > > > > > > > > > > > > > > the
> >> > > > > > > > > > > > > > > > > > multi-tenancy story.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I think for intra-cluster
> communication
> >> > > > > > (StopReplica,
> >> > > > > > > > > etc)
> >> > > > > > > > > > we
> >> > > > > > > > > > > > > could
> >> > > > > > > > > > > > > > > > avoid
> >> > > > > > > > > > > > > > > > > > throttling entirely. You can secure or
> >> > > > otherwise
> >> > > > > > > > > lock-down
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > > > > cluster
> >> > > > > > > > > > > > > > > > > > communication to avoid any
> unauthorized
> >> > > > external
> >> > > > > > > party
> >> > > > > > > > > from
> >> > > > > > > > > > > > > trying
> >> > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > > initiate these requests. As a result
> we
> >> are
> >> > > as
> >> > > > > > likely
> >> > > > > > > > to
> >> > > > > > > > > > > cause
> >> > > > > > > > > > > > > > > problems
> >> > > > > > > > > > > > > > > > > as
> >> > > > > > > > > > > > > > > > > > solve them by throttling these, right?
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > I'm not so sure that we should exempt
> >> the
> >> > > > > consumer
> >> > > > > > > > > requests
> >> > > > > > > > > > > > such
> >> > > > > > > > > > > > > as
> >> > > > > > > > > > > > > > > > > > heartbeat. It's true that if we
> >> throttle an
> >> > > > app's
> >> > > > > > > > > heartbeat
> >> > > > > > > > > > > > > > requests
> >> > > > > > > > > > > > > > > it
> >> > > > > > > > > > > > > > > > > may
> >> > > > > > > > > > > > > > > > > > cause it to fall out of its consumer
> >> group.
> >> > > > > However
> >> > > > > > > if
> >> > > > > > > > we
> >> > > > > > > > > > > don't
> >> > > > > > > > > > > > > > > > throttle
> >> > > > > > > > > > > > > > > > > it
> >> > > > > > > > > > > > > > > > > > it may DDOS the cluster if the
> heartbeat
> >> > > > interval
> >> > > > > > is
> >> > > > > > > > set
> >> > > > > > > > > > > > > > incorrectly
> >> > > > > > > > > > > > > > > or
> >> > > > > > > > > > > > > > > > > if
> >> > > > > > > > > > > > > > > > > > some client in some language has a
> bug.
> >> I
> >> > > think
> >> > > > > the
> >> > > > > > > > > policy
> >> > > > > > > > > > > with
> >> > > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > > kind
> >> > > > > > > > > > > > > > > > > > of throttling is to protect the
> cluster
> >> > above
> >> > > > any
> >> > > > > > > > > > individual
> >> > > > > > > > > > > > app,
> >> > > > > > > > > > > > > > > > right?
> >> > > > > > > > > > > > > > > > > I
> >> > > > > > > > > > > > > > > > > > think in general this should be okay
> >> since
> >> > > for
> >> > > > > most
> >> > > > > > > > > > > deployments
> >> > > > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > > > > setting is meant as more of a safety
> >> > > > valve---that
> >> > > > > > is
> >> > > > > > > > > rather
> >> > > > > > > > > > > > than
> >> > > > > > > > > > > > > > set
> >> > > > > > > > > > > > > > > > > > something very close to what you
> expect
> >> to
> >> > > need
> >> > > > > > (say
> >> > > > > > > 2
> >> > > > > > > > > > > req/sec
> >> > > > > > > > > > > > or
> >> > > > > > > > > > > > > > > > > whatever)
> >> > > > > > > > > > > > > > > > > > you would have something quite high
> >> (like
> >> > 100
> >> > > > > > > req/sec)
> >> > > > > > > > > with
> >> > > > > > > > > > > > this
> >> > > > > > > > > > > > > > > meant
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > > prevent a client gone crazy. I think
> >> when
> >> > > used
> >> > > > > this
> >> > > > > > > way
> >> > > > > > > > > > > > allowing
> >> > > > > > > > > > > > > > > those
> >> > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > > > be throttled would actually provide
> >> > > meaningful
> >> > > > > > > > > protection.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > -Jay
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > On Fri, Feb 17, 2017 at 9:05 AM,
> Rajini
> >> > > > Sivaram <
> >> > > > > > > > > > > > > > > > rajinisiva...@gmail.com
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Hi all,
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > I have just created KIP-124 to
> >> introduce
> >> > > > > request
> >> > > > > > > rate
> >> > > > > > > > > > > quotas
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > > Kafka:
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/
> >> > > > > > > > confluence/display/KAFKA/KIP-
> >> > > > > > > > > > > > > > > > > > > 124+-+Request+rate+quotas
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > The proposal is for a simple
> >> percentage
> >> > > > request
> >> > > > > > > > > handling
> >> > > > > > > > > > > time
> >> > > > > > > > > > > > > > quota
> >> > > > > > > > > > > > > > > > > that
> >> > > > > > > > > > > > > > > > > > > can be allocated to *<client-id>*,
> >> > *<user>*
> >> > > > or
> >> > > > > > > > *<user,
> >> > > > > > > > > > > > > > client-id>*.
> >> > > > > > > > > > > > > > > > > There
> >> > > > > > > > > > > > > > > > > > > are a few other suggestions also
> under
> >> > > > > "Rejected
> >> > > > > > > > > > > > alternatives".
> >> > > > > > > > > > > > > > > > > Feedback
> >> > > > > > > > > > > > > > > > > > > and suggestions are welcome.
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Thank you...
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Regards,
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > > Rajini
> >> > > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > -- Guozhang
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > -- Guozhang
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
