Thank you all for the feedback.

Jay: I have removed the exemption for consumer heartbeats etc. I agree that
protecting the cluster is more important than protecting individual apps. I
have retained the exemption for StopReplica/LeaderAndIsr etc.; these are
throttled only if authorization fails (so they can't be used for DoS attacks
in a secure cluster, but inter-broker requests can complete without delays).
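
As a rough sketch of the exemption check I have in mind (the method name is
illustrative, not actual broker code; ApiKeys is the existing request type
enum):

    // Throttle unless this is an authorized inter-broker request.
    // Unauthorized requests stay throttled, so the exemption can't be
    // exploited for DoS in a secure cluster.
    boolean shouldThrottle(ApiKeys apiKey, boolean authorized) {
        boolean interBroker = apiKey == ApiKeys.STOP_REPLICA
                           || apiKey == ApiKeys.LEADER_AND_ISR;
        return !(interBroker && authorized);
    }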

I will wait another day to see if there is any objection to quotas based on
request processing time (as opposed to request rate), and if there are no
objections, I will revert to the original proposal with some changes.

The original proposal included only the time used by the request handler
threads (which made the calculation easy). I think the suggestion is to
include the time spent in the network threads as well, since that may be
significant. As Jay pointed out, it is more complicated to calculate the
total available CPU time and convert it to a ratio when there are *m* I/O
threads and *n* network threads. ThreadMXBean#getThreadCpuTime() may give us
what we want, but it can be very expensive on some platforms. As Becket and
Guozhang have pointed out, we already have several time measurements for
generating metrics that we could use, though we might want to switch to
nanoTime() instead of currentTimeMillis() since some of the values for small
requests may be < 1ms. But rather than adding up the time spent in the I/O
thread and the network thread, wouldn't it be better to convert the time
spent on each thread into a separate ratio? Suppose UserA has a request
quota of 5%. Can we take that to mean that UserA can use 5% of the time on
network threads and 5% of the time on I/O threads? If either is exceeded,
the response is throttled. This would mean maintaining two sets of metrics
for the two durations, but it would result in more meaningful ratios. We
could define two separate quota limits (UserA gets 5% of request threads and
10% of network threads), but that seems unnecessary and harder to explain
to users.
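
To illustrate the two-ratio idea, a minimal sketch (class and field names
are hypothetical, and the sliding-window bookkeeping is simplified):

    // Track the time a user consumes on each pool separately, and compare
    // each against the same percentage quota over a time window.
    class UserQuotaSample {
        final double maxRatio;        // e.g. 0.05 for a 5% quota
        long ioThreadNanos;           // handler-thread time in this window
        long networkThreadNanos;      // network-thread time in this window
        UserQuotaSample(double maxRatio) { this.maxRatio = maxRatio; }

        // Capacity of a pool over the window is windowNanos * numThreads.
        boolean violated(long windowNanos, int numIoThreads, int numNetThreads) {
            double ioRatio = (double) ioThreadNanos / (windowNanos * numIoThreads);
            double netRatio = (double) networkThreadNanos / (windowNanos * numNetThreads);
            // Throttle if EITHER pool's ratio exceeds the single quota.
            return ioRatio > maxRatio || netRatio > maxRatio;
        }
    }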

Back to why and how quotas are applied to network thread utilization:
a) In the case of fetch, the time spent in the network thread may be
significant and I can see the need to include it. Are there other requests
where network thread utilization is significant? For fetch, request handler
thread utilization would throttle clients with a high request rate and low
data volume, while the fetch byte rate quota would throttle clients with
high data volume. Network thread utilization is perhaps proportional to the
data volume, so I am wondering whether we even need to throttle based on
network thread utilization or whether the data volume quota already covers
this case.

b) At the moment, we record and check for quota violation at the same time;
if a quota is violated, the response is delayed. Using Jay's example of
disk reads for fetches happening in the network thread, we can't record and
delay a response after the disk reads. We could instead record the time
spent on the network thread once the response is complete and introduce a
delay when handling a subsequent request (i.e. separate the recording from
the quota violation handling in the case of network thread overload). Does
that make sense?
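
Roughly, as a sketch (recordAndGetDelay() is a hypothetical stand-in for
the quota bookkeeping):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Record network-thread time when the response completes; if the quota
    // is violated, delay the user's *next* request rather than the response
    // that has already been sent.
    final Map<String, Long> throttleUntilNanos = new ConcurrentHashMap<>();

    void onResponseComplete(String user, long networkThreadNanos) {
        long delayNanos = recordAndGetDelay(user, networkThreadNanos); // 0 if within quota
        if (delayNanos > 0)
            throttleUntilNanos.merge(user, System.nanoTime() + delayNanos, Math::max);
    }

    void onRequestReceived(String user, Runnable process) {
        Long until = throttleUntilNanos.get(user);
        if (until != null && System.nanoTime() < until) {
            // schedule `process` to run after the remaining delay instead
            // of running it immediately
        }
    }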


Regards,

Rajini


On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <becket....@gmail.com> wrote:

> Hey Jay,
>
> Yeah, I agree that enforcing the CPU time is a little tricky. I am
> thinking that maybe we can use the existing request statistics. They are
> already very detailed, so we can probably see the approximate CPU time from
> them, e.g. something like (total_time - request/response_queue_time -
> remote_time).
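>
> (Roughly, in terms of the existing per-request metrics; the variable names
> here are illustrative and simplified:)
>
>     long approxCpuTimeMs = totalTimeMs
>         - requestQueueTimeMs    // waiting for an I/O thread
>         - responseQueueTimeMs   // waiting to be sent by a network thread
>         - remoteTimeMs;         // e.g. time spent waiting in purgatory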
>
> I agree with Guozhang that when a user is throttled it is likely that we
> need to see if anything has gone wrong first, and if the users are well
> behaved and just need more resources, we will have to bump up the quota
> for them. It is true that pre-allocating CPU time quota precisely for the
> users is difficult. So in practice it would probably be more like first
> setting a relatively high protective CPU time quota for everyone and then
> increasing it for individual clients on demand.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > This is a great proposal, glad to see it happening.
> >
> > I am inclined toward CPU throttling, or more specifically a processing
> > time ratio, rather than request rate throttling as well. Becket has
> > summed up my rationales very well above, and one thing to add here is
> > that the former supports both "protecting against rogue clients" and
> > "utilizing a cluster for multi-tenancy usage": when thinking about how
> > to explain this to end users, I find it actually more natural than the
> > request rate since, as mentioned above, different requests have quite
> > different "costs", and Kafka today already has various request types
> > (produce, fetch, admin, metadata, etc.); because of that, request rate
> > throttling may not be as effective unless it is set very conservatively.
> >
> > Regarding user reactions when they are throttled, I think it may differ
> > case by case, and needs to be discovered / guided by looking at the
> > relevant metrics. In other words, users should not expect to get
> > additional information by simply being told "hey, you are throttled",
> > which is all that throttling does; they need to take a follow-up step
> > and see "hmm, I'm throttled probably because of ..." by looking at other
> > metric values: e.g. whether I'm bombarding the brokers with metadata
> > requests, which are usually cheap to handle but I'm sending thousands
> > per second; or whether it is because I'm catching up and hence sending
> > very heavy fetch requests with a large min.bytes, etc.
> >
> > Regarding the implementation, as once discussed with Jun, this seems
> > not very difficult since today we are already collecting the "thread
> > pool utilization" metrics, which is a single percentage
> > "aggregateIdleMeter" value; we are already effectively aggregating it
> > for each request in KafkaRequestHandler, and we can just extend it by
> > recording the source client id when handling requests and aggregating
> > by clientId as well as in the total aggregate.
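> >
> > Something along these lines (a sketch, not the actual code; the
> > per-client sensor lookup and handle() are hypothetical):
> >
> >     long startNs = System.nanoTime();
> >     handle(request);
> >     long durationNs = System.nanoTime() - startNs;
> >     totalHandlerTimeSensor.record(durationNs);                    // aggregate
> >     handlerTimeSensorFor(request.clientId()).record(durationNs);  // per client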
> >
> >
> > Guozhang
> >
> >
> >
> >
> > On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Hey Becket/Rajini,
> > >
> > > When I thought about it more deeply I came around to the "percent of
> > > processing time" metric too. It seems a lot closer to the thing we
> > > actually care about and need to protect. I also think this would be a
> > > very useful metric even in the absence of throttling, just to debug
> > > who's using capacity.
> > >
> > > Two problems to consider:
> > >
> > >    1. I agree that for the user it is understandable what led to
> > >    their being throttled, but it is a bit hard to figure out the safe
> > >    range for them. I.e. if I have a new app that will send 200
> > >    messages/sec I can probably reason that I'll be under a throttling
> > >    limit of 300 req/sec. However, if I need to be under a 10% CPU
> > >    resources limit it may be a bit harder for me to know a priori
> > >    whether I will or won't be.
> > >    2. Calculating the available CPU time is a bit difficult since
> > >    there are actually two thread pools--the I/O threads and the
> > >    network threads. I think it might be workable to count just the I/O
> > >    thread time as in the proposal, but the network thread work is
> > >    actually non-trivial (e.g. all the disk reads for fetches happen in
> > >    that thread). If you count both the network and I/O threads it can
> > >    skew things a bit. E.g. say you have 50 network threads, 10 I/O
> > >    threads, and 8 cores: what is the available CPU time in a second? I
> > >    suppose this is a problem whenever you have a bottleneck between
> > >    I/O and network threads or if you end up significantly
> > >    over-provisioning one pool (both of which are hard to avoid).
> > >
> > > An alternative for CPU throttling would be to use this API:
> > > http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)
> > >
> > > That would let you track actual CPU usage across the network, I/O,
> > > and purgatory threads and look at it as a percentage of total cores. I
> > > think this fixes many problems in the reliability of the metric. Its
> > > meaning is slightly different as it is just CPU (you don't get charged
> > > for time blocking on I/O), but that may be okay because we already
> > > have a throttle on I/O. The downside is I think it is possible this
> > > API can be disabled or isn't always available, and it may also be
> > > expensive (also I've never used it so I'm not sure it really works the
> > > way I think).
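> > >
> > > (Something like this, where the API is supported; an untested sketch,
> > > with the pool thread ids, interval, and core count as free variables:)
> > >
> > >     import java.lang.management.ManagementFactory;
> > >     import java.lang.management.ThreadMXBean;
> > >
> > >     ThreadMXBean bean = ManagementFactory.getThreadMXBean();
> > >     if (bean.isThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
> > >         long totalCpuNanos = 0;
> > >         for (long threadId : networkAndIoThreadIds)  // ids of the pool threads
> > >             totalCpuNanos += Math.max(0, bean.getThreadCpuTime(threadId)); // -1 if dead
> > >         double shareOfCores = totalCpuNanos / (double) (intervalNanos * numCores);
> > >     }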
> > >
> > > -Jay
> > >
> > > On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com>
> > wrote:
> > >
> > > > If the purpose of the KIP is only to protect the cluster from
> > > > being overwhelmed by crazy clients and is not intended to address
> > > > the resource allocation problem among clients, I am wondering if
> > > > using a request handling time quota (CPU time quota) is a better
> > > > option. Here are the reasons:
> > > >
> > > > 1. A request handling time quota offers better protection. Say we
> > > > have a request rate quota set to some value like 100 requests/sec;
> > > > it is possible that some of the requests are very expensive and
> > > > actually take a lot of time to handle. In that case a few clients
> > > > may still occupy a lot of CPU time even though their request rate is
> > > > low. Arguably we can carefully set a request rate quota for each
> > > > request type and client id combination, but it could still be tricky
> > > > to get it right for everyone.
> > > >
> > > > If we use the request handling time quota, we can simply say that no
> > > > client can take up more than 30% of the total request handling
> > > > capacity (measured by time), regardless of the differences among
> > > > requests or what the client is doing. In this case maybe we can
> > > > quota all the requests if we want to.
> > > >
> > > > 2. The main benefit of a request rate limit is that it seems more
> > > > intuitive. It is true that it is probably easier to explain to the
> > > > user what it means. However, in practice the impact of a request
> > > > rate quota is no more quantifiable than that of a request handling
> > > > time quota. Unlike the byte rate quota, it is still difficult to
> > > > give a number for the impact on throughput or latency when a request
> > > > rate quota is hit, so it is not better than the request handling
> > > > time quota. In fact I feel it is clearer to tell a user "you are
> > > > limited because you have taken 30% of the CPU time on the broker"
> > > > than something like "your request rate quota on metadata requests
> > > > has been reached".
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > > On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > >
> > > > > I think this proposal makes a lot of sense (especially now that it
> is
> > > > > oriented around request rate) and fills the biggest remaining gap
> in
> > > the
> > > > > multi-tenancy story.
> > > > >
> > > > > I think for intra-cluster communication (StopReplica, etc.) we
> > > > > could avoid throttling entirely. You can secure or otherwise lock
> > > > > down the cluster communication to prevent any unauthorized
> > > > > external party from initiating these requests. As a result we are
> > > > > as likely to cause problems as to solve them by throttling these,
> > > > > right?
> > > > >
> > > > > I'm not so sure that we should exempt consumer requests such as
> > > > > heartbeat. It's true that if we throttle an app's heartbeat
> > > > > requests it may cause it to fall out of its consumer group.
> > > > > However, if we don't throttle it, it may DDoS the cluster if the
> > > > > heartbeat interval is set incorrectly or if some client in some
> > > > > language has a bug. I think the policy with this kind of
> > > > > throttling is to protect the cluster above any individual app,
> > > > > right? I think in general this should be okay since for most
> > > > > deployments this setting is meant as more of a safety valve:
> > > > > rather than setting something very close to what you expect to
> > > > > need (say 2 req/sec or whatever) you would have something quite
> > > > > high (like 100 req/sec) meant to catch a client gone crazy. I
> > > > > think when used this way, allowing those to be throttled would
> > > > > actually provide meaningful protection.
> > > > >
> > > > > -Jay
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have just created KIP-124 to introduce request rate quotas to
> > > Kafka:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
> > > > > >
> > > > > > The proposal is for a simple percentage request handling time
> quota
> > > > that
> > > > > > can be allocated to *<client-id>*, *<user>* or *<user,
> client-id>*.
> > > > There
> > > > > > are a few other suggestions also under "Rejected alternatives".
> > > > Feedback
> > > > > > and suggestions are welcome.
> > > > > >
> > > > > > Thank you...
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
