[
https://issues.apache.org/jira/browse/KAFKA-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586842#comment-13586842
]
Jonathan Creasy commented on KAFKA-656:
---------------------------------------
Kafka is using the Yammer/Coda Hale Metrics library now, right?
Would it be sufficient to track the three quantities by topic and client ID and
take action if the 1/5/15-minute rate for that metric exceeded the defined
thresholds? That rate is an EWMA, so it would rise and taper off over time.
Perhaps we could use an exponential back-off, so that a client that exceeded the
limit once would recover quickly, but after repeated violations would take
longer to cool off before being allowed again.
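
To make the idea above concrete, here is a rough sketch in Python rather than
Kafka's own Scala, so treat it as pseudocode that happens to run. It keeps one
EWMA rate per (topic, client ID) key, charges every attempt against it, and
doubles the cool-off each time a key is still over the limit when its previous
cool-off ends. The names Ewma and QuotaTracker, the 5-second tick, and the
cool-off values are all assumptions for illustration. None of this is the
Metrics library's API or an existing Kafka interface.

```python
import math


class Ewma:
    """Exponentially weighted moving average of an event rate (events/sec)."""

    def __init__(self, window_seconds, tick_seconds=5.0):
        # Smoothing factor for a rate sampled once every tick_seconds.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.tick_seconds = tick_seconds
        self.uncounted = 0.0
        self.rate = None  # unset until the first tick

    def mark(self, amount=1.0):
        self.uncounted += amount

    def tick(self):
        instant = self.uncounted / self.tick_seconds
        self.uncounted = 0.0
        if self.rate is None:
            self.rate = instant
        else:
            self.rate += self.alpha * (instant - self.rate)


class QuotaTracker:
    """Hypothetical per-key quota check with exponential cool-off."""

    def __init__(self, hard_limit, window_seconds=60.0,
                 base_cooloff=10.0, max_cooloff=600.0):
        self.hard_limit = hard_limit
        self.window_seconds = window_seconds
        self.base_cooloff = base_cooloff
        self.max_cooloff = max_cooloff
        self.rates = {}          # key -> Ewma
        self.blocked_until = {}  # key -> time at which the key is allowed again
        self.strikes = {}        # key -> number of violations so far

    def record(self, key, amount=1.0):
        # Charge every attempt, including ones that end up rejected.
        if key not in self.rates:
            self.rates[key] = Ewma(self.window_seconds)
        self.rates[key].mark(amount)

    def tick(self, now):
        # Call once per tick interval with the current time in seconds.
        for key, ewma in self.rates.items():
            ewma.tick()
            over = ewma.rate > self.hard_limit
            if over and now >= self.blocked_until.get(key, 0.0):
                strikes = self.strikes.get(key, 0)
                cooloff = min(self.base_cooloff * 2 ** strikes,
                              self.max_cooloff)
                self.blocked_until[key] = now + cooloff
                self.strikes[key] = strikes + 1

    def allowed(self, key, now):
        return now >= self.blocked_until.get(key, 0.0)
```

A broker-side caller would call record() for each request, tick() from a timer,
and allowed() before serving a request. The sketch never resets the strike
count. A real design would need to decide when a well-behaved client earns
that back.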
> Add Quotas to Kafka
> -------------------
>
> Key: KAFKA-656
> URL: https://issues.apache.org/jira/browse/KAFKA-656
> Project: Kafka
> Issue Type: New Feature
> Components: core
> Affects Versions: 0.8.1
> Reporter: Jay Kreps
> Labels: project
>
> It would be nice to implement a quota system in Kafka to improve our support
> for highly multi-tenant usage. The goal of this system would be to prevent
> one naughty user from accidentally overloading the whole cluster.
> There are several quantities we would want to track:
> 1. Requests per second
> 2. Bytes written per second
> 3. Bytes read per second
> There are two reasonable groupings we would want to aggregate and enforce
> these thresholds at:
> 1. Topic level
> 2. Client level (e.g. by client id from the request)
> When a request hits one of these limits we will simply reject it with a
> QUOTA_EXCEEDED exception.
> To avoid suddenly breaking things without warning, we should ideally support
> two thresholds: a soft threshold at which we produce some kind of warning and
> a hard threshold at which we give the error. The soft threshold could just be
> defined as 80% (or whatever) of the hard threshold.
> There are nuances to getting this right. If you measure second-by-second a
> single burst may exceed the threshold, so we need a sustained measurement
> over a period of time.
> Likewise when do we stop giving this error? To make this work right we likely
> need to charge against the quota for request *attempts* not just successful
> requests. Otherwise a client that is overloading the server will just flap on
> and off--i.e. we would disable them for a period of time but when we
> re-enabled them they would likely still be abusing us.
> It would be good to have a wiki design on how this would all work as a
> starting point for discussion.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira