I noticed that the throttle_time_ms added to all the message responses is in milliseconds. Does it make sense to express this in microseconds in case we start doing more fine-grained CPU throttling later on? An int32 should still be more than enough if using microseconds.
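For scale, a quick sketch (my own arithmetic, not from the thread) of how much throttle time a signed int32 field could still express if it were switched from milliseconds to microseconds:

```python
# Back-of-the-envelope check: headroom of an int32 throttle field in microseconds.
INT32_MAX = 2**31 - 1  # 2,147,483,647

max_throttle_us = INT32_MAX               # largest value the field can carry, in µs
max_throttle_s = max_throttle_us / 1_000_000
max_throttle_min = max_throttle_s / 60

print(f"max representable throttle: {max_throttle_min:.1f} minutes")  # ~35.8 minutes
```

Roughly 35 minutes of throttle delay fits in an int32 of microseconds, far beyond any delay a broker would plausibly impose.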
best,
Colin

On Fri, Feb 24, 2017, at 10:31, Jun Rao wrote:

Hi, Jay,

2. Regarding request.unit vs request.percentage. I started with request.percentage too. The reasoning for request.unit is the following. Suppose that the capacity has been reached on a broker and the admin needs to add a new user. A simple way to increase the capacity is to increase the number of io threads, assuming there are still enough cores. If the limit is based on percentage, the additional capacity automatically gets distributed to existing users and we haven't really carved out any additional resource for the new user. Now, is it easy for a user to reason about 0.1 unit vs 10%? My feeling is that both are hard and have to be configured empirically. Not sure if percentage is obviously easier to reason about.

Thanks,

Jun

On Fri, Feb 24, 2017 at 8:10 AM, Jay Kreps <j...@confluent.io> wrote:

A couple of quick points:

1. Even though the implementation of this quota is only using io thread time, I think we should call it something like "request-time". This will give us flexibility to improve the implementation to cover network threads in the future and will avoid exposing internal details like our thread pools on the server.

2. Jun/Roger, I get what you are trying to fix, but the idea of thread units is super unintuitive as a user-facing knob. I had to read the KIP like eight times to understand this. I'm not sure about your point that increasing the number of threads is a problem with a percentage-based value; it really depends on whether the user thinks about the "percentage of request processing time" or "thread units". If they think "I have allocated 10% of my request processing time to user x" then it is a bug that increasing the thread count decreases that percent, as it does in the current proposal.
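Jay's objection can be put in numbers; the sketch below (my own illustration, with made-up quota and thread-count values) shows how the effective share behind a fixed thread-unit quota silently shifts when the admin resizes the pool:

```python
# A "thread unit" quota only maps to a stable share of request-handling time
# if you also know num.io.threads; resizing the pool changes the share.
def units_to_percent(thread_units: float, num_io_threads: int) -> float:
    """Percent of total request-handling time that a thread-unit quota buys."""
    return 100.0 * thread_units / num_io_threads

quota_units = 0.8
print(units_to_percent(quota_units, 8))   # 10.0 -> 10% of request time with 8 threads
print(units_to_percent(quota_units, 16))  # 5.0  -> 5% after the admin doubles the pool
```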
As a practical matter I think the only way to actually reason about this is as a percent---I just don't believe people are going to think, "ah, 4.3 thread units, that is the right amount!". Instead I think they have to understand this thread unit concept, figure out what they have set in number of threads, compute a percent, and then come up with the number of thread units, and these will all be wrong if that thread count changes. I also think this ties us to throttling the I/O thread pool, which may not be where we want to end up.

3. For what it's worth I do think having a single throttle_ms field in all the responses that combines all throttling from all quotas is probably the simplest. There could be a use case for having separate fields for each, but I think that is actually harder to use/monitor in the common case, so unless someone has a use case I think just one should be fine.

-Jay

On Fri, Feb 24, 2017 at 4:21 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

I have updated the KIP based on the discussions so far.

Regards,

Rajini

On Thu, Feb 23, 2017 at 11:29 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Thank you all for the feedback.

Ismael #1. It makes sense not to throttle inter-broker requests like LeaderAndIsr etc. The simplest way to ensure that clients cannot use these requests to bypass quotas for DoS attacks is to ensure that ACLs prevent clients from using these requests and unauthorized requests are included towards quotas.

Ismael #2, Jay #1: I was thinking that these quotas can return a separate throttle time, and all utilization based quotas could use the same field (we won't add another one for network thread utilization, for instance).
But perhaps it makes sense to keep byte rate quotas separate in produce/fetch responses to provide separate metrics? Agree with Ismael that the name of the existing field should be changed if we have two. Happy to switch to a single combined throttle time if that is sufficient.

Ismael #4, #5, #6: Will update KIP. Will use dot separated name for new property. Replication quotas use dot separated, so it will be consistent with all properties except byte rate quotas.

Radai: #1 Request processing time rather than request rate was chosen because the time per request can vary significantly between requests, as mentioned in the discussion and KIP.
#2 Two separate quotas for heartbeats/regular requests feel like more configuration and more metrics. Since most users would set quotas higher than the expected usage and quotas are more of a safety net, a single quota should work in most cases.
#3 The number of requests in purgatory is limited by the number of active connections, since only one request per connection will be throttled at a time.
#4 As with byte rate quotas, to use the full allocated quotas, clients/users would need to use partitions that are distributed across the cluster. The alternative of using cluster-wide quotas instead of per-broker quotas would be far too complex to implement.

Dong: We currently have two ClientQuotaManagers for quota types Fetch and Produce. A new one will be added for IOThread, which manages quotas for I/O thread utilization. This will not update the Fetch or Produce queue-size, but will have a separate metric for the queue-size.
I wasn't planning to add any additional metrics apart from the equivalent ones for existing quotas as part of this KIP. Ratio of byte-rate to I/O thread utilization could be slightly misleading since it depends on the sequence of requests. But we can look into more metrics after the KIP is implemented if required.

I think we need to limit the maximum delay since all requests are throttled. If a client has a quota of 0.001 units and a single request used 50ms, we don't want to delay all requests from the client by 50 seconds, throwing the client out of all its consumer groups. The issue is only if a user is allocated a quota that is insufficient to process one large request. The expectation is that the units allocated per user will be much higher than the time taken to process one request and the limit should seldom be applied. Agree this needs proper documentation.

Regards,

Rajini

On Thu, Feb 23, 2017 at 8:04 PM, radai <radai.rosenbl...@gmail.com> wrote:

@jun: I wasn't concerned about tying up a request processing thread, but IIUC the code does still read the entire request out, which might add up to a non-negligible amount of memory.

On Thu, Feb 23, 2017 at 11:55 AM, Dong Lin <lindon...@gmail.com> wrote:

Hey Rajini,

The current KIP says that the maximum delay will be reduced to the window size if it is larger than the window size. I have a concern with this:

1) This essentially means that the user is allowed to exceed their quota over a long period of time. Can you provide an upper bound on this deviation?
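Rajini's 50-second figure works out as follows; a small sketch (my own restatement, with a hypothetical window size standing in for the KIP's cap) of the uncapped delay versus the capped one:

```python
# With a quota of 0.001 units, a request that consumed 50ms of I/O thread time
# would, uncapped, throttle the client for 50ms / 0.001 = 50,000 ms = 50 s.
# The KIP instead caps the delay at the quota window size (value here is made up).
quota_units = 0.001
request_time_ms = 50.0
window_ms = 10_000.0  # hypothetical quota window used as the cap

uncapped_delay_ms = request_time_ms / quota_units
capped_delay_ms = min(uncapped_delay_ms, window_ms)
print(uncapped_delay_ms, capped_delay_ms)  # 50000.0 10000.0
```

The cap is what Dong's question targets: the difference between the uncapped and capped delay is quota the client effectively gets for free.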
2) What is the motivation for capping the maximum delay by the window size? I am wondering if there is a better alternative to address the problem.

3) It means that the existing metric-related config will have a more direct impact on the mechanism of this io-thread-unit-based quota. This may be an important change depending on the answer to 1) above. We probably need to document this more explicitly.

Dong

On Thu, Feb 23, 2017 at 10:56 AM, Dong Lin <lindon...@gmail.com> wrote:

Hey Jun,

Yeah you are right. I thought it wasn't because at LinkedIn it would be too much pressure on inGraph to expose those per-clientId metrics, so we ended up printing them periodically to the local log. Never mind if it is not a general problem.

Hey Rajini,

- I agree with Jay that we probably don't want to add a new field for every quota to ProduceResponse or FetchResponse. Is there any use-case for having separate throttle-time fields for byte-rate-quota and io-thread-unit-quota? You probably need to document this as an interface change if you plan to add a new field in any request.

- I don't think IOThread belongs to quotaType. The existing quota types (i.e. Produce/Fetch/LeaderReplication/FollowerReplication) identify the type of request that is throttled, not the quota mechanism that is applied.
- If a request is throttled due to this io-thread-unit-based quota, is the existing queue-size metric in ClientQuotaManager incremented?

- In the interest of providing a guideline for admins to decide the io-thread-unit-based quota, and for users to understand its impact on their traffic, would it be useful to have a metric that shows the overall byte-rate per io-thread-unit? Can we also show this as a per-clientId metric?

Thanks,
Dong

On Thu, Feb 23, 2017 at 9:25 AM, Jun Rao <j...@confluent.io> wrote:

Hi, Ismael,

For #3, typically, an admin won't configure more io threads than CPU cores, but it's possible for an admin to start with fewer io threads than cores and grow that later on.

Hi, Dong,

I think the throttleTime sensor on the broker tells the admin whether a user/clientId is throttled or not.

Hi, Radai,

The reasoning for delaying the throttled requests on the broker instead of returning an error immediately is that the latter has no way to prevent the client from retrying immediately, which will make things worse. The delaying logic is based off a delay queue. A separate expiration thread just waits on the next-to-be-expired request. So, it doesn't tie up a request handler thread.
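The delay-queue mechanism Jun describes can be sketched roughly as follows (my own simplified model, not the broker code): throttled responses are parked in an ordered queue, and a single expiration thread sleeps until the next entry is due, so no request handler thread is blocked.

```python
import heapq
import itertools
import threading
import time

class ThrottledResponseQueue:
    """Holds throttled responses until their delay expires (simplified sketch)."""

    def __init__(self):
        self._heap = []                # entries: (expiry, seq, send_response)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._cv = threading.Condition()
        threading.Thread(target=self._expire_loop, daemon=True).start()

    def throttle(self, delay_s, send_response):
        """Schedule send_response to run after delay_s seconds."""
        with self._cv:
            heapq.heappush(
                self._heap,
                (time.monotonic() + delay_s, next(self._seq), send_response))
            self._cv.notify()  # wake the expirer; the head entry may have changed

    def _expire_loop(self):
        # One thread waits only on the next-to-expire entry.
        while True:
            with self._cv:
                while not self._heap:
                    self._cv.wait()
                expiry, _, send = self._heap[0]
                timeout = expiry - time.monotonic()
                if timeout > 0:
                    self._cv.wait(timeout=timeout)
                    continue  # re-check: an earlier-expiring entry may have arrived
                heapq.heappop(self._heap)
            send()  # deliver the delayed response outside the lock
```

On the JVM the same shape falls out of `java.util.concurrent.DelayQueue`; the point either way is that delaying N responses costs one waiting thread, not N.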
Thanks,

Jun

On Thu, Feb 23, 2017 at 9:07 AM, Ismael Juma <ism...@juma.me.uk> wrote:

Hi Jay,

Regarding 1, I definitely like the simplicity of keeping a single throttle time field in the response. The downside is that the client metrics will be more coarse grained.

Regarding 3, we have `leader.imbalance.per.broker.percentage` and `log.cleaner.min.cleanable.ratio`.

Ismael

On Thu, Feb 23, 2017 at 4:43 PM, Jay Kreps <j...@confluent.io> wrote:

A few minor comments:

1. Isn't it the case that the throttling time response field should have the total time your request was throttled, irrespective of the quotas that caused it? Limiting it to the byte rate quota doesn't make sense, but I also don't think we want to end up adding new fields in the response for every single thing we quota, right?

2. I don't think we should make this quota specifically about io threads. Once we introduce these quotas people set them and expect them to be enforced (and if they aren't it may cause an outage). As a result they are a bit more sensitive than normal configs, I think.
The current thread pools seem like something of an implementation detail and not the level the user-facing quotas should be involved with. I think it might be better to make this a general request-time throttle with no mention in the naming about I/O threads, and simply acknowledge the current limitation (which we may someday fix) in the docs that this covers only the time after the request is read off the network.

3. As such I think the right interface to the user would be something like percent_request_time in {0,...,100}, or request_time_ratio in {0.0,...,1.0} (I think "ratio" is the terminology we used if the scale is between 0 and 1 in the other metrics, right?)

-Jay

On Thu, Feb 23, 2017 at 3:45 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Guozhang/Dong,

Thank you for the feedback.

Guozhang: I have updated the section on co-existence of byte rate and request time quotas.

Dong: I hadn't added much detail to the metrics and sensors since they are going to be very similar to the existing metrics and sensors.
To avoid confusion, I have now added more detail. All metrics are in the group "quotaType" and all sensors have names starting with "quotaType" (where quotaType is Produce/Fetch/LeaderReplication/FollowerReplication/IOThread). So there will be no reuse of existing metrics/sensors. The new ones for request processing time based throttling will be completely independent of existing metrics/sensors, but will be consistent in format.

The existing throttle_time_ms field in produce/fetch responses will not be impacted by this KIP. That will continue to return byte-rate based throttling times. In addition, a new field request_throttle_time_ms will be added to return request quota based throttling times. These will be exposed as new metrics on the client-side.

Since all metrics and sensors are different for each type of quota, I believe there are already sufficient metrics to monitor throttling on both client and broker side for each type of throttling.
Regards,

Rajini

On Thu, Feb 23, 2017 at 4:32 AM, Dong Lin <lindon...@gmail.com> wrote:

Hey Rajini,

I think it makes a lot of sense to use io_thread_units as the metric to quota users' traffic here. LGTM overall. I have some questions regarding sensors.

- Can you be more specific in the KIP about what sensors will be added? For example, it will be useful to specify the name and attributes of these new sensors.

- We currently have throttle-time and queue-size for the byte-rate based quota. Are you going to have separate throttle-time and queue-size for requests throttled by the io_thread_unit-based quota, or will they share the same sensor?

- Does the throttle-time in the ProduceResponse and FetchResponse contain time due to the io_thread_unit-based quota?

- Currently the kafka server doesn't provide any log or metrics that tell whether any given clientId (or user) is throttled.
This is not too bad because we can still check the client-side byte-rate metric to validate whether a given client is throttled. But with this io_thread_unit, there will be no way to validate whether a given client is slow because it has exceeded its io_thread_unit limit. It is necessary for users to be able to know this information to figure out whether they have reached their quota limit. How about we add a log4j log on the server side to periodically print the (client_id, byte-rate-throttle-time, io-thread-unit-throttle-time) so that the kafka administrator can figure out which users have reached their limit and act accordingly?

Thanks,
Dong

On Wed, Feb 22, 2017 at 4:46 PM, Guozhang Wang <wangg...@gmail.com> wrote:

Made a pass over the doc, overall LGTM except a minor comment on the throttling implementation:

Stated as "Request processing time throttling will be applied on top if necessary."
I thought that it meant the request processing time throttling is applied first, but continuing to read I found it actually meant to apply produce/fetch byte rate throttling first.

Also the last sentence "The remaining delay if any is applied to the response." is a bit confusing to me. Maybe reword it a bit?

Guozhang

On Wed, Feb 22, 2017 at 3:24 PM, Jun Rao <j...@confluent.io> wrote:

Hi, Rajini,

Thanks for the updated KIP. The latest proposal looks good to me.

Jun

On Wed, Feb 22, 2017 at 2:19 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Jun/Roger,

Thank you for the feedback.

1. I have updated the KIP to use absolute units instead of percentage. The property is called *io_thread_units* to align with the thread count property *num.io.threads*.
When we implement network thread utilization quotas, we can add another property *network_thread_units*.

2. ControlledShutdown is already listed under the exempt requests. Jun, did you mean a different request that needs to be added? The four requests currently exempt in the KIP are StopReplica, ControlledShutdown, LeaderAndIsr and UpdateMetadata. These are controlled using the ClusterAction ACL, so it is easy to exclude them and only throttle if unauthorized. I wasn't sure if there are other requests used only for inter-broker communication that needed to be excluded.

3. I was thinking the smallest change would be to replace all references to *requestChannel.sendResponse()* with a local method *sendResponseMaybeThrottle()* that does the throttling, if any, plus sends the response. If we throttle first in *KafkaApis.handle()*, the time spent within the method handling the request will not be recorded or used in throttling.
We can look into this again when the PR is ready for review.

Regards,

Rajini

On Wed, Feb 22, 2017 at 5:55 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

Great to see this KIP and the excellent discussion.

To me, Jun's suggestion makes sense. If my application is allocated 1 request handler unit, then it's as if I have a Kafka broker with a single request handler thread dedicated to me. That's the most I can use, at least. That allocation doesn't change even if an admin later increases the size of the request thread pool on the broker. It's similar to the CPU abstraction that VMs and containers get from hypervisors or OS schedulers.
While different client access patterns can use wildly different amounts of request thread resources per request, a given application will generally have a stable access pattern and can figure out empirically how many "request thread units" it needs to meet its throughput/latency goals.

Cheers,

Roger

On Wed, Feb 22, 2017 at 8:53 AM, Jun Rao <j...@confluent.io> wrote:

Hi, Rajini,

Thanks for the updated KIP. A few more comments.

1. A concern with request_time_percent is that it's not an absolute value. Let's say you give a user a 10% limit. If the admin doubles the number of request handler threads, that user now actually has twice the absolute capacity. This may confuse people a bit. So, perhaps setting the quota based on an absolute request thread unit is better.
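Jun's concern in numbers (my own sketch; the 10% quota and thread counts are made up): a percentage quota is relative to the pool, so its absolute capacity moves with the pool size.

```python
# A percent-based quota's absolute capacity scales with the thread pool size.
def absolute_capacity_threads(percent: float, num_threads: int) -> float:
    """How many request handler threads' worth of time a percent quota buys."""
    return percent / 100.0 * num_threads

print(absolute_capacity_threads(10.0, 8))   # 0.8 threads' worth of time
print(absolute_capacity_threads(10.0, 16))  # 1.6 after the admin doubles the pool
```

This is the mirror image of Jay's objection to thread units: one knob is stable as a share, the other is stable as an absolute allocation, and resizing the pool changes whichever one you didn't fix.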
2. ControlledShutdownRequest is also an inter-broker request and needs to be excluded from throttling.

3. Implementation wise, I am wondering if it's simpler to apply the request time throttling first in KafkaApis.handle(). Otherwise, we will need to add the throttling logic in each type of request.

Thanks,

Jun

On Wed, Feb 22, 2017 at 5:58 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Jun,

Thank you for the review.

I have reverted to the original KIP that throttles based on request handler utilization. At the moment, it uses percentage, but I am happy to change to a fraction (out of 1 instead of 100) if required.
I have added the examples from this discussion to the KIP. Also added a "Future Work" section to address network thread utilization. The configuration is named "request_time_percent" with the expectation that it can also be used as the limit for network thread utilization when that is implemented, so that users have to set only one config for the two and not have to worry about the internal distribution of the work between the two thread pools in Kafka.

Regards,

Rajini

On Wed, Feb 22, 2017 at 12:23 AM, Jun Rao <j...@confluent.io> wrote:

Hi, Rajini,

Thanks for the proposal.
The benefit of using the request processing time over the request rate is exactly what people have said. I will just expand that a bit. Consider the following case. The producer sends a produce request with a 10MB message but compressed to 100KB with gzip. The decompression of the message on the broker could take 10-15 seconds, during which time a request handler thread is completely blocked. In this case, neither the byte-in quota nor the request rate quota may be effective in protecting the broker. Consider another case. A consumer group starts with 10 instances and later on switches to 20 instances.
The request rate will likely double, but the actual load on the broker may not double since each fetch request only contains half of the partitions. A request rate quota may not be easy to configure in this case.

What we really want is to be able to prevent a client from using too much of the server side resources. In this particular KIP, this resource is the capacity of the request handler threads. I agree that it may not be intuitive for the users to determine how to set the right limit. However, this is not completely new and has been done in the container world already.
For example, Linux cgroups (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html) have the concept of cpu.cfs_quota_us, which specifies the total amount of time in microseconds for which all tasks in a cgroup can run during a one second period. We can potentially model the request handler threads in a similar way. For example, each request handler thread can be 1 request handler unit and the admin can configure a limit on how many units (say 0.01) a client can have.

Regarding not throttling the internal broker to broker requests: we could do that.
Alternatively, we could just let the admin configure a high limit for the kafka user (it may not be able to do that easily based on clientId though).

Ideally we want to be able to protect the utilization of the network thread pool too. The difficulty is mostly what Rajini said: (1) The mechanism for throttling the requests is through Purgatory and we will have to think through how to integrate that into the network layer. (2) In the network layer, currently we know the user, but not the clientId of the request. So, it's a bit tricky to throttle based on clientId there.
Plus, the byteOut quota can already protect the network thread utilization for fetch requests. So, if we can't figure out this part right now, just focusing on the request handling threads for this KIP is still a useful feature.

Thanks,

Jun

On Tue, Feb 21, 2017 at 4:27 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Thank you all for the feedback.

Jay: I have removed the exemption for consumer heartbeat etc. Agree that protecting the cluster is more important than protecting individual apps.
Have retained the exemption for StopReplica/LeaderAndIsr etc; these are throttled only if authorization fails (so they can't be used for DoS attacks in a secure cluster, but inter-broker requests are allowed to complete without delays).

I will wait another day to see if there is any objection to quotas based on request processing time (as opposed to request rate) and if there are no objections, I will revert to the original proposal with some changes.

The original proposal was only including the time used by the request handler threads (that made calculation easy).
I think the suggestion is to include the time spent in the network threads as well since that may be significant. As Jay pointed out, it is more complicated to calculate the total available CPU time and convert to a ratio when there are *m* I/O threads and *n* network threads. ThreadMXBean#getThreadCPUTime() may give us what we want, but it can be very expensive on some platforms.
As Becket and Guozhang have pointed out, we do have several time measurements already for generating metrics that we could use, though we might want to switch to nanoTime() instead of currentTimeMillis() since some of the values for small requests may be < 1ms. But rather than add up the time spent in the I/O thread and the network thread, wouldn't it be better to convert the time spent on each thread into a separate ratio? UserA has a request quota of 5%. Can we take that to mean that UserA can use 5% of the time on network threads and 5% of the time on I/O threads?
If either is exceeded, the response is throttled - it would mean maintaining two sets of metrics for the two durations, but would result in more meaningful ratios. We could define two quota limits (UserA has 5% of request threads and 10% of network threads), but that seems unnecessary and harder to explain to users.

Back to why and how quotas are applied to network thread utilization:

a) In the case of fetch, the time spent in the network thread may be significant and I can see the need to include this. Are there other requests where the network thread utilization is significant?
In the case of fetch, request handler thread utilization would throttle clients with a high request rate and low data volume, and the fetch byte rate quota will throttle clients with high data volume. Network thread utilization is perhaps proportional to the data volume. I am wondering if we even need to throttle based on network thread utilization or whether the data volume quota covers this case.

b) At the moment, we record and check for quota violation at the same time. If a quota is violated, the response is delayed.
Using Jay's example of disk reads for fetches happening in the network thread, we can't record and delay a response after the disk reads. We could record the time spent on the network thread when the response is complete and introduce a delay for handling a subsequent request (separating out recording and quota violation handling in the case of network thread overload). Does that make sense?
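A minimal sketch of the per-thread-pool ratio check discussed earlier in this message (one quota applied independently to I/O and network thread time). Thread counts, the window, and all identifiers are hypothetical, not part of the KIP.

```java
// Sketch of the "separate ratio per thread pool" idea: throttle a user
// if EITHER pool's utilization ratio exceeds the single configured quota.
// All identifiers are illustrative only.
public class DualPoolQuota {
    /**
     * A user's utilization of one thread pool over the quota window,
     * as a fraction of the pool's total available thread time.
     */
    static double ratio(double usedMs, int threadsInPool, long windowMs) {
        return usedMs / (threadsInPool * (double) windowMs);
    }

    static boolean shouldThrottle(double ioUsedMs, double netUsedMs,
                                  int ioThreads, int netThreads,
                                  long windowMs, double quotaFraction) {
        return ratio(ioUsedMs, ioThreads, windowMs) > quotaFraction
            || ratio(netUsedMs, netThreads, windowMs) > quotaFraction;
    }

    public static void main(String[] args) {
        // 8 I/O threads, 10 network threads, 1s window, 5% quota:
        // 300 ms of I/O-thread time is 3.75% of the I/O pool (ok), but
        // 600 ms of network-thread time is 6% of the network pool
        // (throttled).
        System.out.println(shouldThrottle(300, 600, 8, 10, 1000, 0.05));
    }
}
```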
Regards,

Rajini

On Tue, Feb 21, 2017 at 2:58 AM, Becket Qin <becket....@gmail.com> wrote:

Hey Jay,

Yeah, I agree that enforcing the CPU time is a little tricky. I am thinking that maybe we can use the existing request statistics. They are already very detailed so we can probably see the approximate CPU time from them, e.g. something like (total_time - request/response_queue_time - remote_time).
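Becket's approximation can be sketched in a few lines: subtract the queueing and remote (purgatory) wait components from the total request time, leaving roughly the time spent actively processing. The method and field names here are illustrative, not Kafka's actual metric names.

```java
// Sketch of approximating per-request processing ("CPU-ish") time from
// the existing per-request time metrics. Names are illustrative.
public class ApproxCpuTime {
    static double approxLocalTimeMs(double totalTimeMs,
                                    double requestQueueTimeMs,
                                    double responseQueueTimeMs,
                                    double remoteTimeMs) {
        // What remains after queueing and remote (purgatory) wait time
        // is roughly the time spent actively processing the request.
        return totalTimeMs - requestQueueTimeMs - responseQueueTimeMs
                - remoteTimeMs;
    }

    public static void main(String[] args) {
        // A 120 ms request that waited 10 ms in the request queue,
        // 5 ms in the response queue, and 80 ms in purgatory spent
        // roughly 25 ms being processed.
        System.out.println(approxLocalTimeMs(120, 10, 5, 80)); // 25.0
    }
}
```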
I agree with Guozhang that when a user is throttled it is likely that we need to see if anything has gone wrong first, and if the users are well behaved and just need more resources, we will have to bump up the quota for them. It is true that pre-allocating CPU time quota precisely for the users is difficult. So in practice it would probably be more like first setting a relatively high protective CPU time quota for everyone and increasing it for some individual clients on demand.
Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 20, 2017 at 5:48 PM, Guozhang Wang <wangg...@gmail.com> wrote:

This is a great proposal, glad to see it happening.

I am inclined to the CPU throttling, or more specifically processing time ratio, instead of the request rate throttling as well.
Becket has summed up my rationales very well above, and one thing to add here is that the former has good support both for "protecting against rogue clients" and for "utilizing a cluster for multi-tenancy usage": when thinking about how to explain this to the end users, I find it actually more natural than the request rate since, as mentioned above, different requests will have quite different "cost", and Kafka today already has various request types (produce, fetch, admin, metadata, etc), because of which request rate throttling may not be as effective unless it is set very conservatively.

Regarding user reactions when they are throttled, I think it may differ case-by-case, and needs to be discovered / guided by looking at relative metrics. So in other words users would not expect to get additional information by simply being told "hey, you are throttled", which is all that throttling does; they need to take a follow-up step and see "hmm, I'm throttled probably because of ..", which is done by looking at other metric values: e.g.
whether I'm bombarding the brokers with metadata requests, which are usually cheap to handle but I'm sending thousands per second; or is it because I'm catching up and hence sending very heavy fetch requests with large min.bytes, etc.

Regarding the implementation, as once discussed with Jun, this seems not very difficult since today we are already collecting the "thread pool utilization" metrics, which is a single percentage "aggregateIdleMeter" value; and we are already effectively aggregating it for each request in KafkaRequestHandler, so we can just extend it by recording the source client id when handling them and aggregating by clientId as well as the total aggregate.

Guozhang

On Mon, Feb 20, 2017 at 4:27 PM, Jay Kreps <j...@confluent.io> wrote:

Hey Becket/Rajini,

When I thought about it more deeply I came around to the "percent of processing time" metric too. It seems a lot closer to the thing we actually care about and need to protect.
I also think this would be a very useful metric even in the absence of throttling, just to debug who's using capacity.

Two problems to consider:

1. I agree that for the user it is understandable what led to their being throttled, but it is a bit hard to figure out the safe range for them. i.e. if I have a new app that will send 200 messages/sec I can probably reason that I'll be under the throttling limit of 300 req/sec.
However if I need to be under a 10% CPU resources limit it may be a bit harder for me to know a priori if I will or won't.

2. Calculating the available CPU time is a bit difficult since there are actually two thread pools--the I/O threads and the network threads. I think it might be workable to count just the I/O thread time as in the proposal, but the network thread work is actually non-trivial (e.g. all the disk reads for fetches happen in that thread).
If > > > >> > >> > > you > > > >> > >> > > > > > count > > > >> > >> > > > > > > > > both > > > >> > >> > > > > > > > > > > the > > > >> > >> > > > > > > > > > > > > > > network > > > >> > >> > > > > > > > > > > > > > > > and > > > >> > >> > > > > > > > > > > > > > > > I/O threads it can skew things a > > > >> bit. > > > >> > >> E.g. > > > >> > >> > say > > > >> > >> > > > you > > > >> > >> > > > > > > have > > > >> > >> > > > > > > > 50 > > > >> > >> > > > > > > > > > > > network > > > >> > >> > > > > > > > > > > > > > > > threads, > > > >> > >> > > > > > > > > > > > > > > > 10 I/O threads, and 8 cores, > > what > > > is > > > >> > the > > > >> > >> > > > available > > > >> > >> > > > > > cpu > > > >> > >> > > > > > > > > time > > > >> > >> > > > > > > > > > > > > > available > > > >> > >> > > > > > > > > > > > > > > > in a > > > >> > >> > > > > > > > > > > > > > > > second? I suppose this is a > > > problem > > > >> > >> whenever > > > >> > >> > > you > > > >> > >> > > > > > have > > > >> > >> > > > > > > a > > > >> > >> > > > > > > > > > > > bottleneck > > > >> > >> > > > > > > > > > > > > > > > between > > > >> > >> > > > > > > > > > > > > > > > I/O and network threads or if > > you > > > >> end > > > >> > up > > > >> > >> > > > > > significantly > > > >> > >> > > > > > > > > > > > > > > over-provisioning > > > >> > >> > > > > > > > > > > > > > > > one pool (both of which are hard > > > to > > > >> > >> avoid). 
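To make that ambiguity concrete, here is a toy calculation using the numbers above (illustrative only, not from the KIP): depending on whether "capacity" means total thread time or total core time, the denominator differs by more than 7x.

```java
public class CapacityAmbiguity {
    public static void main(String[] args) {
        int networkThreads = 50;
        int ioThreads = 10;
        int cores = 8;
        long windowMs = 1000; // one-second accounting window

        // Definition A: every thread could, in principle, be busy for the
        // whole window, so capacity is (threads * window).
        long threadTimeCapacityMs = (long) (networkThreads + ioThreads) * windowMs;

        // Definition B: the machine only has `cores` core-seconds of CPU
        // per second of wall clock, so capacity is (cores * window).
        long coreTimeCapacityMs = (long) cores * windowMs;

        System.out.println("thread-time capacity: " + threadTimeCapacityMs + " ms/s"); // 60000
        System.out.println("core-time capacity:   " + coreTimeCapacityMs + " ms/s");   // 8000
    }
}
```

A 30% quota would mean 18 thread-seconds under the first definition but only 2.4 core-seconds under the second, so which denominator the broker uses changes the meaning of the quota substantially.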
An alternative for CPU throttling would be to use this API:
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/management/ThreadMXBean.html#getThreadCpuTime(long)

That would let you track actual CPU usage across the network, I/O, and purgatory threads and look at it as a percentage of total cores. I think this fixes many problems in the reliability of the metric. Its meaning is slightly different, as it is just CPU (you don't get charged for time blocking on I/O), but that may be okay because we already have a throttle on I/O.
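For reference, reading per-thread CPU time through that bean looks roughly like the sketch below. The class name and busy-loop are illustrative, and, as noted in the thread, CPU-time measurement can be unsupported or disabled on some JVMs, which the sketch checks for first. (`getThreadCpuTime(long)` takes another thread's id; `getCurrentThreadCpuTime()` shown here is the self-measuring variant.)

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuProbe {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();

        // The API may be absent or switched off -- one of the downsides
        // raised in this thread.
        if (!bean.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time not supported on this JVM");
            return;
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }

        long before = bean.getCurrentThreadCpuTime(); // nanoseconds of CPU, not wall clock
        long acc = 0;
        for (int i = 0; i < 5_000_000; i++) {
            acc += i; // burn a little CPU so the delta is visible
        }
        long cpuNanos = bean.getCurrentThreadCpuTime() - before;

        System.out.println("CPU consumed: " + cpuNanos + " ns (acc=" + acc + ")");
    }
}
```

Because this counts only CPU time, a thread blocked on disk or network I/O accrues nothing, which is the semantic difference mentioned above.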
The downside is I think it is possible this API can be disabled or isn't always available, and it may also be expensive (also I've never used it, so I'm not sure it really works the way I think).

-Jay

On Mon, Feb 20, 2017 at 3:17 PM, Becket Qin <becket....@gmail.com> wrote:

If the purpose of the KIP is only to protect the cluster from being overwhelmed by crazy clients, and is not intended to address the resource allocation problem among the clients, I am wondering if using a request handling time quota (CPU time quota) is a better option. Here are the reasons:

1. A request handling time quota gives better protection. Say we have a request rate quota and set it to some value like 100 requests/sec; it is possible that some of the requests are very expensive and actually take a lot of time to handle. In that case a few clients may still occupy a lot of CPU time even if the request rate is low. Arguably we can carefully set a request rate quota for each request and client-id combination, but it could still be tricky to get it right for everyone.

If we use the request handling time quota, we can simply say that no client can take more than 30% of the total request handling capacity (measured by time), regardless of the differences among requests or what the client is doing. In this case maybe we can quota all the requests if we want to.
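An accounting scheme along those lines could be sketched as follows. This is a hypothetical illustration, not the KIP's implementation: the class, its parameters, and the single fixed window are invented for the example (Kafka's actual quota mechanism measures over rolling windows).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy per-client request-handling-time quota: a client is throttled once it
 * has consumed more than a fixed share of the total handler-thread time
 * available in one accounting window.
 */
public class TimeQuota {
    private final long windowNanos;   // length of the accounting window
    private final int handlerThreads; // threads contributing capacity
    private final double maxShare;    // e.g. 0.30 for "30% of capacity"
    private final Map<String, AtomicLong> usedNanos = new ConcurrentHashMap<>();

    public TimeQuota(long windowNanos, int handlerThreads, double maxShare) {
        this.windowNanos = windowNanos;
        this.handlerThreads = handlerThreads;
        this.maxShare = maxShare;
    }

    /** Record the time spent handling one request for this client. */
    public void record(String clientId, long handlingNanos) {
        usedNanos.computeIfAbsent(clientId, k -> new AtomicLong())
                 .addAndGet(handlingNanos);
    }

    /** True once the client exceeds its share of total handler time. */
    public boolean shouldThrottle(String clientId) {
        long capacityNanos = windowNanos * handlerThreads;
        AtomicLong used = usedNanos.get(clientId);
        return used != null && used.get() > maxShare * capacityNanos;
    }
}
```

With a 1-second window, 10 handler threads, and a 30% share, total capacity is 10 thread-seconds, so a client that accumulates more than 3 seconds of handling time in the window is throttled, no matter how few (expensive) requests that took -- which is the protection a pure request-rate quota cannot give.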
2. The main benefit of using a request rate limit is that it seems more intuitive. It is true that it is probably easier to explain to the user what it means. However, in practice it looks like the impact of a request rate quota is no more quantifiable than that of a request handling time quota. Unlike the byte rate quota, it is still difficult to give a number for the impact on throughput or latency when a request rate quota is hit. So it is not better than the request handling time quota. In fact, I feel it is clearer to tell the user "you are limited because you have taken 30% of the CPU time on the broker" than something like "your request rate quota on metadata requests has been reached".
Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 20, 2017 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:

I think this proposal makes a lot of sense (especially now that it is oriented around request rate) and fills the biggest remaining gap in the multi-tenancy story.

I think for intra-cluster communication (StopReplica, etc.) we could avoid throttling entirely. You can secure or otherwise lock down the cluster communication to prevent any unauthorized external party from initiating these requests. As a result we are as likely to cause problems as to solve them by throttling these, right?

I'm not so sure that we should exempt consumer requests such as heartbeat. It's true that if we throttle an app's heartbeat requests it may fall out of its consumer group. However, if we don't throttle them, they may DDOS the cluster if the heartbeat interval is set incorrectly or if some client in some language has a bug. I think the policy with this kind of throttling is to protect the cluster above any individual app, right? I think in general this should be okay, since for most deployments this setting is meant as more of a safety valve---that is, rather than setting it very close to what you expect to need (say 2 req/sec or whatever), you would set something quite high (like 100 req/sec), meant to catch a client gone crazy. When used this way, allowing these requests to be throttled would actually provide meaningful protection.
-Jay

On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:

Hi all,

I have just created KIP-124 to introduce request rate quotas to Kafka:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas

The proposal is for a simple percentage request handling time quota that can be allocated to *<client-id>*, *<user>* or *<user, client-id>*. There are a few other suggestions under "Rejected alternatives". Feedback and suggestions are welcome.

Thank you...

Regards,

Rajini

--
-- Guozhang