Hi Becket, Thanks for the update. Yes, it does address my concern.
+1 (binding) Regards, Rajini On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <becket....@gmail.com> wrote: > Actually returning an empty fetch request may still be useful to reduce the > throttle time due to request quota violation because the FetchResponse send > time will be less. I just updated the KIP. > > Rajini, does that address your concern? > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <becket....@gmail.com> wrote: > > > Thanks for the reply, Jun. > > > > Currently the byte rate quota does not apply to HeartbeatRequest, > > JoinGroupRequest/SyncGroupRequest. So the only case those requests are > > throttled is because the request quota is violated. In that case, the > > throttle time does not really matter whether we return a full > FetchResponse > > or an empty one. Would it be more consistent if we throttle based on the > > actual throttle time / channel mute time? > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <j...@confluent.io> wrote: > > > >> Hi, Jiangjie, > >> > >> You are right that the heartbeat is done in a channel different from the > >> fetch request. I think it's still useful to return an empty fetch > response > >> when the quota is violated. This way, the throttle time for the > heartbeat > >> request won't be large. I agree that we can just mute the channel for > the > >> fetch request for the throttle time computed based on a full fetch > >> response. This probably also partially addresses Rajini's #1 concern. > >> > >> Thanks, > >> > >> Jun > >> > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <becket....@gmail.com> > wrote: > >> > >> > Hi Rajini, > >> > > >> > Thanks for the comments. Pleas see the reply inline. > >> > > >> > Hi Jun, > >> > > >> > Thinking about the consumer rebalance case a bit more, I am not sure > if > >> we > >> > need to worry about the delayed rebalance due to quota violation or > not. > >> > The rebalance actually uses a separate channel to coordinator. > Therefore > >> > unless the request quota is hit, the rebalance won't be throttled. > Even > >> if > >> > request quota is hit, it seems unlikely to be delayed long. So it > looks > >> > that we don't need to unmute the channel earlier than needed. Does > this > >> > address your concern? > >> > > >> > Thanks, > >> > > >> > Jiangjie (Becket) Qin > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram < > >> rajinisiva...@gmail.com> > >> > wrote: > >> > > >> > > Hi Becket, > >> > > > >> > > A few questions: > >> > > > >> > > 1. KIP says: *Although older client implementations (prior to > >> knowledge > >> > of > >> > > this KIP) will immediately send the next request after the broker > >> > responds > >> > > without paying attention to the throttle time field, the broker is > >> > > protected by virtue of muting the channel for time X. i.e., the next > >> > > request will not be processed until the channel is unmuted. * > >> > > For fetch requests, the response is sent immediately and the mute > >> time on > >> > > the broker based on empty fetch response will often be zero (unlike > >> the > >> > > throttle time returned to the client based on non-empty response). > >> Won't > >> > > that lead to a tight loop of fetch requests from old clients > >> > (particularly > >> > > expensive with SSL)? Wouldn't it be better to retain current > behaviour > >> > for > >> > > old clients? Also, if we change the behaviour for old clients, > client > >> > > metrics that track throttle time will report a lot of throttle > >> unrelated > >> > to > >> > > actual throttle time. > >> > > > >> > For consumers, if quota is violated, the throttle time on the broker > >> will > >> > not be 0. It is just that the throttle time will not be increasing > >> because > >> > the consumer will return an empty response in this case. So there > should > >> > not be a tight loop. > >> > > >> > > >> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms > >> > > <http://connections.max.idle.ms> will still be honored during the > >> > throttle > >> > > time X. This makes sure that the brokers will detect client > connection > >> > > closure in a bounded time.* > >> > > Wouldn't it be better to bound maximum throttle time to > >> > > *connections.max.idle.ms > >> > > <http://connections.max.idle.ms>*? If we mute for a time greater > than > >> > > *connections.max.idle.ms > >> > > <http://connections.max.idle.ms>* and then close a client > connection > >> > > simply > >> > > because we had muted it on the broker for a longer throttle time, we > >> > force > >> > > a reconnection and read the next request from the new connection > >> straight > >> > > away. This feels the same as a bound on the quota of * > >> > > connections.max.idle.ms > >> > > <http://connections.max.idle.ms>*, but with added load on the > broker > >> for > >> > > authenticating another connection (expensive with SSL). > >> > > > >> > I think we need to think about the consumer prior to and after this > KIP > >> > separately. > >> > > >> > For consumer prior to this KIP, even if we unmute the channel after > >> > connection.max.idle.ms, it is likely that the consumers have already > >> > reached request.timeout.ms before that and has reconnected to the > >> broker. > >> > So there is no real difference whether we close the throttled channel > or > >> > not. > >> > > >> > For consumers after the KIP, because they will honor the throttle > time, > >> > they will back off until throttle time is reached. If that throttle > >> time is > >> > longer than connection.max.idle.ms, it seems not a big overhead > because > >> > there will only be one connection re-establishment in quite a few > >> minutes. > >> > Compared with such overhead, it seems more important to honor the > quota > >> so > >> > the broker can survive. > >> > > >> > > >> > > 3. Are we changing the behaviour of network bandwidth quota for > >> > > Produce/Fetch and retaining current behaviour for request quotas? > >> > > > >> > This is going to be applied to all the quota. Including request > quotas. > >> > > >> > > >> > > > >> > > Thanks, > >> > > > >> > > Rajini > >> > > > >> > > > >> > > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <j...@confluent.io> wrote: > >> > > > >> > > > Hi, Jiangjie, > >> > > > > >> > > > Thanks for the updated KIP. +1 > >> > > > > >> > > > Jun > >> > > > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <becket....@gmail.com> > >> > wrote: > >> > > > > >> > > > > Thanks for the comments, Jun. > >> > > > > > >> > > > > 1. Good point. > >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high > >> > enough > >> > > > so > >> > > > > the throttling is impacted. > >> > > > > > >> > > > > I have updated the KIP to reflect the changes. > >> > > > > > >> > > > > Thanks, > >> > > > > > >> > > > > Jiangjie (Becket) Qin > >> > > > > > >> > > > > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <j...@confluent.io> > wrote: > >> > > > > > >> > > > > > Hi, Jiangjie, > >> > > > > > > >> > > > > > Sorry for the late response. The proposal sounds good > overall. A > >> > > couple > >> > > > > of > >> > > > > > minor comments below. > >> > > > > > > >> > > > > > 1. For throttling a fetch request, we could potentially just > >> send > >> > an > >> > > > > empty > >> > > > > > response. We can return a throttle time calculated from a full > >> > > > response, > >> > > > > > but only mute the channel on the server based on a throttle > time > >> > > > > calculated > >> > > > > > based on the empty response. This has the benefit that the > >> server > >> > > will > >> > > > > mute > >> > > > > > the channel much shorter, which will prevent the consumer from > >> > > > > rebalancing > >> > > > > > when throttled. > >> > > > > > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored > >> during > >> > > the > >> > > > > > throttle time X." This has the potential issue that a server > may > >> > not > >> > > > > detect > >> > > > > > that a client connection is already gone until after an > >> arbitrary > >> > > > amount > >> > > > > of > >> > > > > > time. Perhaps we could still just close a connection if the > >> server > >> > > has > >> > > > > > muted it for longer than connections.max.idle.ms. This will > at > >> > least > >> > > > > bound > >> > > > > > the time for a server to detect closed client connections. > >> > > > > > > >> > > > > > Thanks, > >> > > > > > > >> > > > > > Jun > >> > > > > > > >> > > > > > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin < > >> becket....@gmail.com> > >> > > > > wrote: > >> > > > > > > >> > > > > > > Hi, > >> > > > > > > > >> > > > > > > We would like to start the voting thread for KIP-219. The > KIP > >> > > > proposes > >> > > > > to > >> > > > > > > improve the quota communication between the brokers and > >> clients, > >> > > > > > especially > >> > > > > > > for cases of long throttling time. > >> > > > > > > > >> > > > > > > The KIP wiki is following: > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >> > > > > > 219+-+Improve+quota+ > >> > > > > > > communication > >> > > > > > > > >> > > > > > > The discussion thread is here: > >> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka% > >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results > >> > > > > > > > >> > > > > > > Thanks, > >> > > > > > > > >> > > > > > > Jiangjie (Becket) Qin > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > > > >