Re: [VOTE] KIP-219 - Improve Quota Communication

Rajini Sivaram Thu, 18 Jan 2018 01:33:07 -0800

Hi Becket,

Thanks for the update. Yes, it does address my concern.


+1 (binding)

Regards,

Rajini

On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <becket....@gmail.com> wrote:

> Actually returning an empty fetch request may still be useful to reduce the
> throttle time due to request quota violation because the FetchResponse send
> time will be less. I just updated the KIP.
>
> Rajini, does that address your concern?
>
> On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Thanks for the reply, Jun.
> >
> > Currently the byte rate quota does not apply to HeartbeatRequest,
> > JoinGroupRequest/SyncGroupRequest. So the only case those requests are
> > throttled is because the request quota is violated. In that case, the
> > throttle time does not really matter whether we return a full
> FetchResponse
> > or an empty one. Would it be more consistent if we throttle based on the
> > actual throttle time / channel mute time?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Jiangjie,
> >>
> >> You are right that the heartbeat is done in a channel different from the
> >> fetch request. I think it's still useful to return an empty fetch
> response
> >> when the quota is violated. This way, the throttle time for the
> heartbeat
> >> request won't be large. I agree that we can just mute the channel for
> the
> >> fetch request for the throttle time computed based on a full fetch
> >> response. This probably also partially addresses Rajini's #1 concern.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <becket....@gmail.com>
> wrote:
> >>
> >> > Hi Rajini,
> >> >
> >> > Thanks for the comments. Pleas see the reply inline.
> >> >
> >> > Hi Jun,
> >> >
> >> > Thinking about the consumer rebalance case a bit more, I am not sure
> if
> >> we
> >> > need to worry about the delayed rebalance due to quota violation or
> not.
> >> > The rebalance actually uses a separate channel to coordinator.
> Therefore
> >> > unless the request quota is hit, the rebalance won't be throttled.
> Even
> >> if
> >> > request quota is hit, it seems unlikely to be delayed long. So it
> looks
> >> > that we don't need to unmute the channel earlier than needed. Does
> this
> >> > address your concern?
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> >> rajinisiva...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Becket,
> >> > >
> >> > > A few questions:
> >> > >
> >> > > 1. KIP says: *Although older client implementations (prior to
> >> knowledge
> >> > of
> >> > > this KIP) will immediately send the next request after the broker
> >> > responds
> >> > > without paying attention to the throttle time field, the broker is
> >> > > protected by virtue of muting the channel for time X. i.e., the next
> >> > > request will not be processed until the channel is unmuted. *
> >> > > For fetch requests, the response is sent immediately and the mute
> >> time on
> >> > > the broker based on empty fetch response will often be zero (unlike
> >> the
> >> > > throttle time returned to the client based on non-empty response).
> >> Won't
> >> > > that lead to a tight loop of fetch requests from old clients
> >> > (particularly
> >> > > expensive with SSL)? Wouldn't it be better to retain current
> behaviour
> >> > for
> >> > > old clients? Also, if we change the behaviour for old clients,
> client
> >> > > metrics that track throttle time will report a lot of throttle
> >> unrelated
> >> > to
> >> > >  actual throttle time.
> >> > >
> >> > For consumers, if quota is violated, the throttle time on the broker
> >> will
> >> > not be 0. It is just that the throttle time will not be increasing
> >> because
> >> > the consumer will return an empty response in this case. So there
> should
> >> > not be a tight loop.
> >> >
> >> >
> >> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
> >> > > <http://connections.max.idle.ms> will still be honored during the
> >> > throttle
> >> > > time X. This makes sure that the brokers will detect client
> connection
> >> > > closure in a bounded time.*
> >> > > Wouldn't it be better to bound maximum throttle time to
> >> > > *connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>*? If we mute for a time greater
> than
> >> > > *connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>* and then close a client
> connection
> >> > > simply
> >> > > because we had muted it on the broker for a longer throttle time, we
> >> > force
> >> > > a reconnection and read the next request from the new connection
> >> straight
> >> > > away. This feels the same as a bound on the quota of *
> >> > > connections.max.idle.ms
> >> > > <http://connections.max.idle.ms>*, but with added load on the
> broker
> >> for
> >> > > authenticating another connection (expensive with SSL).
> >> > >
> >> > I think we need to think about the consumer prior to and after this
> KIP
> >> > separately.
> >> >
> >> > For consumer prior to this KIP, even if we unmute the channel after
> >> > connection.max.idle.ms, it is likely that the consumers have already
> >> > reached request.timeout.ms before that and has reconnected to the
> >> broker.
> >> > So there is no real difference whether we close the throttled channel
> or
> >> > not.
> >> >
> >> > For consumers after the KIP, because they will honor the throttle
> time,
> >> > they will back off until throttle time is reached. If that throttle
> >> time is
> >> > longer than connection.max.idle.ms, it seems not a big overhead
> because
> >> > there will only be one connection re-establishment in quite a few
> >> minutes.
> >> > Compared with such overhead, it seems more important to honor the
> quota
> >> so
> >> > the broker can survive.
> >> >
> >> >
> >> > > 3. Are we changing the behaviour of network bandwidth quota for
> >> > > Produce/Fetch and retaining current behaviour for request quotas?
> >> > >
> >> > This is going to be applied to all the quota. Including request
> quotas.
> >> >
> >> >
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Rajini
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <j...@confluent.io> wrote:
> >> > >
> >> > > > Hi, Jiangjie,
> >> > > >
> >> > > > Thanks for the updated KIP. +1
> >> > > >
> >> > > > Jun
> >> > > >
> >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <becket....@gmail.com>
> >> > wrote:
> >> > > >
> >> > > > > Thanks for the comments, Jun.
> >> > > > >
> >> > > > > 1. Good point.
> >> > > > > 2. Also makes sense. Usually the connection.max.idle.ms is high
> >> > enough
> >> > > > so
> >> > > > > the throttling is impacted.
> >> > > > >
> >> > > > > I have updated the KIP to reflect the changes.
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Jiangjie (Becket) Qin
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <j...@confluent.io>
> wrote:
> >> > > > >
> >> > > > > > Hi, Jiangjie,
> >> > > > > >
> >> > > > > > Sorry for the late response. The proposal sounds good
> overall. A
> >> > > couple
> >> > > > > of
> >> > > > > > minor comments below.
> >> > > > > >
> >> > > > > > 1. For throttling a fetch request, we could potentially just
> >> send
> >> > an
> >> > > > > empty
> >> > > > > > response. We can return a throttle time calculated from a full
> >> > > > response,
> >> > > > > > but only mute the channel on the server based on a throttle
> time
> >> > > > > calculated
> >> > > > > > based on the empty response. This has the benefit that the
> >> server
> >> > > will
> >> > > > > mute
> >> > > > > > the channel much shorter, which will prevent the consumer from
> >> > > > > rebalancing
> >> > > > > > when throttled.
> >> > > > > >
> >> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
> >> during
> >> > > the
> >> > > > > > throttle time X." This has the potential issue that a server
> may
> >> > not
> >> > > > > detect
> >> > > > > > that a client connection is already gone until after an
> >> arbitrary
> >> > > > amount
> >> > > > > of
> >> > > > > > time. Perhaps we could still just close a connection if the
> >> server
> >> > > has
> >> > > > > > muted it for longer than connections.max.idle.ms. This will
> at
> >> > least
> >> > > > > bound
> >> > > > > > the time for a server to detect closed client connections.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > >
> >> > > > > > Jun
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> >> becket....@gmail.com>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi,
> >> > > > > > >
> >> > > > > > > We would like to start the voting thread for KIP-219. The
> KIP
> >> > > > proposes
> >> > > > > to
> >> > > > > > > improve the quota communication between the brokers and
> >> clients,
> >> > > > > > especially
> >> > > > > > > for cases of long throttling time.
> >> > > > > > >
> >> > > > > > > The KIP wiki is following:
> >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > > > > > 219+-+Improve+quota+
> >> > > > > > > communication
> >> > > > > > >
> >> > > > > > > The discussion thread is here:
> >> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> >> > > > > > > 20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > >
> >> > > > > > > Jiangjie (Becket) Qin
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to