Joel,

Yes, for your second comment. The tricky part is still figuring out which
replicas to throttle and by how much, since in general admins probably
don't want replicas that are already in sync, or close to in sync, to be
throttled. It would be great to get Todd's opinion on this. Could you ping him?
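
For illustration, one rough way such a lag-based classification could look - a
Scala-style sketch with hypothetical names, not something from the KIP:

    // Only consider a replica for throttling if its lag behind the leader
    // exceeds some threshold, so in-sync or nearly in-sync replicas are
    // never slowed down.
    def shouldThrottle(replicaLogEndOffset: Long,
                       leaderLogEndOffset: Long,
                       maxAllowedLag: Long): Boolean =
      (leaderLogEndOffset - replicaLogEndOffset) > maxAllowedLag

The open question remains how to choose maxAllowedLag without it being arbitrary.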

Yes, we'd be happy to discuss auto-detection of effect traffic more offline.

Thanks,

Jun

On Thu, Aug 18, 2016 at 10:21 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> > For your first comment. We thought about determining "effect" replicas
> > automatically as well. First, there is some tricky stuff that one has to
> >
>
> Auto-detection of effect traffic: I'm fairly certain it's doable but
> definitely tricky. I'm also not sure it is something worth tackling at the
> outset. If we want to spend more time thinking it over, even if it's just an
> academic exercise, I would be happy to brainstorm offline.
>
>
> > For your second comment, we discussed that in the client quotas design. A
> > downside of that for client quotas is that a client may be surprised that
> > its traffic is not throttled at one time, but throttled at another with the
> > same quota (basically, less predictability). You can imagine setting a quota
> > for all replication traffic and only slowing down the "effect" replicas if
> > needed. The thought is more or less the same as the above. It requires more
> >
>
> For clients, this is true. I think this is much less of an issue for
> server-side replication since the "users" here are the Kafka SREs who
> generally know these internal details.
>
> I think it would be valuable to get some feedback from SREs on the proposal
> before proceeding to a vote. (ping Todd)
>
> Joel
>
>
> >
> > On Thu, Aug 18, 2016 at 9:37 AM, Ben Stopford <b...@confluent.io> wrote:
> >
> > > Hi Joel
> > >
> > > Ha! Yes, we had some similar thoughts, on both counts. Both are actually
> > > good approaches, but come with some extra complexity.
> > >
> > > Segregating the replication type is tempting as it creates a more general
> > > solution. One issue is that you need to draw a line between lagging and not
> > > lagging. The ISR 'limit' is a tempting divider, but it has the side effect
> > > that, once you drop out of the ISR, you are immediately throttled. Adding a
> > > configurable divider is another option, but it is difficult for admins to
> > > set, and always a little arbitrary. A better idea is to prioritise replicas
> > > in reverse order of lag, but that also comes with additional complexity of
> > > its own.
> > >
> > > Under-throttling is also a tempting addition. That is to say, if there's
> > > idle bandwidth lying around, not being used, why not use it to let lagging
> > > brokers catch up? This involves some comparison to the maximum bandwidth,
> > > which could be configurable or could be derived, with pros and cons for
> > > each.
> > >
> > > But the more general problem is actually quite hard to reason about, so
> > > after some discussion we decided to settle on something simple that we
> > > felt we could get working, and to add these additional features in
> > > subsequent KIPs.
> > >
> > > I hope that seems reasonable. Jun may wish to add to this.
> > >
> > > B
> > >
> > >
> > > > On 18 Aug 2016, at 06:56, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford <b...@confluent.io> wrote:
> > > >
> > > >>
> > > >> Let us know if you have any further thoughts on KIP-73, else we'll kick
> > > >> off a vote.
> > > >>
> > > >
> > > > I think the mechanism for throttling replicas looks good. Just had a few
> > > > more thoughts on the configuration section. What you have looks
> > > > reasonable, but I was wondering if it could be made simpler. You probably
> > > > thought through these, so I'm curious to know your take.
> > > >
> > > > My guess is that most of the time, users would want to throttle all effect
> > > > replication - due to partition reassignments, adding brokers, or a broker
> > > > coming back online after an extended period of time. In all these
> > > > scenarios it may be possible to distinguish bootstrap (effect) vs normal
> > > > replication - based on how far the replica has to catch up. I'm wondering
> > > > if it is enough to just set an umbrella "effect" replication quota with
> > > > perhaps per-topic overrides (say if some topics are more important than
> > > > others) as opposed to designating throttled replicas.
> > > >
> > > > Also, IIRC during client-side quota discussions we had considered the
> > > > possibility of allowing clients to go above their quotas when resources
> > > > are available. We ended up not doing that, but for replication throttling
> > > > it may make sense - i.e., to treat the quota as a soft limit. Another way
> > > > to look at it is: instead of ensuring "effect replication traffic does not
> > > > flow faster than X bytes/sec", it may be useful to ensure that "effect
> > > > replication traffic only flows as slowly as necessary (so as not to
> > > > adversely affect normal replication traffic)."
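
For concreteness, a rough sketch of that "soft limit" idea - hypothetical names,
and it assumes the broker's outbound link capacity is known; not part of the
current proposal:

    // Enforce the effect-traffic quota only while the broker's total outbound
    // rate is near its link capacity; otherwise let effect replication use the
    // idle bandwidth.
    def allowMoreThrottledBytes(effectRateBps: Double,
                                totalRateBps: Double,
                                effectQuotaBps: Double,
                                linkCapacityBps: Double): Boolean = {
      val linkBusy = totalRateBps >= 0.9 * linkCapacityBps
      !linkBusy || effectRateBps < effectQuotaBps
    }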
> > > >
> > > > Thanks,
> > > >
> > > > Joel
> > > >
> > > >>>
> > > >>>> On Thu, Aug 11, 2016 at 2:43 PM, Jun Rao <j...@confluent.io> wrote:
> > > >>>>
> > > >>>>> Hi, Joel,
> > > >>>>>
> > > >>>>> Yes, the response size includes both throttled and unthrottled replicas.
> > > >>>>> However, the response is only delayed up to max.wait if the response size
> > > >>>>> is less than min.bytes, which matches the current behavior. So, there is
> > > >>>>> no extra delay due to throttling, right? For replica fetchers, the default
> > > >>>>> min.bytes is 1. So, the response is only delayed if there is no byte in
> > > >>>>> the response, which is what we want.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>>
> > > >>>>> Jun
> > > >>>>>
> > > >>>>>> On Thu, Aug 11, 2016 at 11:53 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hi Jun,
> > > >>>>>>
> > > >>>>>> I'm not sure that would work unless we have separate replica fetchers,
> > > >>>>>> since this would cause all replicas (including ones that are not
> > > >>>>>> throttled) to get delayed. Instead, we could just have the leader
> > > >>>>>> populate the throttle-time field of the response as a hint to the
> > > >>>>>> follower as to how long it should wait before it adds those replicas
> > > >>>>>> back to its subsequent replica fetch requests.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>>
> > > >>>>>> Joel
> > > >>>>>>
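
For concreteness, a rough sketch of how a follower could act on such a
throttle-time hint - hypothetical names, not from the KIP, with partitions
identified by plain strings for brevity:

    // The follower remembers, per partition, when it may next include that
    // partition in a fetch, based on the throttle-time hint from the leader.
    class ThrottledPartitionBackoff {
      private val readyAtMs = scala.collection.mutable.Map.empty[String, Long]

      // Called when a fetch response carries a throttle-time hint for a partition.
      def recordThrottle(partition: String, throttleTimeMs: Long, nowMs: Long): Unit =
        readyAtMs(partition) = nowMs + throttleTimeMs

      // Called when building the next fetch request.
      def mayInclude(partition: String, nowMs: Long): Boolean =
        readyAtMs.get(partition).forall(nowMs >= _)
    }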
> > > >>>>>> On Thu, Aug 11, 2016 at 9:50 AM, Jun Rao <j...@confluent.io> wrote:
> > > >>>>>>
> > > >>>>>>> Mayuresh,
> > > >>>>>>>
> > > >>>>>>> That's a good question. I think if the response size (after leader
> > > >>>>>>> throttling) is smaller than min.bytes, we will just delay the sending of
> > > >>>>>>> the response up to max.wait as we do now. This should prevent frequent
> > > >>>>>>> empty responses to the follower.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>>
> > > >>>>>>> Jun
> > > >>>>>>>
> > > >>>>>>> On Wed, Aug 10, 2016 at 9:17 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> This might have been answered before.
> > > >>>>>>>> I was wondering: when the leader quota is reached and it sends an empty
> > > >>>>>>>> response ("If the inclusion of a partition, listed in the leader's
> > > >>>>>>>> throttled-replicas list, causes the LeaderQuotaRate to be exceeded, that
> > > >>>>>>>> partition is omitted from the response (aka returns 0 bytes)."), at this
> > > >>>>>>>> point the follower quota is NOT reached, and the follower is still going
> > > >>>>>>>> to ask for that partition in the next fetch request. Would it be fair to
> > > >>>>>>>> add some logic there so that the follower backs off (for some configurable
> > > >>>>>>>> time) from including those partitions in the next fetch request?
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>>
> > > >>>>>>>> Mayuresh
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Aug 10, 2016 at 8:06 AM, Ben Stopford <b...@confluent.io> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Thanks again for the responses everyone. I've removed the extra
> > > >>>>>>>>> fetcher threads from the proposal, switching to the inclusion-based
> > > >>>>>>>>> approach. The relevant section is:
> > > >>>>>>>>>
> > > >>>>>>>>> The follower makes a request, using the fixed size of
> > > >>>>>>>>> replica.fetch.response.max.bytes as per KIP-74
> > > >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes>.
> > > >>>>>>>>> The order of the partitions in the fetch request is randomised to
> > > >>>>>>>>> ensure fairness.
> > > >>>>>>>>> When the leader receives the fetch request it processes the partitions
> > > >>>>>>>>> in the defined order, up to the response's size limit. If the inclusion
> > > >>>>>>>>> of a partition, listed in the leader's throttled-replicas list, causes
> > > >>>>>>>>> the LeaderQuotaRate to be exceeded, that partition is omitted from the
> > > >>>>>>>>> response (aka returns 0 bytes). Logically, this is of the form:
> > > >>>>>>>>>
> > > >>>>>>>>> var bytesAllowedForThrottledPartition = quota.recordAndMaybeAdjust(bytesRequestedForPartition)
> > > >>>>>>>>>
> > > >>>>>>>>> When the follower receives the fetch response, if it includes partitions
> > > >>>>>>>>> in its throttled-partitions list, it increments the FollowerQuotaRate:
> > > >>>>>>>>>
> > > >>>>>>>>> var includeThrottledPartitionsInNextRequest: Boolean = quota.recordAndEvaluate(previousResponseThrottledBytes)
> > > >>>>>>>>>
> > > >>>>>>>>> If the quota is exceeded, no throttled partitions will be included in the
> > > >>>>>>>>> next fetch request emitted by this replica fetcher thread.
> > > >>>>>>>>>
> > > >>>>>>>>> B
> > > >>>>>>>>>
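
For concreteness, a rough Scala-style sketch of the leader-side behaviour quoted
above - hypothetical names; recordAndMaybeAdjust stands in for the leader quota
call in the KIP text and is assumed to return a value in [0, requested]:

    // Walk the partitions in request order and omit a throttled partition
    // (0 bytes) whenever including it would push the rate over the leader quota.
    def buildResponse(partitionsInRequestOrder: Seq[String],
                      availableBytes: String => Long,
                      throttledReplicas: Set[String],
                      recordAndMaybeAdjust: Long => Long,
                      responseLimit: Long): Map[String, Long] = {
      var remaining = responseLimit
      partitionsInRequestOrder.map { tp =>
        val wanted = math.min(availableBytes(tp), remaining)
        val granted =
          if (throttledReplicas.contains(tp)) recordAndMaybeAdjust(wanted) // may be 0
          else wanted
        remaining -= granted
        tp -> granted
      }.toMap
    }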
> > > >>>>>>>>>> On 9 Aug 2016, at 23:34, Jun Rao <j...@confluent.io> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> When there are several unthrottled replicas, we could also just do
> > > >>>>>>>>>> what's suggested in KIP-74. The client is responsible for reordering
> > > >>>>>>>>>> the partitions and the leader fills in the bytes to those partitions in
> > > >>>>>>>>>> order, up to the quota limit.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We could also do what you suggested. If the quota is exceeded, include
> > > >>>>>>>>>> empty data in the response for throttled replicas. Keep doing that until
> > > >>>>>>>>>> enough time has passed so that the quota is no longer exceeded. This
> > > >>>>>>>>>> potentially allows better batching per partition. Not sure if the two
> > > >>>>>>>>>> make a big difference in practice though.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Jun
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Tue, Aug 9, 2016 at 2:31 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On the leader side, one challenge is related to the fairness issue that
> > > >>>>>>>>>>>> Ben brought up. The question is: what if the fetch response limit is
> > > >>>>>>>>>>>> filled up by the throttled replicas? If this happens constantly, we will
> > > >>>>>>>>>>>> delay the progress of those un-throttled replicas. However, I think we
> > > >>>>>>>>>>>> can address this issue by trying to fill up the unthrottled replicas in
> > > >>>>>>>>>>>> the response first. So, the algorithm would be: fill up unthrottled
> > > >>>>>>>>>>>> replicas up to the fetch response limit; if there is space left, fill up
> > > >>>>>>>>>>>> throttled replicas; if the quota is exceeded for the throttled replicas,
> > > >>>>>>>>>>>> reduce the bytes in the throttled replicas in the response accordingly.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Right - that's what I was trying to convey by truncation (vs. empty). So
> > > >>>>>>>>>>> we would attempt to fill the response for throttled partitions as much as
> > > >>>>>>>>>>> we can before hitting the quota limit. There is one more detail to handle
> > > >>>>>>>>>>> in this: if there are several throttled partitions and not enough
> > > >>>>>>>>>>> remaining allowance in the fetch response to include all the throttled
> > > >>>>>>>>>>> replicas, then we would need to decide which of those partitions get a
> > > >>>>>>>>>>> share, which is why I'm wondering if it is easier to return empty for
> > > >>>>>>>>>>> those partitions entirely in the fetch response - they will make progress
> > > >>>>>>>>>>> in the subsequent fetch. If they don't make fast enough progress then that
> > > >>>>>>>>>>> would be a case for raising the threshold or letting it complete at an
> > > >>>>>>>>>>> off-peak time.
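
For concreteness, a rough sketch of the two-pass fill described above -
hypothetical names, not from the KIP:

    // Unthrottled replicas claim the response budget first; throttled replicas
    // then share whatever is left, bounded by the remaining quota allowance.
    def fillResponse(unthrottled: Seq[(String, Long)],  // (partition, available bytes)
                     throttled: Seq[(String, Long)],
                     responseLimit: Long,
                     quotaAllowance: Long): Map[String, Long] = {
      var remaining = responseLimit
      val filled = scala.collection.mutable.LinkedHashMap.empty[String, Long]
      for ((tp, avail) <- unthrottled) {                // pass 1: unthrottled first
        val bytes = math.min(avail, remaining)
        filled += tp -> bytes
        remaining -= bytes
      }
      var quotaLeft = quotaAllowance
      for ((tp, avail) <- throttled) {                  // pass 2: throttled, quota-bounded
        val bytes = math.min(math.min(avail, remaining), quotaLeft)
        filled += tp -> bytes                           // 0 bytes == effectively omitted
        remaining -= bytes
        quotaLeft -= bytes
      }
      filled.toMap
    }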
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> With this approach, we need some new logic to handle throttling on the
> > > >>>>>>>>>>>> leader, but we can leave the replica threading model unchanged. So,
> > > >>>>>>>>>>>> overall, this still seems to be a simpler approach.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Jun
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Nice write-up, Ben.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I agree with Joel on keeping this simple by excluding the partitions from
> > > >>>>>>>>>>>>> the fetch request/response when the quota is violated at the follower or
> > > >>>>>>>>>>>>> leader, instead of having a separate set of threads for handling the
> > > >>>>>>>>>>>>> quota and non-quota cases. Even though it's different from the current
> > > >>>>>>>>>>>>> quota implementation it should be OK, since it's internal to the brokers
> > > >>>>>>>>>>>>> and can be handled by the admins tuning the quota configs appropriately.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Also, can you elaborate with an example how this would be handled:
> > > >>>>>>>>>>>>> *guaranteeing ordering of updates when replicas shift threads*
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Mayuresh
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On the need for both leader/follower throttling: that makes sense -
> > > >>>>>>>>>>>>>> thanks for clarifying. For completeness, can we add this detail to the
> > > >>>>>>>>>>>>>> doc - say, after the quote that I pasted earlier?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> From an implementation perspective though: I'm still interested in the
> > > >>>>>>>>>>>>>> simplicity of not having to add separate replica fetchers, a delay queue
> > > >>>>>>>>>>>>>> on the leader, and "moving" partitions from the throttled replica
> > > >>>>>>>>>>>>>> fetchers to the regular replica fetchers once caught up.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Instead, I think it would work and be simpler to include or exclude the
> > > >>>>>>>>>>>>>> partitions in the fetch request from the follower and fetch response
> > > >>>>>>>>>>>>>> from the leader when the quota is violated. The issue of fairness that
> > > >>>>>>>>>>>>>> Ben noted may be a wash between the two options (that Ben wrote in his
> > > >>>>>>>>>>>>>> email). With the default quota delay mechanism, partitions get delayed
> > > >>>>>>>>>>>>>> essentially at random - i.e., whoever fetches at the time of quota
> > > >>>>>>>>>>>>>> violation gets delayed at the leader. So we can adopt a similar policy
> > > >>>>>>>>>>>>>> in choosing to truncate partitions in fetch responses, i.e., if at the
> > > >>>>>>>>>>>>>> time of handling the fetch the "effect" replication rate exceeds the
> > > >>>>>>>>>>>>>> quota, then either empty or truncate those partitions in the response.
> > > >>>>>>>>>>>>>> (BTW "effect" replication is your terminology in the wiki - i.e.,
> > > >>>>>>>>>>>>>> replication due to partition reassignment, adding brokers, etc.)
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> While this may be slightly different from the existing quota mechanism,
> > > >>>>>>>>>>>>>> I think the difference is small (since we would reuse the quota manager,
> > > >>>>>>>>>>>>>> at worst with some refactoring) and will be internal to the broker.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> So I guess the question is whether this alternative is simple enough and
> > > >>>>>>>>>>>>>> equally functional to not go with dedicated throttled replica fetchers.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao <j...@confluent.io> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Just to elaborate on what Ben said about why we need throttling on both
> > > >>>>>>>>>>>>>>> the leader and the follower side.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> If we only have throttling on the follower side, consider a case where
> > > >>>>>>>>>>>>>>> we add 5 new brokers and want to move some replicas from existing
> > > >>>>>>>>>>>>>>> brokers over to those 5 brokers. Each of those brokers is going to fetch
> > > >>>>>>>>>>>>>>> data from all existing brokers. Then, it's possible that the aggregated
> > > >>>>>>>>>>>>>>> fetch load from those 5 brokers on a particular existing broker exceeds
> > > >>>>>>>>>>>>>>> its outgoing network bandwidth, even though the inbound traffic on each
> > > >>>>>>>>>>>>>>> of those 5 brokers is bounded.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> If we only have throttling on the leader side, consider the same example
> > > >>>>>>>>>>>>>>> above. It's possible for the incoming traffic to each of those 5 brokers
> > > >>>>>>>>>>>>>>> to exceed its network bandwidth, since each is fetching data from all
> > > >>>>>>>>>>>>>>> existing brokers.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> So, being able to set a quota on both the follower and the leader side
> > > >>>>>>>>>>>>>>> protects both cases.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Jun
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford <b...@confluent.io> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Joel
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks for taking the time to look at this.
> > > >>> Appreciated.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Regarding throttling on both leader and follower, this proposal covers
> > > >>>>>>>>>>>>>>>> a more general solution which can guarantee a quota even when a
> > > >>>>>>>>>>>>>>>> rebalance operation produces an asymmetric profile of load. This means
> > > >>>>>>>>>>>>>>>> administrators don't need to calculate the impact that a follower-only
> > > >>>>>>>>>>>>>>>> quota will have on the leaders they are fetching from - for example
> > > >>>>>>>>>>>>>>>> where replica sizes are skewed or where only a partial rebalance is
> > > >>>>>>>>>>>>>>>> required.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Having said that, even with both leader and follower quotas, the use of
> > > >>>>>>>>>>>>>>>> additional threads is actually optional. There appear to be two general
> > > >>>>>>>>>>>>>>>> approaches: (1) omit partitions from fetch requests (follower) / fetch
> > > >>>>>>>>>>>>>>>> responses (leader) when they exceed their quota; (2) delay them, as the
> > > >>>>>>>>>>>>>>>> existing quota mechanism does, using separate fetchers. Both appear
> > > >>>>>>>>>>>>>>>> valid, but with slightly different design tradeoffs.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> The issue with approach (1) is that it departs somewhat from the
> > > >>>>>>>>>>>>>>>> existing quotas implementation, and must include a notion of fairness
> > > >>>>>>>>>>>>>>>> within the now size-bounded request and response. The issue with (2) is
> > > >>>>>>>>>>>>>>>> guaranteeing ordering of updates when replicas shift threads, but this
> > > >>>>>>>>>>>>>>>> is handled, for the most part, in the code today.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I've updated the rejected alternatives section to make this a little
> > > >>>>>>>>>>>>>>>> clearer.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> B
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hi Ben,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks for the detailed write-up. So the proposal involves
> > > >>>>>>>>>>>>>>>>> self-throttling on the fetcher side and throttling at the leader. Can
> > > >>>>>>>>>>>>>>>>> you elaborate on the reasoning that is given on the wiki: *"The
> > > >>>>>>>>>>>>>>>>> throttle is applied to both leaders and followers. This allows the
> > > >>>>>>>>>>>>>>>>> admin to exert strong guarantees on the throttle limit."* Is there any
> > > >>>>>>>>>>>>>>>>> reason why one or the other wouldn't be sufficient?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Specifically, if we were to only do self-throttling on the fetchers, we
> > > >>>>>>>>>>>>>>>>> could potentially avoid the additional replica fetchers, right? I.e.,
> > > >>>>>>>>>>>>>>>>> the replica fetchers would maintain their quota metrics as you
> > > >>>>>>>>>>>>>>>>> proposed, and each (normal) replica fetch presents an opportunity to
> > > >>>>>>>>>>>>>>>>> make progress for the throttled partitions as long as their effective
> > > >>>>>>>>>>>>>>>>> consumption rate is below the quota limit. If it exceeds the quota
> > > >>>>>>>>>>>>>>>>> limit, then don't include the throttled partitions in the subsequent
> > > >>>>>>>>>>>>>>>>> fetch requests until the effective consumption rate for those
> > > >>>>>>>>>>>>>>>>> partitions returns to within the quota threshold.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I have more questions on the proposal, but was more interested in the
> > > >>>>>>>>>>>>>>>>> above to see if it could simplify things a bit.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Also, can you open up access to the google-doc that you link to?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Joel
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> We’ve created KIP-73: Replication Quotas
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> The idea is to allow an admin to throttle moving replicas. Full
> > > >>>>>>>>>>>>>>>>>> details are here:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Please take a look and let us know your thoughts.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> B
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> --
> > > >>>>>>>>>>>>> -Regards,
> > > >>>>>>>>>>>>> Mayuresh R. Gharat
> > > >>>>>>>>>>>>> (862) 250-7125
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> --
> > > >>>>>>>> -Regards,
> > > >>>>>>>> Mayuresh R. Gharat
> > > >>>>>>>> (862) 250-7125
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> -Regards,
> > > >>>> Mayuresh R. Gharat
> > > >>>> (862) 250-7125
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Ben Stopford
> > > >>
> > >
> > >
> >
>
