Hi Joel

Ha! Yes, we had some similar thoughts, on both counts. Both are actually good 
approaches, but each comes with some extra complexity. 

Segregating replication by type is tempting, as it creates a more general 
solution. One issue is that you need to draw a line between lagging and not 
lagging. The ISR 'limit' is a tempting divider, but it has the side effect 
that, once a replica drops out of the ISR, it is immediately throttled. Adding 
a configurable divider is another option, but it is difficult for admins to 
set, and always a little arbitrary. A better idea is to prioritise replicas in 
reverse order to their lag, but that also comes with additional complexity of 
its own. 
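
For illustration only, prioritisation might look something like the sketch 
below, reading "reverse order to lag" as servicing the least-lagged replicas 
first (the type and method names here are hypothetical, not from the KIP):

  // Hypothetical sketch: order throttled replicas so those closest to
  // catching up are serviced first, and the most-lagged wait longest.
  case class ThrottledReplica(topicPartition: String, lagBytes: Long)

  def prioritise(replicas: Seq[ThrottledReplica]): Seq[ThrottledReplica] =
    replicas.sortBy(_.lagBytes)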

Under-throttling is also a tempting addition. That's to say, if there's idle 
bandwidth lying around, not being used, why not use it to let lagging brokers 
catch up? This involves some comparison against the maximum bandwidth, which 
could be configurable or derived, with pros and cons for each. 
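
A minimal sketch of how that allowance might be computed, assuming the 
maximum bandwidth and the measured unthrottled rate are available as inputs 
(neither is an existing config):

  // Hypothetical sketch: throttled traffic may borrow whatever bandwidth
  // is currently idle, on top of its configured quota.
  def throttledAllowanceBps(quotaBps: Long,
                            maxBandwidthBps: Long,
                            unthrottledRateBps: Long): Long =
    quotaBps + math.max(0L, maxBandwidthBps - unthrottledRateBps)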

But the more general problem is actually quite hard to reason about, so after 
some discussion we decided to settle on something simple that we felt we could 
get working, and to extend it with these additional features in subsequent 
KIPs. 

I hope that seems reasonable. Jun may wish to add to this. 

B


> On 18 Aug 2016, at 06:56, Joel Koshy <jjkosh...@gmail.com> wrote:
> 
> On Wed, Aug 17, 2016 at 9:13 PM, Ben Stopford <b...@confluent.io> wrote:
> 
>> 
>> Let us know if you have any further thoughts on KIP-73, else we'll kick
>> off a vote.
>> 
> 
> I think the mechanism for throttling replicas looks good. Just had a few
> more thoughts on the configuration section. What you have looks reasonable,
> but I was wondering if it could be made simpler. You probably thought
> through these, so I'm curious to know your take.
> 
> My guess is that most of the time, users would want to throttle all effect
> replication - due to partition reassignments, adding brokers or a broker
> coming back online after an extended period of time. In all these scenarios
> it may be possible to distinguish bootstrap (effect) vs normal replication
> - based on how far the replica has to catch up. I'm wondering if it is
> enough to just set an umbrella "effect" replication quota with perhaps
> per-topic overrides (say if some topics are more important than others) as
> opposed to designating throttled replicas.
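> 
> For illustration, I mean something along these lines (property names are
> made up for the example, not proposed configs):
> 
>   # hypothetical umbrella quota for all effect (catch-up) replication
>   replication.effect.quota.byte.rate=10485760
>   # hypothetical per-topic override for a more important topic
>   replication.effect.quota.byte.rate.topic.my-topic=52428800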
> 
> Also, IIRC during client-side quota discussions we had considered the
> possibility of allowing clients to go above their quotas when resources are
> available. We ended up not doing that, but for replication throttling it
> may make sense - i.e., to treat the quota as a soft limit. Another way to
> look at it is instead of ensuring "effect replication traffic does not flow
> faster than X bytes/sec" it may be useful to instead ensure that "effect
> replication traffic only flows as slowly as necessary (so as not to
> adversely affect normal replication traffic)."
> 
> Thanks,
> 
> Joel
> 
>>> 
>>>> On Thu, Aug 11, 2016 at 2:43 PM, Jun Rao <j...@confluent.io> wrote:
>>>> 
>>>>> Hi, Joel,
>>>>> 
>>>>> Yes, the response size includes both throttled and unthrottled replicas.
>>>>> However, the response is only delayed up to max.wait if the response size
>>>>> is less than min.bytes, which matches the current behavior. So, there is
>>>>> no extra delay due to throttling, right? For replica fetchers, the
>>>>> default min.bytes is 1. So, the response is only delayed if there is no
>>>>> byte in the response, which is what we want.
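>>>>> 
>>>>> As a quick sketch of that check (the method name is illustrative, not
>>>>> the actual Kafka code):
>>>>> 
>>>>>   // Delay the response up to max.wait only when it carries fewer bytes
>>>>>   // than min.bytes. With the replica fetcher default of min.bytes=1,
>>>>>   // only a completely empty response is delayed.
>>>>>   def shouldDelay(responseBytes: Int, minBytes: Int): Boolean =
>>>>>     responseBytes < minBytes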
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> On Thu, Aug 11, 2016 at 11:53 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Jun,
>>>>>> 
>>>>>> I'm not sure that would work unless we have separate replica fetchers,
>>>>>> since this would cause all replicas (including ones that are not
>>>>>> throttled) to get delayed. Instead, we could just have the leader
>>>>>> populate the throttle-time field of the response as a hint to the
>>>>>> follower as to how long it should wait before it adds those replicas
>>>>>> back to its subsequent replica fetch requests.
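>>>>>> 
>>>>>> Roughly like this (names illustrative, not real code):
>>>>>> 
>>>>>>   // Hypothetical follower sketch: if the leader signals a throttle
>>>>>>   // time, exclude throttled partitions from fetch requests until the
>>>>>>   // deadline passes (a deadline of 0 means no exclusion).
>>>>>>   def throttleDeadlineMs(nowMs: Long, throttleTimeMs: Int): Long =
>>>>>>     if (throttleTimeMs > 0) nowMs + throttleTimeMs else 0L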
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Joel
>>>>>> 
>>>>>> On Thu, Aug 11, 2016 at 9:50 AM, Jun Rao <j...@confluent.io> wrote:
>>>>>> 
>>>>>>> Mayuresh,
>>>>>>> 
>>>>>>> That's a good question. I think if the response size (after leader
>>>>>>> throttling) is smaller than min.bytes, we will just delay the sending
>>>>>>> of the response up to max.wait as we do now. This should prevent
>>>>>>> frequent empty responses to the follower.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>>> On Wed, Aug 10, 2016 at 9:17 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> This might have been answered before.
>>>>>>>> I was wondering about the case when the leader quota is reached and it
>>>>>>>> sends an empty response ("If the inclusion of a partition, listed in
>>>>>>>> the leader's throttled-replicas list, causes the LeaderQuotaRate to be
>>>>>>>> exceeded, that partition is omitted from the response (aka returns 0
>>>>>>>> bytes)."). At this point the follower quota is NOT reached and the
>>>>>>>> follower is still going to ask for that partition in the next fetch
>>>>>>>> request. Would it be fair to add some logic there so that the follower
>>>>>>>> backs off (for some configurable time) from including those partitions
>>>>>>>> in the next fetch request?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>> Mayuresh
>>>>>>>> 
>>>>>>>> On Wed, Aug 10, 2016 at 8:06 AM, Ben Stopford <b...@confluent.io> wrote:
>>>>>>>> 
>>>>>>>>> Thanks again for the responses everyone. I’ve removed the extra
>>>>>>>>> fetcher threads from the proposal, switching to the inclusion-based
>>>>>>>>> approach. The relevant section is:
>>>>>>>>> 
>>>>>>>>> The follower makes a request, using the fixed size of
>>>>>>>>> replica.fetch.response.max.bytes as per KIP-74
>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes>.
>>>>>>>>> The order of the partitions in the fetch request is randomised to
>>>>>>>>> ensure fairness.
>>>>>>>>> When the leader receives the fetch request it processes the
>>>>>>>>> partitions in the defined order, up to the response's size limit. If
>>>>>>>>> the inclusion of a partition, listed in the leader's
>>>>>>>>> throttled-replicas list, causes the LeaderQuotaRate to be exceeded,
>>>>>>>>> that partition is omitted from the response (aka returns 0 bytes).
>>>>>>>>> Logically, this is of the form:
>>>>>>>>> 
>>>>>>>>>   var bytesAllowedForThrottledPartition =
>>>>>>>>>     quota.recordAndMaybeAdjust(bytesRequestedForPartition)
>>>>>>>>> 
>>>>>>>>> When the follower receives the fetch response, if it includes
>>>>>>>>> partitions in its throttled-partitions list, it increments the
>>>>>>>>> FollowerQuotaRate:
>>>>>>>>> 
>>>>>>>>>   var includeThrottledPartitionsInNextRequest: Boolean =
>>>>>>>>>     quota.recordAndEvaluate(previousResponseThrottledBytes)
>>>>>>>>> 
>>>>>>>>> If the quota is exceeded, no throttled partitions will be included in
>>>>>>>>> the next fetch request emitted by this replica fetcher thread.
>>>>>>>>> 
>>>>>>>>> B
>>>>>>>>> 
>>>>>>>>>> On 9 Aug 2016, at 23:34, Jun Rao <j...@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> When there are several unthrottled replicas, we could also just do
>>>>>>>>>> what's suggested in KIP-74. The client is responsible for reordering
>>>>>>>>>> the partitions and the leader fills in the bytes to those partitions
>>>>>>>>>> in order, up to the quota limit.
>>>>>>>>>> 
>>>>>>>>>> We could also do what you suggested. If quota is exceeded, include
>>>>>>>>>> empty data in the response for throttled replicas. Keep doing that
>>>>>>>>>> until enough time has passed so that the quota is no longer
>>>>>>>>>> exceeded. This potentially allows better batching per partition. Not
>>>>>>>>>> sure if the two make a big difference in practice though.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> Jun
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Aug 9, 2016 at 2:31 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On the leader side, one challenge is related to the fairness issue
>>>>>>>>>>>> that Ben brought up. The question is what if the fetch response
>>>>>>>>>>>> limit is filled up by the throttled replicas? If this happens
>>>>>>>>>>>> constantly, we will delay the progress of those un-throttled
>>>>>>>>>>>> replicas. However, I think we can address this issue by trying to
>>>>>>>>>>>> fill up the unthrottled replicas in the response first. So, the
>>>>>>>>>>>> algorithm would be: fill up unthrottled replicas up to the fetch
>>>>>>>>>>>> response limit. If there is space left, fill up throttled
>>>>>>>>>>>> replicas. If quota is exceeded for the throttled replicas, reduce
>>>>>>>>>>>> the bytes in the throttled replicas in the response accordingly.
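>>>>>>>>>>>> 
>>>>>>>>>>>> In rough Scala, that allocation would be something like (the names
>>>>>>>>>>>> are assumed for the sketch, not real code):
>>>>>>>>>>>> 
>>>>>>>>>>>>   // Sketch: unthrottled replicas fill the response first;
>>>>>>>>>>>>   // throttled replicas get the leftover space, further capped
>>>>>>>>>>>>   // by the remaining quota allowance.
>>>>>>>>>>>>   def allocate(limit: Long, unthrottledWanted: Long,
>>>>>>>>>>>>                throttledWanted: Long, quotaAllowance: Long): (Long, Long) = {
>>>>>>>>>>>>     val unthrottled = math.min(limit, unthrottledWanted)
>>>>>>>>>>>>     val spaceLeft = limit - unthrottled
>>>>>>>>>>>>     val throttled = math.min(throttledWanted, math.min(spaceLeft, quotaAllowance))
>>>>>>>>>>>>     (unthrottled, throttled)
>>>>>>>>>>>>   }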
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Right - that's what I was trying to convey by truncation (vs
>>>>>>>>>>> empty). So we would attempt to fill the response for throttled
>>>>>>>>>>> partitions as much as we can before hitting the quota limit. There
>>>>>>>>>>> is one more detail to handle in this: if there are several
>>>>>>>>>>> throttled partitions and not enough remaining allowance in the
>>>>>>>>>>> fetch response to include all the throttled replicas, then we would
>>>>>>>>>>> need to decide which of those partitions get a share; which is why
>>>>>>>>>>> I'm wondering if it is easier to return empty for those partitions
>>>>>>>>>>> entirely in the fetch response - they will make progress in the
>>>>>>>>>>> subsequent fetch. If they don't make fast enough progress then that
>>>>>>>>>>> would be a case for raising the threshold or letting it complete at
>>>>>>>>>>> an off-peak time.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> With this approach, we need some new logic to handle throttling on
>>>>>>>>>>>> the leader, but we can leave the replica threading model
>>>>>>>>>>>> unchanged. So, overall, this still seems to be a simpler approach.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Jun
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Nice write up Ben.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree with Joel on keeping this simple by excluding the
>>>>>>>>>>>>> partitions from the fetch request/response when the quota is
>>>>>>>>>>>>> violated at the follower or leader, instead of having a separate
>>>>>>>>>>>>> set of threads for handling the quota and non-quota cases. Even
>>>>>>>>>>>>> though it's different from the current quota implementation, it
>>>>>>>>>>>>> should be OK since it's internal to brokers and can be handled by
>>>>>>>>>>>>> the admins tuning the quota configs appropriately.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, can you elaborate with an example how this would be
>>>>>>>>>>>>> handled: *guaranteeing ordering of updates when replicas shift
>>>>>>>>>>>>> threads*
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Mayuresh
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On the need for both leader/follower throttling: that makes
>>>>>>>>>>>>>> sense - thanks for clarifying. For completeness, can we add this
>>>>>>>>>>>>>> detail to the doc - say, after the quote that I pasted earlier?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> From an implementation perspective though: I’m still interested
>>>>>>>>>>>>>> in the simplicity of not having to add separate replica
>>>>>>>>>>>>>> fetchers, a delay queue on the leader, and “move” partitions
>>>>>>>>>>>>>> from the throttled replica fetchers to the regular replica
>>>>>>>>>>>>>> fetchers once caught up.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Instead, I think it would work and be simpler to include or
>>>>>>>>>>>>>> exclude the partitions in the fetch request from the follower
>>>>>>>>>>>>>> and fetch response from the leader when the quota is violated.
>>>>>>>>>>>>>> The issue of fairness that Ben noted may be a wash between the
>>>>>>>>>>>>>> two options (that Ben wrote in his email). With the default
>>>>>>>>>>>>>> quota delay mechanism, partitions get delayed essentially at
>>>>>>>>>>>>>> random - i.e., whoever fetches at the time of quota violation
>>>>>>>>>>>>>> gets delayed at the leader. So we can adopt a similar policy in
>>>>>>>>>>>>>> choosing to truncate partitions in fetch responses. i.e., if at
>>>>>>>>>>>>>> the time of handling the fetch the “effect” replication rate
>>>>>>>>>>>>>> exceeds the quota, then either empty or truncate those
>>>>>>>>>>>>>> partitions from the response. (BTW effect replication is your
>>>>>>>>>>>>>> terminology in the wiki - i.e., replication due to partition
>>>>>>>>>>>>>> reassignment, adding brokers, etc.)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> While this may be slightly different from the existing quota
>>>>>>>>>>>>>> mechanism, I think the difference is small (since we would reuse
>>>>>>>>>>>>>> the quota manager, at worst with some refactoring) and will be
>>>>>>>>>>>>>> internal to the broker.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So I guess the question is if this alternative is simple enough
>>>>>>>>>>>>>> and equally functional to not go with dedicated throttled
>>>>>>>>>>>>>> replica fetchers.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao <j...@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Just to elaborate on what Ben said about why we need throttling
>>>>>>>>>>>>>>> on both the leader and the follower side.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If we only have throttling on the follower side, consider a
>>>>>>>>>>>>>>> case where we add 5 more new brokers and want to move some
>>>>>>>>>>>>>>> replicas from existing brokers over to those 5 brokers. Each of
>>>>>>>>>>>>>>> those brokers is going to fetch data from all existing brokers.
>>>>>>>>>>>>>>> Then, it's possible that the aggregated fetch load from those 5
>>>>>>>>>>>>>>> brokers on a particular existing broker exceeds its outgoing
>>>>>>>>>>>>>>> network bandwidth, even though the inbound traffic on each of
>>>>>>>>>>>>>>> those 5 brokers is bounded.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If we only have throttling on the leader side, consider the
>>>>>>>>>>>>>>> same example above. It's possible for the incoming traffic to
>>>>>>>>>>>>>>> each of those 5 brokers to exceed its network bandwidth, since
>>>>>>>>>>>>>>> each is fetching data from all existing brokers.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So, being able to set a quota on both the follower and the
>>>>>>>>>>>>>>> leader side protects against both cases.
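>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> (Illustrative numbers: with a follower-only quota of 20 MB/s on
>>>>>>>>>>>>>>> each of the 5 new brokers, a single existing leader could still
>>>>>>>>>>>>>>> see up to 5 x 20 = 100 MB/s of outbound catch-up traffic, even
>>>>>>>>>>>>>>> though every follower stays within its quota.)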
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jun
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford <b...@confluent.io> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Joel
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for taking the time to look at this. Appreciated.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Regarding throttling on both leader and follower, this
>>>>>>>>>>>>>>>> proposal covers a more general solution which can guarantee a
>>>>>>>>>>>>>>>> quota, even when a rebalance operation produces an asymmetric
>>>>>>>>>>>>>>>> profile of load. This means administrators don’t need to
>>>>>>>>>>>>>>>> calculate the impact that a follower-only quota will have on
>>>>>>>>>>>>>>>> the leaders they are fetching from - for example, where
>>>>>>>>>>>>>>>> replica sizes are skewed or where a partial rebalance is
>>>>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Having said that, even with both leader and follower quotas,
>>>>>>>>>>>>>>>> the use of additional threads is actually optional. There
>>>>>>>>>>>>>>>> appear to be two general approaches: (1) omit partitions from
>>>>>>>>>>>>>>>> fetch requests (follower) / fetch responses (leader) when they
>>>>>>>>>>>>>>>> exceed their quota; (2) delay them, as the existing quota
>>>>>>>>>>>>>>>> mechanism does, using separate fetchers. Both appear valid,
>>>>>>>>>>>>>>>> but with slightly different design tradeoffs.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The issue with approach (1) is that it departs somewhat from
>>>>>>>>>>>>>>>> the existing quotas implementation, and must include a notion
>>>>>>>>>>>>>>>> of fairness within the now size-bounded request and response.
>>>>>>>>>>>>>>>> The issue with (2) is guaranteeing ordering of updates when
>>>>>>>>>>>>>>>> replicas shift threads, but this is handled, for the most
>>>>>>>>>>>>>>>> part, in the code today.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I’ve updated the rejected alternatives section to make this a
>>>>>>>>>>>>>>>> little clearer.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> B
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the detailed write-up. So the proposal involves
>>>>>>>>>>>>>>>>> self-throttling on the fetcher side and throttling at the
>>>>>>>>>>>>>>>>> leader. Can you elaborate on the reasoning that is given on
>>>>>>>>>>>>>>>>> the wiki: *“The throttle is applied to both leaders and
>>>>>>>>>>>>>>>>> followers. This allows the admin to exert strong guarantees
>>>>>>>>>>>>>>>>> on the throttle limit."* Is there any reason why one or the
>>>>>>>>>>>>>>>>> other wouldn't be sufficient?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Specifically, if we were to only do self-throttling on the
>>>>>>>>>>>>>>>>> fetchers, we could potentially avoid the additional replica
>>>>>>>>>>>>>>>>> fetchers, right? i.e., the replica fetchers would maintain
>>>>>>>>>>>>>>>>> their quota metrics as you proposed, and each (normal)
>>>>>>>>>>>>>>>>> replica fetch presents an opportunity to make progress for
>>>>>>>>>>>>>>>>> the throttled partitions as long as their effective
>>>>>>>>>>>>>>>>> consumption rate is below the quota limit. If the consumption
>>>>>>>>>>>>>>>>> rate exceeds the quota, then don’t include the throttled
>>>>>>>>>>>>>>>>> partitions in the subsequent fetch requests until the
>>>>>>>>>>>>>>>>> effective consumption rate for those partitions returns to
>>>>>>>>>>>>>>>>> within the quota threshold.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I have more questions on the proposal, but was more
>>>>>>>>>>>>>>>>> interested in the above to see if it could simplify things a
>>>>>>>>>>>>>>>>> bit.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Also, can you open up access to the google-doc that you link
>>>>>>>>>>>>>>>>> to?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Joel
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> We’ve created KIP-73: Replication Quotas
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The idea is to allow an admin to throttle moving replicas.
>>>>>>>>>>>>>>>>>> Full details are here:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Please take a look and let us know your thoughts.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> B
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -Regards,
>>>>>>>>>>>>> Mayuresh R. Gharat
>>>>>>>>>>>>> (862) 250-7125
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> -Regards,
>>>>>>>> Mayuresh R. Gharat
>>>>>>>> (862) 250-7125
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Regards,
>>>> Mayuresh R. Gharat
>>>> (862) 250-7125
>>>> 
>>> 
>> 
>> 
>> --
>> Ben Stopford
>> 
