>
>
>
> On the leader side, one challenge is related to the fairness issue that Ben
> brought up. The question is: what if the fetch response limit is filled up
> by the throttled replicas? If this happens constantly, we will delay the
> progress of the unthrottled replicas. However, I think we can address this
> issue by filling up the unthrottled replicas in the response first. So, the
> algorithm would be: fill up unthrottled replicas up to the fetch response
> limit; if there is space left, fill up throttled replicas; if the quota is
> exceeded for the throttled replicas, reduce the bytes for the throttled
> replicas in the response accordingly.
>

Right - that's what I was trying to convey by truncation (vs. empty). So we
would attempt to fill the response for throttled partitions as much as we
can before hitting the quota limit. There is one more detail to handle here:
if there are several throttled partitions and not enough remaining allowance
in the fetch response to include all of them, then we would need to decide
which of those partitions get a share. That is why I'm wondering if it is
easier to return empty for those partitions entirely in the fetch response -
they will make progress in the subsequent fetch. If they don't make fast
enough progress, that would be a case for raising the threshold or letting
the move complete at an off-peak time.
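
To make sure we're talking about the same mechanism, here is a rough sketch
of the leader-side selection described above. This is not actual Kafka code
- the class, method, and field names below are made up for illustration -
but it shows the two-pass idea: unthrottled partitions are filled first, and
throttled partitions then share whatever allowance remains under both the
response size limit and the replication quota. The loop order in the second
pass is exactly where the fairness question comes in: whichever throttled
partitions happen to be visited first consume the remaining budget.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names are invented, not from the Kafka codebase.
class FetchResponseFiller {

    // Hypothetical view of one partition's available data for this fetch.
    static class PartitionData {
        final String topicPartition;
        final int availableBytes;
        final boolean throttled; // true if this replica is being moved/throttled

        PartitionData(String tp, int availableBytes, boolean throttled) {
            this.topicPartition = tp;
            this.availableBytes = availableBytes;
            this.throttled = throttled;
        }
    }

    // quotaAllowanceBytes: how many more throttled-replication bytes may be
    // sent in the current window before the replication quota is violated.
    static Map<String, Integer> fill(Iterable<PartitionData> partitions,
                                     int responseLimitBytes,
                                     long quotaAllowanceBytes) {
        Map<String, Integer> response = new LinkedHashMap<>();
        int remaining = responseLimitBytes;

        // Pass 1: unthrottled partitions get first claim on the response limit.
        for (PartitionData p : partitions) {
            if (!p.throttled && remaining > 0) {
                int bytes = Math.min(p.availableBytes, remaining);
                response.put(p.topicPartition, bytes);
                remaining -= bytes;
            }
        }

        // Pass 2: throttled partitions share what is left, bounded by the quota.
        // This is the "truncate" variant: each throttled partition takes what it
        // can from the budget. The "empty" variant would return 0 bytes for all
        // throttled partitions once the budget is exhausted.
        long throttledBudget = Math.min(remaining, quotaAllowanceBytes);
        for (PartitionData p : partitions) {
            if (p.throttled) {
                int bytes = (int) Math.min(p.availableBytes, throttledBudget);
                response.put(p.topicPartition, bytes);
                throttledBudget -= bytes;
            }
        }
        return response;
    }
}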


>
> With this approach, we need some new logic to handle throttling on the
> leader, but we can leave the replica threading model unchanged. So,
> overall, this still seems to be a simpler approach.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 9, 2016 at 11:57 AM, Mayuresh Gharat <gharatmayures...@gmail.com>
> wrote:
>
> > Nice write up Ben.
> >
> > I agree with Joel on keeping this simple by excluding the partitions from
> > the fetch request/response when the quota is violated at the follower or
> > leader, instead of having a separate set of threads for handling the quota
> > and non-quota cases. Even though it's different from the current quota
> > implementation, it should be OK since it's internal to the brokers and can
> > be handled by the admins tuning the quota configs appropriately.
> >
> > Also, can you elaborate with an example on how this would be handled:
> > *guaranteeing ordering of updates when replicas shift threads*?
> >
> > Thanks,
> >
> > Mayuresh
> >
> >
> > On Tue, Aug 9, 2016 at 10:49 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> >
> > > On the need for both leader/follower throttling: that makes sense -
> > > thanks for clarifying. For completeness, can we add this detail to the
> > > doc - say, after the quote that I pasted earlier?
> > >
> > > From an implementation perspective though: I’m still interested in the
> > > simplicity of not having to add separate replica fetchers, a delay queue
> > > on the leader, and the “move” of partitions from the throttled replica
> > > fetchers to the regular replica fetchers once caught up.
> > >
> > > Instead, I think it would work and be simpler to include or exclude the
> > > partitions in the fetch request from the follower and the fetch response
> > > from the leader when the quota is violated. The issue of fairness that
> > > Ben noted may be a wash between the two options he described in his
> > > email. With the default quota delay mechanism, partitions get delayed
> > > essentially at random - i.e., whoever fetches at the time of quota
> > > violation gets delayed at the leader. So we can adopt a similar policy
> > > in choosing to truncate partitions in fetch responses - i.e., if at the
> > > time of handling the fetch the “effect” replication rate exceeds the
> > > quota, then either empty or truncate those partitions from the response.
> > > (BTW, “effect” replication is your terminology in the wiki - i.e.,
> > > replication due to partition reassignment, adding brokers, etc.)
> > >
> > > While this may be slightly different from the existing quota mechanism,
> > > I think the difference is small (since we would reuse the quota manager,
> > > at worst with some refactoring) and will be internal to the broker.
> > >
> > > So I guess the question is whether this alternative is sufficiently
> > > simpler and equally functional to justify not going with dedicated
> > > throttled replica fetchers.
> > >
> > > On Tue, Aug 9, 2016 at 9:44 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Just to elaborate on what Ben said about why we need throttling on
> > > > both the leader and the follower side.
> > > >
> > > > If we only have throttling on the follower side, consider a case where
> > > > we add 5 new brokers and want to move some replicas from existing
> > > > brokers over to those 5 brokers. Each of those brokers is going to
> > > > fetch data from all existing brokers. Then, it's possible that the
> > > > aggregate fetch load from those 5 brokers on a particular existing
> > > > broker exceeds its outgoing network bandwidth, even though the inbound
> > > > traffic on each of those 5 brokers is bounded.
> > > >
> > > > If we only have throttling on the leader side, consider the same
> > > > example above. It's possible for the incoming traffic to each of those
> > > > 5 brokers to exceed its network bandwidth, since each is fetching data
> > > > from all existing brokers.
> > > >
> > > > So, being able to set a quota on both the follower and the leader side
> > > > protects against both cases.
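
(To put illustrative numbers on this - the figures below are made up, not
from the KIP: if each of the 5 new brokers has a follower quota of 50 MB/s,
an existing leader fetched by all 5 could still serve up to 5 x 50 MB/s =
250 MB/s of outgoing replication traffic; conversely, with only a 50 MB/s
leader quota on each of, say, 10 existing brokers, a single new broker
fetching from all of them could still take in up to 10 x 50 MB/s = 500 MB/s.)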
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Aug 9, 2016 at 4:43 AM, Ben Stopford <b...@confluent.io> wrote:
> > > >
> > > > > Hi Joel
> > > > >
> > > > > Thanks for taking the time to look at this. Appreciated.
> > > > >
> > > > > Regarding throttling on both leader and follower, this proposal
> > > > > covers a more general solution which can guarantee a quota, even
> > > > > when a rebalance operation produces an asymmetric profile of load.
> > > > > This means administrators don’t need to calculate the impact that a
> > > > > follower-only quota will have on the leaders they are fetching from
> > > > > - for example where replica sizes are skewed or where a partial
> > > > > rebalance is required.
> > > > >
> > > > > Having said that, even with both leader and follower quotas, the use
> > > > > of additional threads is actually optional. There appear to be two
> > > > > general approaches: (1) omit partitions from fetch requests
> > > > > (follower) / fetch responses (leader) when they exceed their quota;
> > > > > (2) delay them, as the existing quota mechanism does, using separate
> > > > > fetchers. Both appear valid, but with slightly different design
> > > > > tradeoffs.
> > > > >
> > > > > The issue with approach (1) is that it departs somewhat from the
> > > > > existing quotas implementation, and must include a notion of
> > > > > fairness within the now size-bounded request and response. The issue
> > > > > with (2) is guaranteeing ordering of updates when replicas shift
> > > > > threads, but this is handled, for the most part, in the code today.
> > > > >
> > > > > I’ve updated the rejected alternatives section to make this a little
> > > > > clearer.
> > > > >
> > > > > B
> > > > >
> > > > >
> > > > >
> > > > > > On 8 Aug 2016, at 20:38, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Ben,
> > > > > >
> > > > > > Thanks for the detailed write-up. So the proposal involves
> > > > > > self-throttling on the fetcher side and throttling at the leader.
> > > > > > Can you elaborate on the reasoning that is given on the wiki:
> > > > > > *“The throttle is applied to both leaders and followers. This
> > > > > > allows the admin to exert strong guarantees on the throttle
> > > > > > limit.”* Is there any reason why one or the other wouldn't be
> > > > > > sufficient?
> > > > > >
> > > > > > Specifically, if we were to only do self-throttling on the
> > > > > > fetchers, we could potentially avoid the additional replica
> > > > > > fetchers, right? I.e., the replica fetchers would maintain their
> > > > > > quota metrics as you proposed, and each (normal) replica fetch
> > > > > > presents an opportunity to make progress for the throttled
> > > > > > partitions as long as their effective consumption rate is below
> > > > > > the quota limit. If the consumption rate exceeds the quota, then
> > > > > > don’t include the throttled partitions in the subsequent fetch
> > > > > > requests until the effective consumption rate for those partitions
> > > > > > returns to within the quota threshold.
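
For concreteness, the follower-side self-throttling described in the
paragraph above could look roughly like the sketch below. This is plain
illustrative Java, not the actual replica fetcher code, and the names are
invented: the fetcher keeps a rate metric for its throttled partitions and
simply leaves them out of the next fetch request while that rate is at or
above the quota.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names are invented, not from the Kafka codebase.
class ThrottledFetchRequestBuilder {

    static List<String> partitionsToFetch(List<String> assignedPartitions,
                                          Set<String> throttledPartitions,
                                          double throttledBytesPerSecond, // measured effective rate
                                          double quotaBytesPerSecond) {
        boolean quotaViolated = throttledBytesPerSecond >= quotaBytesPerSecond;
        List<String> toFetch = new ArrayList<>();
        for (String tp : assignedPartitions) {
            // Always fetch unthrottled partitions; include throttled ones only
            // while the measured replication rate is still under the quota.
            if (!throttledPartitions.contains(tp) || !quotaViolated) {
                toFetch.add(tp);
            }
        }
        return toFetch;
    }
}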
> > > > > >
> > > > > > I have more questions on the proposal, but was more interested in
> > > > > > the above to see if it could simplify things a bit.
> > > > > >
> > > > > > Also, can you open up access to the google-doc that you link to?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Joel
> > > > > >
> > > > > > On Mon, Aug 8, 2016 at 5:54 AM, Ben Stopford <b...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > >> We’ve created KIP-73: Replication Quotas
> > > > > >>
> > > > > >> The idea is to allow an admin to throttle moving replicas. Full
> > > > > >> details are here:
> > > > > >>
> > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> > > > > >>
> > > > > >> Please take a look and let us know your thoughts.
> > > > > >>
> > > > > >> Thanks
> > > > > >>
> > > > > >> B
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> >
>
