Hi Harry,

Thanks for the updates!

Yes, the proposed metric looks good.

If the user runs the kafka-reassign-partitions script with a throttle set,
then the static throttle gets overwritten until the reassignment completes.
Can you clarify this in the KIP?

--
Kamal



On Sun, Jul 14, 2024 at 9:59 PM Harry Fallows
<harryfall...@protonmail.com.invalid> wrote:

> Hi Kamal,
>
> Thank you for reading KIP-1051!
>
> Yes, it's true that it can impact regular replication traffic. However,
> network throughput is bounded so regardless of whether we allow it as a
> config in Kafka or not, there is always a chance that replication traffic
> will get throttled. Having it as a config will at least ensure that the
> entire bandwidth is not taken up by replication traffic.
>
> I agree that the nature of the leader replication throttling is dependent
> on how many followers there are. However, I don't think it's dependent on
> the partition assignment strategy or the number of brokers; it should only
> be dependent on the replication factor. I think it's key to point out here
> that these configurations do not need to be "optimised" for use cases with
> different replication factors, they just need to be set to match the
> infrastructure that they are deployed in. For example, if you have a maximum
> network bandwidth of 200MB/s and a replication factor of 3, you may set
> follower.replication.throttled.rate to 150MB/s, to reserve some
> bandwidth for other traffic (e.g. producing and consuming). In this case,
> if you start with all replicas in sync, I don't think it's possible for the
> follower throttling to be the sole cause of a replica falling out of sync.
> It may be the case that it takes longer for an out-of-sync replica to
> become in sync, but in that case the replication throttling just serves to
> mitigate other traffic from getting throttled (e.g. producer traffic to a
> different partition). Even so, it is possible that misconfiguring these
> values could cause issues, so the potential consequences should be clearly
> documented.
>
> I think the concern about producing spikes causing ISR issues is only an
> issue if these values are poorly configured. I think in general if these
> values are always configured as >=
> (replicationFactor/(replicationFactor+1))*maxBandwidth (e.g. like the above
> example: 3/(3+1) * 200 = 150), then even if 100% of the non-replication
> traffic is producer traffic, all followers should be able to stay in sync.
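
As a rough illustration of the rule of thumb above, the following Python
sketch computes the suggested lower bound for a throttle rate and converts it
to bytes per second. The function names are hypothetical and not part of the
KIP, and the conversion assumes 1 MB = 1,000,000 bytes and that the rate is
expressed in bytes per second, as the existing dynamic throttle rates are.

```python
def min_replication_throttle(replication_factor: int, max_bandwidth_mb_s: float) -> float:
    """Lower bound from the rule of thumb: RF / (RF + 1) * max bandwidth (MB/s)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor / (replication_factor + 1) * max_bandwidth_mb_s


def mb_to_bytes_per_s(mb_per_s: float) -> int:
    """Convert MB/s to bytes/s (assumption: 1 MB = 1,000,000 bytes)."""
    return round(mb_per_s * 1_000_000)


# The example from the thread: replication factor 3, 200 MB/s of bandwidth.
floor_mb_s = min_replication_throttle(3, 200)
print(floor_mb_s)                     # 150.0
print(mb_to_bytes_per_s(floor_mb_s))  # 150000000
```

Any rate at or above this value should, per the reasoning above, leave enough
headroom for followers to keep up even if all remaining traffic is producer
traffic.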
>
> I like the idea of emitting a metric for when a quota is breached. What do
> you think about having it as a gauge for the number of partitions that are
> currently leader or follower throttled (similar to the URP metric)?
>
> Kind regards,
> Harry
>
> On Thursday, 11 July 2024 at 19:02, Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Harry Fallows,
> >
> > Thanks for the KIP!
> >
> > I went over both KIP-1051 and KIP-1009. Assuming that
> > leader.replication.throttled.replicas and
> > follower.replication.throttled.replicas are set to the wildcard (*) so that
> > they apply to all the partitions in the broker, setting a static value for
> > the leader and follower replication throttled rate might impact the normal
> > replication traffic.
> >
> > Throttling rate depends on the number of brokers in the cluster. If the
> > cluster contains 100+ brokers, then
> > the leader.replication.throttled.rate is shared across all the followers.
> > The number of followers reading
> > data from the leader depends on the partition assignment strategy. If the
> > leader replication throttle is breached,
> > then the follower might fail to catch-up with the leader.
> >
> > If there are sudden spikes in a specific set of topics/partitions in the
> > cluster, then the replicas might fail to join
> > the ISR and can impact cluster reliability. If we are going with this
> > proposal, then we may also have to emit
> > a metric to inform the administrator that the leader/follower replication
> > quota is breached.
> >
> > --
> > Kamal
> >
> > On Thu, Jul 4, 2024 at 8:10 PM Harry Fallows
> > harryfall...@protonmail.com.invalid wrote:
> >
> > > Hi everyone,
> > >
> > > > Bumping this one last time before I call a vote. Please take a look if
> > > > you're interested in replication throttling and/or static/dynamic
> > > > config.
> > >
> > > Kind regards,
> > > Harry
> > >
> > > On Thursday, 13 June 2024 at 19:39, Harry Fallows <
> > > harryfall...@protonmail.com.INVALID> wrote:
> > >
> > > > Hi Hector,
> > > >
> > > > > I did see your colleague's KIP, and I actually mentioned it in the
> > > > > KIP that I have written. As I see it, both of these KIPs move towards
> > > > > more easily configurable replication throttling and both should be
> > > > > implemented. KIP-1009 makes it easier to enable throttling and
> > > > > KIP-1051 makes it easier to apply a throttle rate. I did try to look
> > > > > at supporting KIP-1009 in the discussion thread; however, I only
> > > > > subscribed to the mailing list after it was published and I couldn't
> > > > > figure out how to respond to it in Pony Mail. I would definitely be
> > > > > interested in partnering up to get both changes across the line,
> > > > > whether that be by combining them or supporting both individually
> > > > > (I'm not sure which is best, this is my first contribution!).
> > > >
> > > > > I also see that KAFKA-10190 is mentioned in KIP-1009 as a related
> > > > > ticket. Coincidentally, I raised a PR to address this bug a couple of
> > > > > days ago (https://github.com/apache/kafka/pull/16280). I think this
> > > > > is also a change that will move towards more easily configurable
> > > > > replication throttling as it allows configuring the throttle rate
> > > > > across the whole cluster via a default value. As far as I understand,
> > > > > this change does not need a KIP though because it is a bugfix (the
> > > > > current behaviour of ignoring the default is unintentional).
> > > >
> > > > Let me know what you think.
> > > >
> > > > Kind regards,
> > > > Harry
> > > >
> > > > -------- Original Message --------
> > > > On 6/13/24 19:08, Hector Geraldino (BLOOMBERG/ 919 3RD A)
> > > > hgerald...@bloomberg.net wrote:
> > > >
> > > > > Hi Harry,
> > > > >
> > > > > > A colleague of mine opened KIP-1009: Add Broker-level Throttle
> > > > > > Configurations, which aims to achieve the same goal (although from
> > > > > > a different angle).
> > > > > >
> > > > > > Can you please take a look and see if this would work for the
> > > > > > things you have in mind? Maybe we can partner and coalesce around
> > > > > > either KIP and try to push it to the end line.
> > > > >
> > > > > > KIP:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1009%3A+Add+Broker-level+Throttle+Configurations
> > > > > >
> > > > > > From: dev@kafka.apache.org At: 06/13/24 09:22:40 UTC-4:00 To:
> > > > > > dev@kafka.apache.org
> > > > > > Subject: Re: [DISCUSS] KIP-1051 Statically configured log
> > > > > > replication throttling
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Bumping this thread, as I haven't yet had any replies.
> > > > >
> > > > > Kind regards,
> > > > > Harry
> > > > >
> > > > > On Thursday, 6 June 2024 at 17:59, Harry Fallows
> > > > > harryfall...@protonmail.com.INVALID wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > > I would like to propose a change to allow the static
> > > > > > > configuration of leader and follower replication throttling
> > > > > > > rates.
> > > > > > >
> > > > > > > These configurations are very useful for preventing client
> > > > > > > traffic from getting throttled by replication traffic during
> > > > > > > events that cause a spike in replication. Currently they are only
> > > > > > > configurable dynamically, which means they are only really useful
> > > > > > > for throttling replication traffic during planned events. By
> > > > > > > allowing these configurations to be set statically, they can be
> > > > > > > used to prevent client traffic throttling during unplanned
> > > > > > > events.
> > > > > > >
> > > > > > > KIP:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1051%3A+Statically+configured+log+replication+throttling
> > > > > >
> > > > > > Best regards,
> > > > > > Harry Fallows
>
