Hey Viktor,

I like your latest idea regarding the replication/reassignment configs
interplay - I think it makes sense for replication to always be higher. A
small matrix of possibilities in the KIP may be useful to future readers
(users).
To be extra clear:
1. if reassignment.throttle is -1, reassignment traffic is counted with
replication traffic against replication.throttle
2. if replication.throttle is 20 and reassignment.throttle is 10, we have a
30 total throttle
Is my understanding correct?
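
To make the two cases concrete, here is a minimal sketch of the semantics as I read them. The function and key names are hypothetical (they are not from the KIP), replication.throttle and reassignment.throttle are used as shorthand, and the units are left abstract:

```python
from typing import Dict, Optional

# Assumption: -1 means "no dedicated reassignment throttle", as in case 1.
UNSET = -1


def effective_throttles(replication_throttle: int,
                        reassignment_throttle: int) -> Dict[str, Optional[int]]:
    """Return the caps implied by the two cases above (illustrative only)."""
    if reassignment_throttle == UNSET:
        # Case 1: reassignment traffic is counted against the replication budget,
        # so there is no separate reassignment cap and the total is unchanged.
        return {
            "replication_cap": replication_throttle,
            "reassignment_cap": None,
            "total_cap": replication_throttle,
        }
    # Case 2: each traffic class has its own budget, so the caps add up.
    return {
        "replication_cap": replication_throttle,
        "reassignment_cap": reassignment_throttle,
        "total_cap": replication_throttle + reassignment_throttle,
    }


if __name__ == "__main__":
    print(effective_throttles(20, UNSET))
    print(effective_throttles(20, 10))
```

If this reading is wrong, the matrix in the KIP would be the place to spell out what the total should be instead.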

Regarding the KIP - the motivation states:

> So a user is able to specify the partition and the throttle rate but it
> will be applied to all non-ISR replication traffic. This is undesirable
> because if a node that is being throttled falls out of ISR it would further
> prevent it from catching up.

This KIP does not solve this problem, right?
Or did you mean to mention the problem where reassignment replicas would
eat up the throttle and further limit the non-ISR "original" replicas from
catching up?

Best,
Stanislav

On Tue, Dec 10, 2019 at 9:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> This config will only be applied to those replicas which are reassigning
> and not yet in ISR. When they become ISR then reassignment throttling stops
> altogether and won't apply when they fall out of ISR. Specifically
> the validity of the config spans from the point when a reassignment is
> propagated by the adding_replicas field in the LeaderAndIsr request until
> the broker gets another LeaderAndIsr request saying that the new replica is
> added and in ISR. Furthermore the config will be applied only to the actual
> leader and follower so if the leader changes in the meanwhile the
> throttling changes with it (again based on the LeaderAndIsr requests).
>
> For instance when a new broker is added to offload some partitions there,
> it will be safer to use this config instead of general fetch throttling for
> this very reason: when an existing partition that is being reassigned falls
> out of ISR then it will be propagated via the LeaderAndIsr request so
> throttling also changes. This removes the need for changing the configs
> manually and would give an easy way for people to configure throttling yet
> would make better efforts to not throttle what's not needed to be throttled
> (the replica which is falling out of ISR).
>
> Viktor
>
> On Fri, Dec 6, 2019 at 5:12 PM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > My concern is that we're very focused on reassignment where I think users
> > enable throttling to avoid overwhelming brokers with replica catch up
> > traffic (typically disk and/or bandwidth). The current approach achieves
> > that by not throttling ISR replication.
> >
> > The downside is that when a broker falls out of the ISR, it may suddenly
> > get throttled and never catch up. However, if the throttle can cause this
> > kind of issue, then it's broken for replicas being reassigned too, so one
> > could say that it's a configuration error.
> >
> > Do we have specific scenarios that would be solved by the proposed change?
> >
> > Ismael
> >
> > On Fri, Dec 6, 2019 at 2:26 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > wrote:
> >
> > > Thanks for the question. I think it depends on how the user will try to
> > > fix it.
> > > - If they just replace the disk then I think it shouldn't count as a
> > > reassignment and should be allocated under the normal replication quotas.
> > > In this case there is no reassignment going on as far as I can tell, the
> > > broker shuts down serving replicas from that dir/disk, notifies the
> > > controller which changes the leadership. When the disk is fixed the broker
> > > will be restarted to pick up the changes and it starts the replication from
> > > the current leader.
> > > - If the user reassigns the partitions to other brokers then it will fall
> > > under the reassignment traffic.
> > > Also if the user moves a partition to a different disk it would also count
> > > as normal replication as it is technically not a reassignment but an
> > > alter-replica-dir event but it's still done with the reassignment tool, so
> > > I'd keep the current functionality of the
> > > --replica-alter-log-dirs-throttle.
> > > Is this aligned with your thinking?
> > >
> > > Viktor
> > >
> > > On Wed, Dec 4, 2019 at 2:47 PM Ismael Juma <isma...@gmail.com> wrote:
> > >
> > > > Thanks Viktor. How do we intend to handle the case where a broker loses
> > > > its disk and has to catch up from the beginning?
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Dec 4, 2019, 4:31 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for the notice Ismael, KAFKA-4313 fixed this issue indeed. I've
> > > > > updated the KIP.
> > > > >
> > > > > Viktor
> > > > >
> > > > > On Tue, Dec 3, 2019 at 3:28 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >
> > > > > > Hi Viktor,
> > > > > >
> > > > > > The KIP states:
> > > > > >
> > > > > > "KIP-73
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas>
> > > > > > added quotas for replication but it doesn't separate normal replication
> > > > > > traffic from reassignment. So a user is able to specify the partition and
> > > > > > the throttle rate but it will be applied to both ISR and non-ISR traffic"
> > > > > >
> > > > > > This is not true, ISR traffic is not throttled.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Thu, Oct 24, 2019 at 5:38 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi People,
> > > > > > >
> > > > > > > I've created a KIP to improve replication quotas by handling reassignment
> > > > > > > related throttling as a separate case with its own configurable limits and
> > > > > > > change the kafka-reassign-partitions tool to use these new configs going
> > > > > > > forward.
> > > > > > > Please have a look, I'd be happy to receive any feedback and answer
> > > > > > > all your questions.
> > > > > > >
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-542%3A+Partition+Reassignment+Throttling
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Viktor
> > > > > > >


-- 
Best,
Stanislav
