Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

I would like to inform you that we have slightly changed our thinking about the implementation of the Token Bucket algorithm. Our initial idea was to change our existing Rate to behave like a Token Bucket. That works as we expected, but we have realized that the value of the Rate is then no longer really a sampled rate. From an observability perspective, it is not good to change the behavior of our Rate.

Instead, we propose to implement a new MeasurableStat that implements the Token Bucket algorithm and to use it alongside the Rate in the Sensor. With this, the Rate remains there for observability purposes and the Token Bucket is used for enforcing the quota. The Token Bucket is exposed via a new metric named "credits" that represents the remaining amount of credits in the bucket. As a first step, this will only be used for the controller quota.

I have updated the KIP to reflect this change. We hope this change is fine for everyone. Please raise your concerns if not.

Best,
David
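The token-bucket behavior described above — a bucket of "credits" that refills at the quota rate up to a burst capacity, is drained by mutations, and whose remaining credits are exposed as a metric — can be sketched as follows. This is a minimal illustration only, not Kafka's actual MeasurableStat implementation; the class and method names here are hypothetical.

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: `rate` credits accrue per second,
    up to a burst capacity; each operation spends credits."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = float(rate)      # refill rate (credits per second)
        self.burst = float(burst)    # maximum credits the bucket can hold
        self.now = now               # injectable clock, for testing
        self.credits = self.burst    # start with a full bucket
        self.last_refill = self.now()

    def _refill(self):
        t = self.now()
        self.credits = min(self.burst,
                           self.credits + (t - self.last_refill) * self.rate)
        self.last_refill = t

    def try_acquire(self, cost=1.0):
        """Spend `cost` credits if available; return the throttle time in
        seconds (0 means the operation is admitted immediately)."""
        self._refill()
        if self.credits >= cost:
            self.credits -= cost
            return 0.0
        # Not enough credits: tell the caller how long until there are.
        return (cost - self.credits) / self.rate
```

For example, a quota of 1 mutation/sec with a burst of 5 admits a batch of 5 topic creations at once, then throttles further requests until credits accrue again.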
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

Just a quick update. We have made good progress on the implementation of this KIP. The major parts are already in trunk, modulo the new "rate" implementation which is still under development.

I would like to change the type of the `controller_mutations_rate` from a Long to a Double. I have made various experiments and the rate can be quite low, especially in clusters with a lot of tenants. Using a Long is quite limiting when fine-tuning small rates. I hope that this change is fine for everyone.

Best,
David
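A fractional quota of the kind motivating the Long-to-Double change would be set with the standard `kafka-configs` tool, since the quota is a dynamic config on user/client entities. A hedged sketch — the broker address and entity name are placeholders, and the exact syntax should be checked against the KIP's "Kafka Config Command" chapter:

```
# Set a fractional controller mutation quota for a tenant
# (0.2 mutations/sec, which a Long-typed quota could not express).
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'controller_mutations_rate=0.2' \
  --entity-type users --entity-name user1

# Describe the entity to verify the quota.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type users --entity-name user1
```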
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

The vote has passed with 5 binding votes (Gwen, Rajini, Mickael, Jun and Colin) and 2 non-binding votes (Tom, Anna).

Thank you all for the fruitful discussion! I'd like to particularly thank Anna, who has heavily contributed to the design of this KIP.

Regards,
David
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1. Thanks, David!

best,
Colin
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for making those changes. They look fine to me. +1

Jun
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Colin, Jun,

Do the proposed error code and the updated KIP look good to you guys? I’d like to wrap up and close the vote.

Thanks,
David
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Colin and Jun, I have no problem if we have to rewrite part of it when the new controller comes out. I will be more than happy to help out. Regarding KIP-590, I think that we can cope with a principal as a string when the time comes. The user entity name is defined with a string already. Regarding the name of the error, you have made a good point. I do agree that it is important to differentiate the two cases. I propose the following two errors: - THROTTLING_QUOTA_EXCEEDED - Throttling is slightly better than rate as we have quotas which are not rate (e.g. request quota). This one is retryable once the throttle time is passed. - LIMIT_QUOTA_EXCEEDED - This one would indicate that the limit has been reached and is a final error. We only need the former in this KIP. What do you think? Jun, I have added a few examples in the KIP. The new name works exactly like the existing one once it is added to the accepted dynamic configs for the user and the client entities. I have added a "Kafka Config Command" chapter in the KIP. I will also open a Jira to not forget updating the AK documentation once the KIP gets merged. Thanks, David On Wed, Jun 10, 2020 at 3:03 AM Jun Rao wrote: > Hi, Colin, > > Good point. Maybe sth like THROTTLING_QUOTA_VIOLATED will make this clear. > > Hi, David, > > We added a new quota name in the KIP. You chose not to bump up the version > of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which seems ok since the > quota name is represented as a string. However, the new quota name can be > used in client tools for setting and listing the quota ( > https://kafka.apache.org/documentation/#quotas). Could you document how > the > new name will be used in those tools? > > Thanks, > > Jun > > On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe wrote: > > > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > > > Hi Colin, > > > > > > Thank you for your feedback. > > > > > > Jun has summarized the situation pretty well. Thanks Jun! 
I would like > to > > > complement it with the following points: > > > > > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > > > creations, partition creations and topic deletions that exceed the quota > > > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > > be populated accordingly to let the client know how long it must wait. > > > > > > 2. I do agree that we actually want a mechanism to apply back pressure > > > to the clients. The KIP basically proposes a mechanism to control and > to > > > limit the rate of operations before entering the controller. I think > that > > > it is similar to your thinking but is enforced based on a defined quota > > > instead of relying on the number of pending items in the controller. > > > > > > 3. You mentioned an alternative idea in your comments that, if I > > understood > > > correctly, would bound the queue to limit the overload and reject work > if > > > the queue is full. I have been thinking about this as well but I don't > > think > > > that it works well in our case. > > > - The first reason is the one mentioned by Jun. We actually want to be > > able > > > to limit specific clients (the misbehaving ones) in a multi-tenant > > > environment. > > > - The second reason is that, at least in our current implementation, > the > > > length of the queue is not really a good characteristic to estimate the > > load. > > > Coming back to your example of the CreateTopicsRequest. They create a > path > > > in ZK for each newly created topic, which triggers a ChangeTopic event > > in > > > the controller. That single event could be for a single topic in some > > cases or > > > for a thousand topics in others. > > > These two reasons aside, bounding the queue also introduces a knob > which > > > requires some tuning and thus suffers from all the points you mentioned > > as > > > well, doesn't it? The value may be true for one version but not for > > another. > > > > > > 4. 
Regarding the integration with KIP-500. The implementation of this > KIP > > > will span across the ApiLayer and the AdminManager. To be honest, we > > > can influence the implementation to work best with what you have in > mind > > > for the future controller. If you could shed some more light on the > > future > > > internal architecture, I can take this into account during the > > > implementation. > > > > > > > Hi David, > > > > Good question. The approach we are thinking of is that in the future, > > topic creation will be a controller RPC. In other words, rather than > > changing ZK and waiting for the controller code to notice, we'll go > through > > the controller code (which may change ZK, or may do something else in the > > ZK-free environment). > > > > I don't think there is a good way to write this in the short term that > > avoids having to rewrite in the long term. That's probably OK though, as > > long as we keep in mind that we'll need to. > > > > > > > > 5. Regarding KIP-590. For the create topics request, create
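The rate-based admission described in point 2 above — accepting or rejecting mutations against a defined quota with a burst allowance, instead of bounding the controller queue — can be sketched as a token bucket. This is an illustrative model only, not Kafka's actual implementation; the class and method names here are invented for the example:

```python
class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start full
        self.last = 0.0

    def check(self, n, now):
        """Try to spend n tokens at time `now` (seconds).

        Returns (accepted, throttle_ms): on rejection, throttle_ms tells
        the client how long until enough tokens will have accumulated.
        """
        # Refill based on elapsed time, capped at the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True, 0.0
        throttle_ms = (n - self.tokens) / self.rate * 1000
        return False, throttle_ms
```

A burst of up to `burst` mutations is admitted at once, after which requests are rejected with a computed wait time — which is exactly the shape of the QUOTA_VIOLATED response with a populated ThrottleTimeMs field.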
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Colin, Good point. Maybe something like THROTTLING_QUOTA_VIOLATED will make this clear. Hi, David, We added a new quota name in the KIP. You chose not to bump up the version of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which seems ok since the quota name is represented as a string. However, the new quota name can be used in client tools for setting and listing the quota ( https://kafka.apache.org/documentation/#quotas). Could you document how the new name will be used in those tools? Thanks, Jun On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe wrote: > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > > Hi Colin, > > > > Thank you for your feedback. > > > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > > complement it with the following points: > > > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > > creations, partition creations and topic deletions that exceed the quota > > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > be populated accordingly to let the client know how long it must wait. > > > > 2. I do agree that we actually want a mechanism to apply back pressure > > to the clients. The KIP basically proposes a mechanism to control and to > > limit the rate of operations before entering the controller. I think that > > it is similar to your thinking but is enforced based on a defined quota > > instead of relying on the number of pending items in the controller. > > > > 3. You mentioned an alternative idea in your comments that, if I > understood > > correctly, would bound the queue to limit the overload and reject work if > > the queue is full. I have been thinking about this as well but I don't > think > > that it works well in our case. > > - The first reason is the one mentioned by Jun. We actually want to be > able > > to limit specific clients (the misbehaving ones) in a multi-tenant > > environment. 
> > - The second reason is that, at least in our current implementation, the > > length of the queue is not really a good characteristic to estimate the > load. > > Coming back to your example of the CreateTopicsRequest. They create a path > > in ZK for each newly created topic, which triggers a ChangeTopic event > in > > the controller. That single event could be for a single topic in some > cases or > > for a thousand topics in others. > > These two reasons aside, bounding the queue also introduces a knob which > > requires some tuning and thus suffers from all the points you mentioned > as > > well, doesn't it? The value may be true for one version but not for > another. > > > > 4. Regarding the integration with KIP-500. The implementation of this KIP > > will span across the ApiLayer and the AdminManager. To be honest, we > > can influence the implementation to work best with what you have in mind > > for the future controller. If you could shed some more light on the > future > > internal architecture, I can take this into account during the > > implementation. > > > > Hi David, > > Good question. The approach we are thinking of is that in the future, > topic creation will be a controller RPC. In other words, rather than > changing ZK and waiting for the controller code to notice, we'll go through > the controller code (which may change ZK, or may do something else in the > ZK-free environment). > > I don't think there is a good way to write this in the short term that > avoids having to rewrite in the long term. That's probably OK though, as > long as we keep in mind that we'll need to. > > > > > 5. Regarding KIP-590. For the create topics request, create partitions > > request, and delete topics request, we are lucky enough to have directed > > all of them to the controller already so we are fine with doing the > admission > > in the broker which hosts the controller. 
If we were to throttle more > operations > > in the future, > > I believe that we can continue to do it centrally while leveraging the > > principal that is forwarded to account for the right tenant. The reason > > why I would like to keep it central is that we are talking about sparse > (or bursty) > > workloads here so distributing the quota among all the brokers makes > little sense. > > > > Right. The main requirement here is that the quota must operate based on > principal names rather than KafkaPrincipal objects. We had a long > discussion about KIP-590 and eventually concluded that we didn't want to > make KafkaPrincipal serializable (at least not yet) so what would be > forwarded is just a string, not the principal object. > > > > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > > me so I would rather prefer a specific error code. The main reason being > > that we may be able to reuse it in the future for other quotas. That > being said, > > we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly > about > > the naming. I wonder what the others think about this. > > > > Think about if you're
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > Hi Colin, > > Thank you for your feedback. > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > complement it with the following points: > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > creations, partition creations and topic deletions that exceed the quota > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > be populated accordingly to let the client know how long it must wait. > > 2. I do agree that we actually want a mechanism to apply back pressure > to the clients. The KIP basically proposes a mechanism to control and to > limit the rate of operations before entering the controller. I think that > it is similar to your thinking but is enforced based on a defined quota > instead of relying on the number of pending items in the controller. > > 3. You mentioned an alternative idea in your comments that, if I understood > correctly, would bound the queue to limit the overload and reject work if > the queue is full. I have been thinking about this as well but I don't think > that it works well in our case. > - The first reason is the one mentioned by Jun. We actually want to be able > to limit specific clients (the misbehaving ones) in a multi-tenant > environment. > - The second reason is that, at least in our current implementation, the > length of the queue is not really a good characteristic to estimate the load. > Coming back to your example of the CreateTopicsRequest. They create a path > in ZK for each newly created topic, which triggers a ChangeTopic event in > the controller. That single event could be for a single topic in some cases or > for a thousand topics in others. > These two reasons aside, bounding the queue also introduces a knob which > requires some tuning and thus suffers from all the points you mentioned as > well, doesn't it? The value may be true for one version but not for another. > > 4. Regarding the integration with KIP-500. 
The implementation of this KIP > will span across the ApiLayer and the AdminManager. To be honest, we > can influence the implementation to work best with what you have in mind > for the future controller. If you could shed some more light on the future > internal architecture, I can take this into account during the > implementation. > Hi David, Good question. The approach we are thinking of is that in the future, topic creation will be a controller RPC. In other words, rather than changing ZK and waiting for the controller code to notice, we'll go through the controller code (which may change ZK, or may do something else in the ZK-free environment). I don't think there is a good way to write this in the short term that avoids having to rewrite in the long term. That's probably OK though, as long as we keep in mind that we'll need to. > > 5. Regarding KIP-590. For the create topics request, create partitions > request, and delete topics request, we are lucky enough to have directed > all of them to the controller already so we are fine with doing the admission > in the broker which hosts the controller. If we were to throttle more > operations > in the future, > I believe that we can continue to do it centrally while leveraging the > principal that is forwarded to account for the right tenant. The reason > why I would like to keep it central is that we are talking about sparse (or > bursty) > workloads here so distributing the quota among all the brokers makes little > sense. > Right. The main requirement here is that the quota must operate based on principal names rather than KafkaPrincipal objects. We had a long discussion about KIP-590 and eventually concluded that we didn't want to make KafkaPrincipal serializable (at least not yet) so what would be forwarded is just a string, not the principal object. > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > me so I would rather prefer a specific error code. 
The main reason being > that we may be able to reuse it in the future for other quotas. That being > said, > we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > the naming. I wonder what the others think about this. > Imagine you're someone writing software that uses the Kafka client. Let's say you try to create some topics and get back QUOTA_VIOLATED. What does it mean? Maybe it means that you tried to create too many topics in too short a time (violating the controller throttle). Or maybe it means that you exceeded your quota on the number of partitions that you are allowed to create (let's assume that someone did a follow-on KIP that reuses this error code.) But now you have a dilemma. If the controller was just busy, then trying again is the right thing to do. If there is some other quota you violated (number of partitions, number of topics, etc.) then retrying is wasteful and will not accomplish
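Colin's distinction matters most on the client side: a retryable throttling error should be retried after the advertised throttle time, while a final quota error should surface immediately. A hedged sketch of that client loop — the error names and the throttle-time field follow the response format discussed in this thread, but `send_request` and the control flow are hypothetical, not the real Kafka client:

```python
import time

RETRYABLE_ERRORS = {"THROTTLING_QUOTA_EXCEEDED"}  # retry after throttle time
FINAL_ERRORS = {"LIMIT_QUOTA_EXCEEDED"}           # retrying cannot succeed

def create_topics_with_retry(send_request, request, max_retries=5):
    """send_request is a hypothetical callable returning
    (error_code_or_None, throttle_time_ms)."""
    for _ in range(max_retries):
        error, throttle_ms = send_request(request)
        if error is None:
            return True
        if error in RETRYABLE_ERRORS:
            # Wait out the quota window the broker advertised, then retry.
            time.sleep(throttle_ms / 1000.0)
            continue
        # Final (or unknown) errors are not worth retrying.
        raise RuntimeError(f"topic creation failed: {error}")
    raise RuntimeError("retries exhausted")
```

With a single ambiguous error code, this loop cannot be written safely — which is the dilemma described above.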
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Sounds good then. Thanks, Jun On Tue, Jun 9, 2020 at 10:59 AM David Jacot wrote: > Hi Jun, > > Both are already in the KIP, see "New Broker Configurations" chapter. I > think > that we need them in order to be able to define different burst for the new > quota. > > Best, > David > > On Tue, Jun 9, 2020 at 7:48 PM Jun Rao wrote: > > > Hi, David, > > > > Another thing. Should we add controller.quota.window.size.seconds and > > controller.quota.window.num > > or just reuse the existing quota.window.size.seconds and quota.window.num > > that are used for other types of quotas? > > > > Thanks, > > > > Jun > > > > On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > > > > > Hi, David, > > > > > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 > > on > > > the KIP. > > > > > > Jun > > > > > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot > wrote: > > > > > >> Hi Colin, > > >> > > >> Thank you for your feedback. > > >> > > >> Jun has summarized the situation pretty well. Thanks Jun! I would like > > to > > >> complement it with the following points: > > >> > > >> 1. Indeed, when the quota is exceeded, the broker will reject the > topic > > >> creations, partition creations and topics deletions that are exceeding > > >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > >> be populated accordingly to let the client know how long it must wait. > > >> > > >> 2. I do agree that we actually want a mechanism to apply back pressure > > >> to the clients. The KIP basically proposes a mechanism to control and > to > > >> limit the rate of operations before entering the controller. I think > > that > > >> it is > > >> similar to your thinking but is enforced based on a defined quota > > instead > > >> of relying on the number of pending items in the controller. > > >> > > >> 3. 
You mentioned an alternative idea in your comments that, if I > > >> understood > > >> correctly, would bound the queue to limit the overload and reject work > > if > > >> the > > >> queue is full. I have been thinking about this as well but I don't > think > > >> that it > > >> works well in our case. > > >> - The first reason is the one mentioned by Jun. We actually want to be > > >> able > > >> to limit specific clients (the misbehaving ones) in a multi-tenant > > >> environment. > > >> - The second reason is that, at least in our current implementation, > the > > >> length of > > >> the queue is not really a good characteristic to estimate the load. > > >> Coming back > > >> to your example of the CreateTopicsRequest. They create path in ZK for > > >> each > > >> newly created topics which trigger a ChangeTopic event in the > > controller. > > >> That > > >> single event could be for a single topic in some cases or for a > thousand > > >> topics > > >> in others. > > >> These two reasons aside, bounding the queue also introduces a knob > which > > >> requires some tuning and thus suffers from all the points you > mentioned > > as > > >> well, isn't it? The value may be true for one version but not for > > another. > > >> > > >> 4. Regarding the integration with KIP-500. The implementation of this > > KIP > > >> will span across the ApiLayer and the AdminManager. To be honest, we > > >> can influence the implementation to work best with what you have in > mind > > >> for the future controller. If you could shed some more light on the > > future > > >> internal architecture, I can take this into account during the > > >> implementation. > > >> > > >> 5. Regarding KIP-590. For the create topics request, create partitions > > >> request, > > >> and delete topics request, we are lucky enough to have directed all of > > >> them > > >> to > > >> the controller already so we are fine with doing the admission in the > > >> broker > > >> which hosts the controller. 
If we were to throttle more operations in > > the > > >> future, > > >> I believe that we can continue to do it centrally while leveraging the > > >> principal > > >> that is forwarded to account for the right tenant. The reason why I > > would > > >> like > > >> to keep it central is that we are talking about sparse (or bursty) > > >> workloads here > > >> so distributing the quota among all the brokers makes little sense. > > >> > > >> 6. Regarding the naming of the new error code. BUSY sounds too generic > > to > > >> me so I would rather prefer a specific error code. The main reason > being > > >> that > > >> we may be able to reuse it in the future for other quotas. That being > > >> said, > > >> we > > >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly > about > > >> the naming. I wonder what the others think about this. > > >> > > >> Voilà. I hope that I have addressed all your questions/points and I am > > >> sorry for > > >> the lengthy email. > > >> > > >> Regards, > > >> David > > >> > > >> > > >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe > wrote: > > >> > > >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Both are already in the KIP, see "New Broker Configurations" chapter. I think that we need them in order to be able to define different burst for the new quota. Best, David On Tue, Jun 9, 2020 at 7:48 PM Jun Rao wrote: > Hi, David, > > Another thing. Should we add controller.quota.window.size.seconds and > controller.quota.window.num > or just reuse the existing quota.window.size.seconds and quota.window.num > that are used for other types of quotas? > > Thanks, > > Jun > > On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 > on > > the KIP. > > > > Jun > > > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > > > >> Hi Colin, > >> > >> Thank you for your feedback. > >> > >> Jun has summarized the situation pretty well. Thanks Jun! I would like > to > >> complement it with the following points: > >> > >> 1. Indeed, when the quota is exceeded, the broker will reject the topic > >> creations, partition creations and topics deletions that are exceeding > >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > >> be populated accordingly to let the client know how long it must wait. > >> > >> 2. I do agree that we actually want a mechanism to apply back pressure > >> to the clients. The KIP basically proposes a mechanism to control and to > >> limit the rate of operations before entering the controller. I think > that > >> it is > >> similar to your thinking but is enforced based on a defined quota > instead > >> of relying on the number of pending items in the controller. > >> > >> 3. You mentioned an alternative idea in your comments that, if I > >> understood > >> correctly, would bound the queue to limit the overload and reject work > if > >> the > >> queue is full. I have been thinking about this as well but I don't think > >> that it > >> works well in our case. > >> - The first reason is the one mentioned by Jun. 
We actually want to be > >> able > >> to limit specific clients (the misbehaving ones) in a multi-tenant > >> environment. > >> - The second reason is that, at least in our current implementation, the > >> length of > >> the queue is not really a good characteristic to estimate the load. > >> Coming back > >> to your example of the CreateTopicsRequest. They create path in ZK for > >> each > >> newly created topics which trigger a ChangeTopic event in the > controller. > >> That > >> single event could be for a single topic in some cases or for a thousand > >> topics > >> in others. > >> These two reasons aside, bounding the queue also introduces a knob which > >> requires some tuning and thus suffers from all the points you mentioned > as > >> well, isn't it? The value may be true for one version but not for > another. > >> > >> 4. Regarding the integration with KIP-500. The implementation of this > KIP > >> will span across the ApiLayer and the AdminManager. To be honest, we > >> can influence the implementation to work best with what you have in mind > >> for the future controller. If you could shed some more light on the > future > >> internal architecture, I can take this into account during the > >> implementation. > >> > >> 5. Regarding KIP-590. For the create topics request, create partitions > >> request, > >> and delete topics request, we are lucky enough to have directed all of > >> them > >> to > >> the controller already so we are fine with doing the admission in the > >> broker > >> which hosts the controller. If we were to throttle more operations in > the > >> future, > >> I believe that we can continue to do it centrally while leveraging the > >> principal > >> that is forwarded to account for the right tenant. The reason why I > would > >> like > >> to keep it central is that we are talking about sparse (or bursty) > >> workloads here > >> so distributing the quota among all the brokers makes little sense. > >> > >> 6. 
Regarding the naming of the new error code. BUSY sounds too generic > to > >> me so I would rather prefer a specific error code. The main reason being > >> that > >> we may be able to reuse it in the future for other quotas. That being > >> said, > >> we > >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > >> the naming. I wonder what the others think about this. > >> > >> Voilà. I hope that I have addressed all your questions/points and I am > >> sorry for > >> the lengthy email. > >> > >> Regards, > >> David > >> > >> > >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > >> > >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > >> > > Hi, Colin, > >> > > > >> > > Thanks for the comment. You brought up several points. > >> > > > >> > > 1. Should we set up a per user quota? To me, it does seem we need > some > >> > sort > >> > > of a quota. When the controller runs out of resources, ideally, we > >> only > >> > > want to penalize the bad behaving applications, instead of every > >> > > application. To do that,
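The point about needing dedicated `controller.quota.window.*` settings can be made concrete: with a sampled-rate quota, the burst a client can consume before the measured rate trips the quota grows with the total window length, so reusing the generic `quota.window.*` values would tie the controller quota's burst tolerance to unrelated quotas. Roughly — this is a back-of-the-envelope approximation, not the broker's exact formula:

```python
def approx_burst(rate, window_size_seconds, window_num):
    # A sampled rate averages over window_num samples of
    # window_size_seconds each, so a client can consume roughly a full
    # window's worth of budget at once before the average exceeds the quota.
    return rate * window_size_seconds * window_num

# The same 2 mutations/sec quota tolerates very different bursts
# depending on how the window is configured:
short_window_burst = approx_burst(2, 1, 11)
long_window_burst = approx_burst(2, 10, 11)
```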
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Another thing. Should we add controller.quota.window.size.seconds and controller.quota.window.num or just reuse the existing quota.window.size.seconds and quota.window.num that are used for other types of quotas? Thanks, Jun On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > Hi, David, > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on > the KIP. > > Jun > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > >> Hi Colin, >> >> Thank you for your feedback. >> >> Jun has summarized the situation pretty well. Thanks Jun! I would like to >> complement it with the following points: >> >> 1. Indeed, when the quota is exceeded, the broker will reject the topic >> creations, partition creations and topics deletions that are exceeding >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will >> be populated accordingly to let the client know how long it must wait. >> >> 2. I do agree that we actually want a mechanism to apply back pressure >> to the clients. The KIP basically proposes a mechanism to control and to >> limit the rate of operations before entering the controller. I think that >> it is >> similar to your thinking but is enforced based on a defined quota instead >> of relying on the number of pending items in the controller. >> >> 3. You mentioned an alternative idea in your comments that, if I >> understood >> correctly, would bound the queue to limit the overload and reject work if >> the >> queue is full. I have been thinking about this as well but I don't think >> that it >> works well in our case. >> - The first reason is the one mentioned by Jun. We actually want to be >> able >> to limit specific clients (the misbehaving ones) in a multi-tenant >> environment. >> - The second reason is that, at least in our current implementation, the >> length of >> the queue is not really a good characteristic to estimate the load. >> Coming back >> to your example of the CreateTopicsRequest. 
They create path in ZK for >> each >> newly created topics which trigger a ChangeTopic event in the controller. >> That >> single event could be for a single topic in some cases or for a thousand >> topics >> in others. >> These two reasons aside, bounding the queue also introduces a knob which >> requires some tuning and thus suffers from all the points you mentioned as >> well, isn't it? The value may be true for one version but not for another. >> >> 4. Regarding the integration with KIP-500. The implementation of this KIP >> will span across the ApiLayer and the AdminManager. To be honest, we >> can influence the implementation to work best with what you have in mind >> for the future controller. If you could shed some more light on the future >> internal architecture, I can take this into account during the >> implementation. >> >> 5. Regarding KIP-590. For the create topics request, create partitions >> request, >> and delete topics request, we are lucky enough to have directed all of >> them >> to >> the controller already so we are fine with doing the admission in the >> broker >> which hosts the controller. If we were to throttle more operations in the >> future, >> I believe that we can continue to do it centrally while leveraging the >> principal >> that is forwarded to account for the right tenant. The reason why I would >> like >> to keep it central is that we are talking about sparse (or bursty) >> workloads here >> so distributing the quota among all the brokers makes little sense. >> >> 6. Regarding the naming of the new error code. BUSY sounds too generic to >> me so I would rather prefer a specific error code. The main reason being >> that >> we may be able to reuse it in the future for other quotas. That being >> said, >> we >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about >> the naming. I wonder what the others think about this. >> >> Voilà. 
I hope that I have addressed all your questions/points and I am >> sorry for >> the lengthy email. >> >> Regards, >> David >> >> >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: >> >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: >> > > Hi, Colin, >> > > >> > > Thanks for the comment. You brought up several points. >> > > >> > > 1. Should we set up a per user quota? To me, it does seem we need some >> > sort >> > > of a quota. When the controller runs out of resources, ideally, we >> only >> > > want to penalize the bad behaving applications, instead of every >> > > application. To do that, we will need to know what each application is >> > > entitled to and the per user quota is intended to capture that. >> > > >> > > 2. How easy is it to configure a quota? The following is how an admin >> > > typically sets up a quota in our existing systems. Pick a generous >> > default >> > > per user quota works for most applications. For the few resource >> > intensive >> > > applications, customize a higher quota for them. Reserve enough >> resources >> > > in anticipation that a single
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on the KIP. Jun On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > Hi Colin, > > Thank you for your feedback. > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > complement it with the following points: > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > creations, partition creations and topics deletions that are exceeding > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > be populated accordingly to let the client know how long it must wait. > > 2. I do agree that we actually want a mechanism to apply back pressure > to the clients. The KIP basically proposes a mechanism to control and to > limit the rate of operations before entering the controller. I think that > it is > similar to your thinking but is enforced based on a defined quota instead > of relying on the number of pending items in the controller. > > 3. You mentioned an alternative idea in your comments that, if I understood > correctly, would bound the queue to limit the overload and reject work if > the > queue is full. I have been thinking about this as well but I don't think > that it > works well in our case. > - The first reason is the one mentioned by Jun. We actually want to be able > to limit specific clients (the misbehaving ones) in a multi-tenant > environment. > - The second reason is that, at least in our current implementation, the > length of > the queue is not really a good characteristic to estimate the load. > Coming back > to your example of the CreateTopicsRequest. They create path in ZK for each > newly created topics which trigger a ChangeTopic event in the controller. > That > single event could be for a single topic in some cases or for a thousand > topics > in others. 
> These two reasons aside, bounding the queue also introduces a knob which > requires some tuning and thus suffers from all the points you mentioned as > well, isn't it? The value may be true for one version but not for another. > > 4. Regarding the integration with KIP-500. The implementation of this KIP > will span across the ApiLayer and the AdminManager. To be honest, we > can influence the implementation to work best with what you have in mind > for the future controller. If you could shed some more light on the future > internal architecture, I can take this into account during the > implementation. > > 5. Regarding KIP-590. For the create topics request, create partitions > request, > and delete topics request, we are lucky enough to have directed all of them > to > the controller already so we are fine with doing the admission in the > broker > which hosts the controller. If we were to throttle more operations in the > future, > I believe that we can continue to do it centrally while leveraging the > principal > that is forwarded to account for the right tenant. The reason why I would > like > to keep it central is that we are talking about sparse (or bursty) > workloads here > so distributing the quota among all the brokers makes little sense. > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > me so I would rather prefer a specific error code. The main reason being > that > we may be able to reuse it in the future for other quotas. That being said, > we > can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > the naming. I wonder what the others think about this. > > Voilà. I hope that I have addressed all your questions/points and I am > sorry for > the lengthy email. > > Regards, > David > > > On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > > > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > > > Hi, Colin, > > > > > > Thanks for the comment. You brought up several points. > > > > > > 1. 
Should we set up a per user quota? To me, it does seem we need some > > sort > > > of a quota. When the controller runs out of resources, ideally, we only > > > want to penalize the badly behaving applications, instead of every > > > application. To do that, we will need to know what each application is > > > entitled to and the per user quota is intended to capture that. > > > > > > 2. How easy is it to configure a quota? The following is how an admin > > > typically sets up a quota in our existing systems. Pick a generous > > default > > > per user quota that works for most applications. For the few resource > > intensive > > > applications, customize a higher quota for them. Reserve enough > resources > > > in anticipation that a single (or a few) application will exceed the > > quota > > > at a given time. > > > > > > > Hi Jun, > > > > Thanks for the response. > > > > Maybe I was too pessimistic about the ability of admins to configure a > > useful quota here. I do agree that it would be nice to have the ability > to > > set different quotas for different users, as you mentioned. > > > > > > > > 3. How should the quota be defined? In the discussion
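The per-user quota scheme Jun describes above (a generous default for most applications, with explicit overrides for the few resource-intensive ones) might look like this in outline. All names and numbers are hypothetical; this is not Kafka's quota-resolution code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-user quota resolution: a generous default
// rate, with higher overrides for resource-intensive applications.
public class QuotaResolver {
    private final double defaultRate;
    private final Map<String, Double> overrides = new HashMap<>();

    public QuotaResolver(double defaultRate) {
        this.defaultRate = defaultRate;
    }

    // Customize a higher quota for a specific (heavy) user.
    public void override(String user, double rate) {
        overrides.put(user, rate);
    }

    // Returns the controller mutation rate this user is entitled to.
    public double rateFor(String user) {
        return overrides.getOrDefault(user, defaultRate);
    }

    public static void main(String[] args) {
        QuotaResolver quotas = new QuotaResolver(10.0);     // generous default
        quotas.override("heavy-app", 50.0);                 // resource-intensive tenant
        System.out.println(quotas.rateFor("ordinary-app")); // 10.0
        System.out.println(quotas.rateFor("heavy-app"));    // 50.0
    }
}
```

The point of the shape is that the operator only tunes the handful of overrides; every other tenant falls through to the default.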
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the explanation. The KIP looks good to me now. Jun On Tue, Jun 9, 2020 at 4:27 AM David Jacot wrote: > Hi Jun, > > 40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated. > This > is required to let the client know how long it must wait. This is explained > in the > "Handling of new/old clients". > > Best, > David > > On Mon, Jun 8, 2020 at 9:29 PM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the updated KIP. Another minor comment below. > > > > 40. For the new `QUOTA_VIOLATED` error in the response to > > CreateTopics/CreatePartitions/DeleteTopics, could you clarify > > whether ThrottleTimeMs is set when the error code is set to > QUOTA_VIOLATED? > > > > Jun > > > > On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > > > > > Hi Jun, > > > > > > 30. The rate is accumulated at the partition level. Let me clarify this > > in > > > the KIP. > > > > > > Best, > > > David > > > > > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > > > > > Hi David, > > > > > > > > The KIP looks good to me. I am going to the voting thread... > > > > > > > > Hi Jun, > > > > > > > > Yes, exactly. That's a separate thing from this KIP, so working on > the > > > fix. > > > > > > > > Thanks, > > > > Anna > > > > > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > > > > > Hi, Anna, > > > > > > > > > > Thanks for the comment. For the problem that you described, perhaps > > we > > > > need > > > > > to make the quota checking and recording more atomic? > > > > > > > > > > Hi, David, > > > > > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > > > comment > > > > > below. > > > > > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is > > the > > > > rate > > > > > accumulated at the topic or partition level? It would be useful to > > make > > > > it > > > > > clear in the wiki. 
> > > > > > > > > > Jun > > > > > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > > > wrote: > > > > > > > > > > > Hi Anna and Jun, > > > > > > > > > > > > You are right. We should allocate up to the quota for each old > > > sample. > > > > > > > > > > > > I have revamped the Throttling Algorithm section to better > explain > > > our > > > > > > thought process and the token bucket inspiration. > > > > > > > > > > > > I have also added a chapter with few guidelines about how to > define > > > > > > the quota. There is no magic formula for this but I give few > > > insights. > > > > > > I don't have specific numbers that can be used out of the box so > I > > > > > > think that it is better to not put any for the time being. We can > > > > always > > > > > > complement later on in the documentation. > > > > > > > > > > > > Please, take a look and let me know what you think. > > > > > > > > > > > > Cheers, > > > > > > David > > > > > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > > > wrote: > > > > > > > > > > > > > Hi David and Jun, > > > > > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > > > confirm > > > > > > that > > > > > > > I do believe that the token bucket behavior is better for the > > > reasons > > > > > we > > > > > > > already discussed but wanted to summarize. The main difference > > > > between > > > > > > Rate > > > > > > > and token bucket is that the Rate implementation allows a burst > > by > > > > > > > borrowing from the future, whereas a token bucket allows a > burst > > by > > > > > using > > > > > > > accumulated tokens from the previous idle period. Using > > accumulated > > > > > > tokens > > > > > > > smoothes out the rate measurement in general. Configuring a > large > > > > burst > > > > > > > requires configuring a large quota window, which causes long > > delays > > > > for > > > > > > > bursty workload, due to borrowing credits from the future. 
> > Perhaps > > > it > > > > > is > > > > > > > useful to add a summary in the beginning of the Throttling > > > Algorithm > > > > > > > section? > > > > > > > > > > > > > > In my previous email, I mentioned the issue we observed with > the > > > > > > bandwidth > > > > > > > quota, where a low quota (1MB/s per broker) was limiting > > bandwidth > > > > > > visibly > > > > > > > below the quota. I thought it was strictly the issue with the > > Rate > > > > > > > implementation as well, but I found a root cause to be > different > > > but > > > > > > > amplified by the Rate implementation (long throttle delays of > > > > requests > > > > > > in a > > > > > > > burst). I will describe it here for completeness using the > > > following > > > > > > > example: > > > > > > > > > > > > > >- > > > > > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > > > >- > > > > > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > > > requests), > > > > > and > > > > > > >each produce request is 5MB. If all requests arrive in a > > burst, > > > > the > > >
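The token-bucket behavior discussed in this thread can be sketched as follows. This is a hypothetical illustration, not Kafka's actual MeasurableStat implementation, and the class and method names are invented. The key property under discussion: an idle period accumulates credits (the "credits" metric), so a burst spends previously saved credits rather than borrowing from future windows the way the sampled Rate does.

```java
// Hypothetical token bucket for a mutation quota; not Kafka source code.
public class TokenBucket {
    private final double ratePerSec; // refill rate, e.g. the mutation quota
    private final double burst;      // bucket capacity (max accumulated credits)
    private double credits;          // remaining credits (the "credits" metric)
    private long lastMs;

    public TokenBucket(double ratePerSec, double burst, long nowMs) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.credits = burst;        // start full: an idle period allows a burst
        this.lastMs = nowMs;
    }

    private void refill(long nowMs) {
        // Credits accrue at the quota rate, capped at the bucket capacity.
        credits = Math.min(burst, credits + (nowMs - lastMs) / 1000.0 * ratePerSec);
        lastMs = nowMs;
    }

    // Records 'amount' mutations; returns throttle time in ms (0 if within quota).
    public long record(double amount, long nowMs) {
        refill(nowMs);
        credits -= amount; // may go negative: the deficit must be paid back over time
        if (credits >= 0) return 0;
        return (long) (-credits / ratePerSec * 1000.0);
    }

    public static void main(String[] args) {
        TokenBucket tb = new TokenBucket(10.0, 50.0, 0); // 10 mutations/s, burst of 50
        System.out.println(tb.record(50, 0)); // burst covered by accumulated credits -> 0
        System.out.println(tb.record(10, 0)); // 10 over -> 1000 ms throttle
    }
}
```

Note the contrast with the Rate: here a large burst after idleness incurs no delay at all, and the penalty for exceeding the quota is proportional to the deficit, not to a window-wide average.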
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Colin, Thank you for your feedback. Jun has summarized the situation pretty well. Thanks Jun! I would like to complement it with the following points: 1. Indeed, when the quota is exceeded, the broker will reject the topic creations, partition creations and topic deletions that exceed the quota with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will be populated accordingly to let the client know how long it must wait. 2. I do agree that we actually want a mechanism to apply back pressure to the clients. The KIP basically proposes a mechanism to control and to limit the rate of operations before entering the controller. I think that it is similar to your thinking but is enforced based on a defined quota instead of relying on the number of pending items in the controller. 3. You mentioned an alternative idea in your comments that, if I understood correctly, would bound the queue to limit the overload and reject work if the queue is full. I have been thinking about this as well but I don't think that it works well in our case. - The first reason is the one mentioned by Jun. We actually want to be able to limit specific clients (the misbehaving ones) in a multi-tenant environment. - The second reason is that, at least in our current implementation, the length of the queue is not really a good characteristic to estimate the load. Coming back to your example of the CreateTopicsRequest. They create a path in ZK for each newly created topic, which triggers a ChangeTopic event in the controller. That single event could be for a single topic in some cases or for a thousand topics in others. These two reasons aside, bounding the queue also introduces a knob which requires some tuning and thus suffers from all the points you mentioned as well, doesn't it? The value may be right for one version but not for another. 4. Regarding the integration with KIP-500. The implementation of this KIP will span across the ApiLayer and the AdminManager. 
To be honest, we can influence the implementation to work best with what you have in mind for the future controller. If you could shed some more light on the future internal architecture, I can take this into account during the implementation. 5. Regarding KIP-590. For the create topics request, create partitions request, and delete topics request, we are lucky enough to have directed all of them to the controller already so we are fine with doing the admission in the broker which hosts the controller. If we were to throttle more operations in the future, I believe that we can continue to do it centrally while leveraging the principal that is forwarded to account for the right tenant. The reason why I would like to keep it central is that we are talking about sparse (or bursty) workloads here so distributing the quota among all the brokers makes little sense. 6. Regarding the naming of the new error code. BUSY sounds too generic to me so I would rather prefer a specific error code. The main reason being that we may be able to reuse it in the future for other quotas. That being said, we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about the naming. I wonder what the others think about this. Voilà. I hope that I have addressed all your questions/points and I am sorry for the lengthy email. Regards, David On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > > Hi, Colin, > > > > Thanks for the comment. You brought up several points. > > > > 1. Should we set up a per user quota? To me, it does seem we need some > sort > > of a quota. When the controller runs out of resources, ideally, we only > > want to penalize the bad behaving applications, instead of every > > application. To do that, we will need to know what each application is > > entitled to and the per user quota is intended to capture that. > > > > 2. How easy is it to configure a quota? 
The following is how an admin > > typically sets up a quota in our existing systems. Pick a generous > default > > per user quota works for most applications. For the few resource > intensive > > applications, customize a higher quota for them. Reserve enough resources > > in anticipation that a single (or a few) application will exceed the > quota > > at a given time. > > > > Hi Jun, > > Thanks for the response. > > Maybe I was too pessimistic about the ability of admins to configure a > useful quota here. I do agree that it would be nice to have the ability to > set different quotas for different users, as you mentioned. > > > > > 3. How should the quota be defined? In the discussion thread, we debated > > between a usage based model vs a rate based model. Dave and Anna argued > for > > the rate based model mostly because it's simpler to implement. > > > > I'm trying to think more about how this integrates with our plans for > KIP-500. When we get rid of ZK, we will have to handle this in the > controller itself, rather than in the AdminManager. That
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, 40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated. This is required to let the client know how long it must wait. This is explained in the "Handling of new/old clients". Best, David On Mon, Jun 8, 2020 at 9:29 PM Jun Rao wrote: > Hi, David, > > Thanks for the updated KIP. Another minor comment below. > > 40. For the new `QUOTA_VIOLATED` error in the response to > CreateTopics/CreatePartitions/DeleteTopics, could you clarify > whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED? > > Jun > > On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > > > Hi Jun, > > > > 30. The rate is accumulated at the partition level. Let me clarify this > in > > the KIP. > > > > Best, > > David > > > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > > > Hi David, > > > > > > The KIP looks good to me. I am going to the voting thread... > > > > > > Hi Jun, > > > > > > Yes, exactly. That's a separate thing from this KIP, so working on the > > fix. > > > > > > Thanks, > > > Anna > > > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > > > Hi, Anna, > > > > > > > > Thanks for the comment. For the problem that you described, perhaps > we > > > need > > > > to make the quota checking and recording more atomic? > > > > > > > > Hi, David, > > > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > > comment > > > > below. > > > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is > the > > > rate > > > > accumulated at the topic or partition level? It would be useful to > make > > > it > > > > clear in the wiki. > > > > > > > > Jun > > > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > > wrote: > > > > > > > > > Hi Anna and Jun, > > > > > > > > > > You are right. We should allocate up to the quota for each old > > sample. 
> > > > > > > > > > I have revamped the Throttling Algorithm section to better explain > > our > > > > > thought process and the token bucket inspiration. > > > > > > > > > > I have also added a chapter with few guidelines about how to define > > > > > the quota. There is no magic formula for this but I give few > > insights. > > > > > I don't have specific numbers that can be used out of the box so I > > > > > think that it is better to not put any for the time being. We can > > > always > > > > > complement later on in the documentation. > > > > > > > > > > Please, take a look and let me know what you think. > > > > > > > > > > Cheers, > > > > > David > > > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > > wrote: > > > > > > > > > > > Hi David and Jun, > > > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > > confirm > > > > > that > > > > > > I do believe that the token bucket behavior is better for the > > reasons > > > > we > > > > > > already discussed but wanted to summarize. The main difference > > > between > > > > > Rate > > > > > > and token bucket is that the Rate implementation allows a burst > by > > > > > > borrowing from the future, whereas a token bucket allows a burst > by > > > > using > > > > > > accumulated tokens from the previous idle period. Using > accumulated > > > > > tokens > > > > > > smoothes out the rate measurement in general. Configuring a large > > > burst > > > > > > requires configuring a large quota window, which causes long > delays > > > for > > > > > > bursty workload, due to borrowing credits from the future. > Perhaps > > it > > > > is > > > > > > useful to add a summary in the beginning of the Throttling > > Algorithm > > > > > > section? > > > > > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > > > bandwidth > > > > > > quota, where a low quota (1MB/s per broker) was limiting > bandwidth > > > > > visibly > > > > > > below the quota. 
I thought it was strictly the issue with the > Rate > > > > > > implementation as well, but I found a root cause to be different > > but > > > > > > amplified by the Rate implementation (long throttle delays of > > > requests > > > > > in a > > > > > > burst). I will describe it here for completeness using the > > following > > > > > > example: > > > > > > > > > > > >- > > > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > > >- > > > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > > requests), > > > > and > > > > > >each produce request is 5MB. If all requests arrive in a > burst, > > > the > > > > > > last 4 > > > > > >requests (20MB over 10MB allowed in a window) may get the same > > > > > throttle > > > > > >time if they are processed concurrently. We record the rate > > under > > > > the > > > > > > lock, > > > > > >but then calculate throttle time separately after that. So, > for > > > each > > > > > >request, the observed rate could be 3MB/s, and each request > gets > > > > > > throttle > > > > > >delay = 20 seconds
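The arithmetic in Anna's example above can be reconstructed with the rate-quota delay formula implied by her numbers, delay = (observedRate - quota) / quota * window. This sketch is for illustration only (names hypothetical, not Kafka's exact throttling code):

```java
// Illustrates the throttle-delay arithmetic from the 6-connection example:
// quota 1MB/s, 10s of rate window, six 5MB produce requests in a burst.
public class ThrottleDelayExample {
    // Delay formula consistent with the example: (rate - quota) / quota * window.
    public static long delayMs(double observedMBps, double quotaMBps, long windowMs) {
        if (observedMBps <= quotaMBps) return 0;
        return (long) ((observedMBps - quotaMBps) / quotaMBps * windowMs);
    }

    public static void main(String[] args) {
        double quota = 1.0;     // 1 MB/s
        long windowMs = 10_000; // e.g. 10 samples of 1s

        // If all six requests are recorded before any delay is computed, each
        // observes 30MB / 10s = 3MB/s and gets the same 20s delay:
        System.out.println(delayMs(3.0, quota, windowMs)); // 20000

        // Recorded and evaluated one at a time, the last four requests would
        // instead see increasing rates and get 5s, 10s, 15s, 20s:
        for (int k = 3; k <= 6; k++) {
            System.out.println(delayMs(5.0 * k / 10.0, quota, windowMs));
        }
    }
}
```

This reproduces the numbers in the email: the concurrent path hands every request the worst-case 20s delay, which is longer than the whole rate window and so pushes observed bandwidth below the quota.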
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > Hi, Colin, > > Thanks for the comment. You brought up several points. > > 1. Should we set up a per user quota? To me, it does seem we need some sort > of a quota. When the controller runs out of resources, ideally, we only > want to penalize the badly behaving applications, instead of every > application. To do that, we will need to know what each application is > entitled to and the per user quota is intended to capture that. > > 2. How easy is it to configure a quota? The following is how an admin > typically sets up a quota in our existing systems. Pick a generous default > per user quota that works for most applications. For the few resource intensive > applications, customize a higher quota for them. Reserve enough resources > in anticipation that a single (or a few) application will exceed the quota > at a given time. > Hi Jun, Thanks for the response. Maybe I was too pessimistic about the ability of admins to configure a useful quota here. I do agree that it would be nice to have the ability to set different quotas for different users, as you mentioned. > > 3. How should the quota be defined? In the discussion thread, we debated > between a usage based model vs a rate based model. Dave and Anna argued for > the rate based model mostly because it's simpler to implement. > I'm trying to think more about how this integrates with our plans for KIP-500. When we get rid of ZK, we will have to handle this in the controller itself, rather than in the AdminManager. That implies we'll have to rewrite the code. Maybe this is worth it if we want this feature now, though. Another wrinkle here is that as we discussed in KIP-590, controller operations will land on a random broker first, and only then be forwarded to the active controller. 
This implies that either admission control should happen on all brokers (needing some kind of distributed quota scheme), or be done on the controller after we've already done the work of forwarding the message. The second approach might not be that bad, but it would be nice to figure this out. > > 4. If a quota is exceeded, how is that enforced? My understanding of the > KIP is that, if a quota is exceeded, the broker immediately sends back > a QUOTA_VIOLATED error and a throttle time to the client, and the > client will wait for the throttle time before issuing the next request. > This seems to be the same as the BUSY error code you mentioned. > Yes, I agree, it sounds like we're thinking along the same lines. However, rather than QUOTA_VIOLATED, how about naming the error code BUSY? Then the error text could indicate the quota that we violated. This would be more generally useful as an error code and also avoid being confusingly similar to POLICY_VIOLATION. best, Colin > > I will let David chime in more on that. > > Thanks, > > Jun > > > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe wrote: > > > Hi David, > > > > Thanks for the KIP. > > > > I thought about this for a while and I actually think this approach is not > > quite right. The problem that I see here is that using an explicitly set > > quota here requires careful tuning by the cluster operator. Even worse, > > this tuning might be invalidated by changes in overall conditions or even > > more efficient controller software. 
> > > > So the system administrator gets a very obscure tunable (it's not clear to > > a non-Kafka-developer what "controller mutations" are or why they should > > care). And even worse, they will have to significantly "sandbag" the value > > that they set it to, so that even under the heaviest load and oldest > > deployed version of the software, the controller can still function. Even > > worse, this new quota adds a lot of complexity to the controller. > > > > What we really want is backpressure when the controller is overloaded. I > > believe this is the alternative you discuss in "Rejected Alternatives" > > under "Throttle the Execution instead of the Admission". Your reason for > > rejecting it is that the client error handling does not work well in this > > case. But actually, this is an artifact of our current implementation, > > rather than a fundamental issue with backpressure. > > > > Consider the example of a CreateTopicsRequest. The controller could > > return a special error code if the load was too high, and take the create > > topics event off the controller queue. Let's call that error code BUSY. > > Additionally, the controller could
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Colin, Thanks for the comment. You brought up several points. 1. Should we set up a per user quota? To me, it does seem we need some sort of a quota. When the controller runs out of resources, ideally, we only want to penalize the badly behaving applications, instead of every application. To do that, we will need to know what each application is entitled to and the per user quota is intended to capture that. 2. How easy is it to configure a quota? The following is how an admin typically sets up a quota in our existing systems. Pick a generous default per user quota that works for most applications. For the few resource intensive applications, customize a higher quota for them. Reserve enough resources in anticipation that a single (or a few) application will exceed the quota at a given time. 3. How should the quota be defined? In the discussion thread, we debated between a usage based model vs a rate based model. Dave and Anna argued for the rate based model mostly because it's simpler to implement. 4. If a quota is exceeded, how is that enforced? My understanding of the KIP is that, if a quota is exceeded, the broker immediately sends back a QUOTA_VIOLATED error and a throttle time to the client, and the client will wait for the throttle time before issuing the next request. This seems to be the same as the BUSY error code you mentioned. I will let David chime in more on that. Thanks, Jun On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe wrote: > Hi David, > > Thanks for the KIP. > > I thought about this for a while and I actually think this approach is not > quite right. The problem that I see here is that using an explicitly set > quota here requires careful tuning by the cluster operator. Even worse, > this tuning might be invalidated by changes in overall conditions or even > more efficient controller software. 
> > For example, if we empirically find that the controller can do 1000 topics > in a minute (or whatever), this tuning might actually be wrong if the next > version of the software can do 2000 topics in a minute because of > efficiency upgrades. Or, the broker that the controller is located on > might be experiencing heavy load from its non-controller operations, and so > it can only do 500 topics in a minute during this period. > > So the system administrator gets a very obscure tunable (it's not clear to > a non-Kafka-developer what "controller mutations" are or why they should > care). And even worse, they will have to significantly "sandbag" the value > that they set it to, so that even under the heaviest load and oldest > deployed version of the software, the controller can still function. Even > worse, this new quota adds a lot of complexity to the controller. > > What we really want is backpressure when the controller is overloaded. I > believe this is the alternative you discuss in "Rejected Alternatives" > under "Throttle the Execution instead of the Admission". Your reason for > rejecting it is that the client error handling does not work well in this > case. But actually, this is an artifact of our current implementation, > rather than a fundamental issue with backpressure. > > Consider the example of a CreateTopicsRequest. The controller could > return a special error code if the load was too high, and take the create > topics event off the controller queue. Let's call that error code BUSY. > Additionally, the controller could immediately refuse new events if the > queue had reached its maximum length, and simply return BUSY for that case > as well. > > Basically, the way we handle RPC timeouts in the controller right now is > not very good. As you know, we time out the RPC, so the client gets > TimeoutException, but then keep the event on the queue, so that it > eventually gets executed! There's no reason why we have to do that. 
We > could take the event off the queue if there is a timeout. This would > reduce load and mostly avoid the paradoxical situations you describe > (getting TopicExistsException for a CreateTopicsRequest retry, etc.) > > I say "mostly" because there are still cases where retries could go astray > (for example if we execute the topic creation but a network problem > prevents the response from being sent to the client). But this would still > be a very big improvement over what we have now. > > Sorry for commenting so late on this but I got distracted by some other > things. I hope we can figure this one out-- I feel like there is a chance > to significantly simplify this. > > best, > Colin > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote: > > Hi folks, > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > throttle create topic, create partition, and delete topics operations to > > protect the Kafka controller: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > Please, let me know what you think. > > > >
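The bounded-queue alternative Colin describes, i.e. shedding load at admission with a BUSY error instead of keeping timed-out events queued, could be sketched like this. It is a hypothetical illustration, not actual controller code; the class, the error strings, and the capacity are all invented.

```java
import java.util.ArrayDeque;

// Sketch of the rejected alternative discussed in the thread: a bounded
// controller event queue that returns BUSY when full, rather than enqueueing
// work that may only execute after the client's RPC has already timed out.
public class BoundedControllerQueue {
    private final ArrayDeque<Runnable> events = new ArrayDeque<>();
    private final int maxLength;

    public BoundedControllerQueue(int maxLength) {
        this.maxLength = maxLength;
    }

    // Returns "NONE" if the event was accepted, "BUSY" if the queue is full.
    public synchronized String offer(Runnable event) {
        if (events.size() >= maxLength) return "BUSY";
        events.add(event);
        return "NONE";
    }

    public static void main(String[] args) {
        BoundedControllerQueue q = new BoundedControllerQueue(2);
        System.out.println(q.offer(() -> {})); // NONE
        System.out.println(q.offer(() -> {})); // NONE
        System.out.println(q.offer(() -> {})); // BUSY: queue at capacity
    }
}
```

David's objection above still applies to this shape: the queue length is a poor load signal when one queued event can represent one topic or a thousand, and the capacity is itself a knob that needs tuning.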
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the updated KIP. Another minor comment below. 40. For the new `QUOTA_VIOLATED` error in the response to CreateTopics/CreatePartitions/DeleteTopics, could you clarify whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED? Jun On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > Hi Jun, > > 30. The rate is accumulated at the partition level. Let me clarify this in > the KIP. > > Best, > David > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > Hi David, > > > > The KIP looks good to me. I am going to the voting thread... > > > > Hi Jun, > > > > Yes, exactly. That's a separate thing from this KIP, so working on the > fix. > > > > Thanks, > > Anna > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > Hi, Anna, > > > > > > Thanks for the comment. For the problem that you described, perhaps we > > need > > > to make the quota checking and recording more atomic? > > > > > > Hi, David, > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > comment > > > below. > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is the > > rate > > > accumulated at the topic or partition level? It would be useful to make > > it > > > clear in the wiki. > > > > > > Jun > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > wrote: > > > > > > > Hi Anna and Jun, > > > > > > > > You are right. We should allocate up to the quota for each old > sample. > > > > > > > > I have revamped the Throttling Algorithm section to better explain > our > > > > thought process and the token bucket inspiration. > > > > > > > > I have also added a chapter with few guidelines about how to define > > > > the quota. There is no magic formula for this but I give few > insights. > > > > I don't have specific numbers that can be used out of the box so I > > > > think that it is better to not put any for the time being. We can > > always > > > > complement later on in the documentation. 
> > > > > > > > Please, take a look and let me know what you think. > > > > > > > > Cheers, > > > > David > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > wrote: > > > > > > > > > Hi David and Jun, > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > confirm > > > > that > > > > > I do believe that the token bucket behavior is better for the > reasons > > > we > > > > > already discussed but wanted to summarize. The main difference > > between > > > > Rate > > > > > and token bucket is that the Rate implementation allows a burst by > > > > > borrowing from the future, whereas a token bucket allows a burst by > > > using > > > > > accumulated tokens from the previous idle period. Using accumulated > > > > tokens > > > > > smoothes out the rate measurement in general. Configuring a large > > burst > > > > > requires configuring a large quota window, which causes long delays > > for > > > > > bursty workload, due to borrowing credits from the future. Perhaps > it > > > is > > > > > useful to add a summary in the beginning of the Throttling > Algorithm > > > > > section? > > > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > > bandwidth > > > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > > > visibly > > > > > below the quota. I thought it was strictly the issue with the Rate > > > > > implementation as well, but I found a root cause to be different > but > > > > > amplified by the Rate implementation (long throttle delays of > > requests > > > > in a > > > > > burst). I will describe it here for completeness using the > following > > > > > example: > > > > > > > > > >- > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > >- > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > requests), > > > and > > > > >each produce request is 5MB. 
If all requests arrive in a burst, > > the > > > > > last 4 > > > > >requests (20MB over 10MB allowed in a window) may get the same > > > > throttle > > > > >time if they are processed concurrently. We record the rate > under > > > the > > > > > lock, > > > > >but then calculate throttle time separately after that. So, for > > each > > > > >request, the observed rate could be 3MB/s, and each request gets > > > > > throttle > > > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > > > delay > > > > is > > > > >longer than the total rate window, which results in lower > > bandwidth > > > > than > > > > >the quota. Since all requests got the same delay, they will also > > > > arrive > > > > > in > > > > >a burst, which may also result in longer delay than necessary. > It > > > > looks > > > > >pretty easy to fix, so I will open a separate JIRA for it. This > > can > > > be > > > > >additionally mitigated by token bucket behavior. > > > > > > > > > > > > > > > For the algorithm "So instead of
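The client-side contract clarified in this exchange, namely that a QUOTA_VIOLATED error always comes with a populated ThrottleTimeMs, can be sketched as follows. The response shape and names are illustrative, not the real protocol or AdminClient classes.

```java
// Hypothetical client-side handling: on QUOTA_VIOLATED the broker also sets
// ThrottleTimeMs, telling the client how long to wait before retrying.
public class QuotaViolatedHandling {
    // Returns the mandatory wait (ms) before the next attempt.
    public static long backoffMs(String errorCode, int throttleTimeMs) {
        return "QUOTA_VIOLATED".equals(errorCode) ? throttleTimeMs : 0;
    }

    public static void main(String[] args) {
        System.out.println(backoffMs("QUOTA_VIOLATED", 1500)); // wait 1500 ms, then retry
        System.out.println(backoffMs("NONE", 0));              // success, no wait
    }
}
```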
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, 30. The rate is accumulated at the partition level. Let me clarify this in the KIP. Best, David On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > Hi David, > > The KIP looks good to me. I am going to the voting thread... > > Hi Jun, > > Yes, exactly. That's a separate thing from this KIP, so working on the fix. > > Thanks, > Anna > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > Hi, Anna, > > > > Thanks for the comment. For the problem that you described, perhaps we > need > > to make the quota checking and recording more atomic? > > > > Hi, David, > > > > Thanks for the updated KIP. Looks good to me now. Just one minor comment > > below. > > > > 30. controller_mutations_rate: For topic creation and deletion, is the > rate > > accumulated at the topic or partition level? It would be useful to make > it > > clear in the wiki. > > > > Jun > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > > > > > Hi Anna and Jun, > > > > > > You are right. We should allocate up to the quota for each old sample. > > > > > > I have revamped the Throttling Algorithm section to better explain our > > > thought process and the token bucket inspiration. > > > > > > I have also added a chapter with few guidelines about how to define > > > the quota. There is no magic formula for this but I give few insights. > > > I don't have specific numbers that can be used out of the box so I > > > think that it is better to not put any for the time being. We can > always > > > complement later on in the documentation. > > > > > > Please, take a look and let me know what you think. > > > > > > Cheers, > > > David > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > > > > > Hi David and Jun, > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > confirm > > > that > > > > I do believe that the token bucket behavior is better for the reasons > > we > > > > already discussed but wanted to summarize. 
The main difference > between > > > Rate > > > > and token bucket is that the Rate implementation allows a burst by > > > > borrowing from the future, whereas a token bucket allows a burst by > > using > > > > accumulated tokens from the previous idle period. Using accumulated > > > tokens > > > > smoothes out the rate measurement in general. Configuring a large > burst > > > > requires configuring a large quota window, which causes long delays > for > > > > bursty workload, due to borrowing credits from the future. Perhaps it > > is > > > > useful to add a summary in the beginning of the Throttling Algorithm > > > > section? > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > bandwidth > > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > > visibly > > > > below the quota. I thought it was strictly the issue with the Rate > > > > implementation as well, but I found a root cause to be different but > > > > amplified by the Rate implementation (long throttle delays of > requests > > > in a > > > > burst). I will describe it here for completeness using the following > > > > example: > > > > > > > >- > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > >- > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding requests), > > and > > > >each produce request is 5MB. If all requests arrive in a burst, > the > > > > last 4 > > > >requests (20MB over 10MB allowed in a window) may get the same > > > throttle > > > >time if they are processed concurrently. We record the rate under > > the > > > > lock, > > > >but then calculate throttle time separately after that. So, for > each > > > >request, the observed rate could be 3MB/s, and each request gets > > > > throttle > > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > > delay > > > is > > > >longer than the total rate window, which results in lower > bandwidth > > > than > > > >the quota. 
Since all requests got the same delay, they will also > > > arrive > > > > in > > > >a burst, which may also result in longer delay than necessary. It > > > looks > > > >pretty easy to fix, so I will open a separate JIRA for it. This > can > > be > > > >additionally mitigated by token bucket behavior. > > > > > > > > > > > > For the algorithm "So instead of having one sample equal to 560 in > the > > > last > > > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > > > would > > > > allocate 5 per each old sample that is still in the overall window. > It > > > > would be a bit larger granularity than the pure token bucket (we > lose 5 > > > > units / mutation once we move past the sample window), but it is > better > > > > than the long delay. > > > > > > > > Thanks, > > > > > > > > Anna > > > > > > > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > > > > > Hi, David, Anna, > > > > > > > > > > Thanks for the discussion and the updated
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the KIP. I thought about this for a while and I actually think this approach is not quite right. The problem that I see here is that using an explicitly set quota requires careful tuning by the cluster operator. Even worse, this tuning might be invalidated by changes in overall conditions or even more efficient controller software. For example, if we empirically find that the controller can do 1000 topics in a minute (or whatever), this tuning might actually be wrong if the next version of the software can do 2000 topics in a minute because of efficiency upgrades. Or, the broker that the controller is located on might be experiencing heavy load from its non-controller operations, and so it can only do 500 topics in a minute during this period. So the system administrator gets a very obscure tunable (it's not clear to a non-Kafka-developer what "controller mutations" are or why they should care). And even worse, they will have to significantly "sandbag" the value that they set it to, so that even under the heaviest load and oldest deployed version of the software, the controller can still function. Even worse, this new quota adds a lot of complexity to the controller. What we really want is backpressure when the controller is overloaded. I believe this is the alternative you discuss in "Rejected Alternatives" under "Throttle the Execution instead of the Admission". Your reason for rejecting it is that the client error handling does not work well in this case. But actually, this is an artifact of our current implementation, rather than a fundamental issue with backpressure. Consider the example of a CreateTopicsRequest. The controller could return a special error code if the load was too high, and take the create topics event off the controller queue. Let's call that error code BUSY. Additionally, the controller could immediately refuse new events if the queue had reached its maximum length, and simply return BUSY for that case as well. 
Basically, the way we handle RPC timeouts in the controller right now is not very good. As you know, we time out the RPC, so the client gets TimeoutException, but then we keep the event on the queue, so that it eventually gets executed! There's no reason why we have to do that. We could take the event off the queue if there is a timeout. This would reduce load and mostly avoid the paradoxical situations you describe (getting TopicExistsException for a CreateTopicsRequest retry, etc.). I say "mostly" because there are still cases where retries could go astray (for example if we execute the topic creation but a network problem prevents the response from being sent to the client). But this would still be a very big improvement over what we have now. Sorry for commenting so late on this, but I got distracted by some other things. I hope we can figure this one out-- I feel like there is a chance to significantly simplify this. best, Colin On Fri, May 29, 2020, at 07:57, David Jacot wrote: > Hi folks, > > I'd like to start the vote for KIP-599 which proposes a new quota to > throttle create topic, create partition, and delete topics operations to > protect the Kafka controller: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > Please, let me know what you think. > > Cheers, > David >
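[Editorial note] Colin's two suggestions -- reject new events with BUSY when the queue is full, and drop events whose RPC has already timed out instead of executing them anyway -- can be sketched roughly as below. The BUSY code, the queue class, and the method names are hypothetical illustrations, not Kafka's actual controller API:

```python
from collections import deque

BUSY = "BUSY"  # hypothetical error code returned to the client


class ControllerQueue:
    """Toy controller event queue with the backpressure behavior described above."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.events = deque()

    def submit(self, event, deadline):
        # Refuse new events immediately when the queue is full, instead of
        # accepting work that will only time out later.
        if len(self.events) >= self.max_len:
            return BUSY
        self.events.append((event, deadline))
        return "OK"

    def next_event(self, now):
        # Skip events whose RPC deadline has passed: the client already got
        # a timeout, so executing them would cause the paradoxical retries
        # (e.g. TopicExistsException on a CreateTopicsRequest retry).
        while self.events:
            event, deadline = self.events.popleft()
            if now <= deadline:
                return event
        return None
```

Under this sketch, an overloaded controller fails fast with BUSY (a retriable error), and a timed-out create-topics event is simply dropped rather than executed behind the client's back.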
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (not binding) Thanks for the KIP! -Anna On Thu, Jun 4, 2020 at 8:26 AM Mickael Maison wrote: > +1 (binding) > Thanks David for looking into this important issue > > On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote: > > > > +1 (non binding). > > > > Thanks! > > > > On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram > > wrote: > > > > > +1 (binding) > > > > > > Thanks for the KIP, David! > > > > > > Regards, > > > > > > Rajini > > > > > > > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira > wrote: > > > > > > > +1 (binding) > > > > > > > > Looks great. Thank you for the in-depth design and discussion. > > > > > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot > wrote: > > > > > > > > > Hi folks, > > > > > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota > to > > > > > throttle create topic, create partition, and delete topics > operations > > > to > > > > > protect the Kafka controller: > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > > > > > Please, let me know what you think. > > > > > > > > > > Cheers, > > > > > David > > > > > > > > > > > > > > > > > -- > > > > Gwen Shapira > > > > Engineering Manager | Confluent > > > > 650.450.2760 | @gwenshap > > > > Follow us: Twitter | blog > > > > > > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, The KIP looks good to me. I am going to the voting thread... Hi Jun, Yes, exactly. That's a separate thing from this KIP, so working on the fix. Thanks, Anna On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > Hi, Anna, > > Thanks for the comment. For the problem that you described, perhaps we need > to make the quota checking and recording more atomic? > > Hi, David, > > Thanks for the updated KIP. Looks good to me now. Just one minor comment > below. > > 30. controller_mutations_rate: For topic creation and deletion, is the rate > accumulated at the topic or partition level? It would be useful to make it > clear in the wiki. > > Jun > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > > > Hi Anna and Jun, > > > > You are right. We should allocate up to the quota for each old sample. > > > > I have revamped the Throttling Algorithm section to better explain our > > thought process and the token bucket inspiration. > > > > I have also added a chapter with few guidelines about how to define > > the quota. There is no magic formula for this but I give few insights. > > I don't have specific numbers that can be used out of the box so I > > think that it is better to not put any for the time being. We can always > > complement later on in the documentation. > > > > Please, take a look and let me know what you think. > > > > Cheers, > > David > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > > > Hi David and Jun, > > > > > > I dug a bit deeper into the Rate implementation, and wanted to confirm > > that > > > I do believe that the token bucket behavior is better for the reasons > we > > > already discussed but wanted to summarize. The main difference between > > Rate > > > and token bucket is that the Rate implementation allows a burst by > > > borrowing from the future, whereas a token bucket allows a burst by > using > > > accumulated tokens from the previous idle period. 
Using accumulated > > tokens > > > smoothes out the rate measurement in general. Configuring a large burst > > > requires configuring a large quota window, which causes long delays for > > > bursty workload, due to borrowing credits from the future. Perhaps it > is > > > useful to add a summary in the beginning of the Throttling Algorithm > > > section? > > > > > > In my previous email, I mentioned the issue we observed with the > > bandwidth > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > visibly > > > below the quota. I thought it was strictly the issue with the Rate > > > implementation as well, but I found a root cause to be different but > > > amplified by the Rate implementation (long throttle delays of requests > > in a > > > burst). I will describe it here for completeness using the following > > > example: > > > > > >- > > > > > >Quota = 1MB/s, default window size and number of samples > > >- > > > > > >Suppose there are 6 connections (maximum 6 outstanding requests), > and > > >each produce request is 5MB. If all requests arrive in a burst, the > > > last 4 > > >requests (20MB over 10MB allowed in a window) may get the same > > throttle > > >time if they are processed concurrently. We record the rate under > the > > > lock, > > >but then calculate throttle time separately after that. So, for each > > >request, the observed rate could be 3MB/s, and each request gets > > > throttle > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > delay > > is > > >longer than the total rate window, which results in lower bandwidth > > than > > >the quota. Since all requests got the same delay, they will also > > arrive > > > in > > >a burst, which may also result in longer delay than necessary. It > > looks > > >pretty easy to fix, so I will open a separate JIRA for it. This can > be > > >additionally mitigated by token bucket behavior. 
> > > > > > > > > For the algorithm "So instead of having one sample equal to 560 in the > > last > > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > > would > > > allocate 5 per each old sample that is still in the overall window. It > > > would be a bit larger granularity than the pure token bucket (we lose 5 > > > units / mutation once we move past the sample window), but it is better > > > than the long delay. > > > > > > Thanks, > > > > > > Anna > > > > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > > > Hi, David, Anna, > > > > > > > > Thanks for the discussion and the updated wiki. > > > > > > > > 11. If we believe the token bucket behavior is better in terms of > > > handling > > > > the burst behavior, we probably don't need a separate KIP since it's > > just > > > > an implementation detail. > > > > > > > > Regarding "So instead of having one sample equal to 560 in the last > > > window, > > > > we will have 100 samples equal to 5.6.", I was thinking that we will > > > > allocate 5 to each of the first 99 samples and 65
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Anna, Thanks for the comment. For the problem that you described, perhaps we need to make the quota checking and recording more atomic? Hi, David, Thanks for the updated KIP. Looks good to me now. Just one minor comment below. 30. controller_mutations_rate: For topic creation and deletion, is the rate accumulated at the topic or partition level? It would be useful to make it clear in the wiki. Jun On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > Hi Anna and Jun, > > You are right. We should allocate up to the quota for each old sample. > > I have revamped the Throttling Algorithm section to better explain our > thought process and the token bucket inspiration. > > I have also added a chapter with few guidelines about how to define > the quota. There is no magic formula for this but I give few insights. > I don't have specific numbers that can be used out of the box so I > think that it is better to not put any for the time being. We can always > complement later on in the documentation. > > Please, take a look and let me know what you think. > > Cheers, > David > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > Hi David and Jun, > > > > I dug a bit deeper into the Rate implementation, and wanted to confirm > that > > I do believe that the token bucket behavior is better for the reasons we > > already discussed but wanted to summarize. The main difference between > Rate > > and token bucket is that the Rate implementation allows a burst by > > borrowing from the future, whereas a token bucket allows a burst by using > > accumulated tokens from the previous idle period. Using accumulated > tokens > > smoothes out the rate measurement in general. Configuring a large burst > > requires configuring a large quota window, which causes long delays for > > bursty workload, due to borrowing credits from the future. Perhaps it is > > useful to add a summary in the beginning of the Throttling Algorithm > > section? 
> > > > In my previous email, I mentioned the issue we observed with the > bandwidth > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > visibly > > below the quota. I thought it was strictly the issue with the Rate > > implementation as well, but I found a root cause to be different but > > amplified by the Rate implementation (long throttle delays of requests > in a > > burst). I will describe it here for completeness using the following > > example: > > > >- > > > >Quota = 1MB/s, default window size and number of samples > >- > > > >Suppose there are 6 connections (maximum 6 outstanding requests), and > >each produce request is 5MB. If all requests arrive in a burst, the > > last 4 > >requests (20MB over 10MB allowed in a window) may get the same > throttle > >time if they are processed concurrently. We record the rate under the > > lock, > >but then calculate throttle time separately after that. So, for each > >request, the observed rate could be 3MB/s, and each request gets > > throttle > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay > is > >longer than the total rate window, which results in lower bandwidth > than > >the quota. Since all requests got the same delay, they will also > arrive > > in > >a burst, which may also result in longer delay than necessary. It > looks > >pretty easy to fix, so I will open a separate JIRA for it. This can be > >additionally mitigated by token bucket behavior. > > > > > > For the algorithm "So instead of having one sample equal to 560 in the > last > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > would > > allocate 5 per each old sample that is still in the overall window. It > > would be a bit larger granularity than the pure token bucket (we lose 5 > > units / mutation once we move past the sample window), but it is better > > than the long delay. 
> > > > Thanks, > > > > Anna > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > Hi, David, Anna, > > > > > > Thanks for the discussion and the updated wiki. > > > > > > 11. If we believe the token bucket behavior is better in terms of > > handling > > > the burst behavior, we probably don't need a separate KIP since it's > just > > > an implementation detail. > > > > > > Regarding "So instead of having one sample equal to 560 in the last > > window, > > > we will have 100 samples equal to 5.6.", I was thinking that we will > > > allocate 5 to each of the first 99 samples and 65 to the last sample. > > Then, > > > 6 new samples have to come before the balance becomes 0 again. > > Intuitively, > > > we are accumulating credits in each sample. If a usage comes in, we > first > > > use all existing credits to offset that. If we can't, the remaining > usage > > > will be recorded in the last sample, which will be offset by future > > > credits. That seems to match the token bucket behavior the closest. > > > > > > 20. Could you provide some guidelines
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Anna and Jun, You are right. We should allocate up to the quota for each old sample. I have revamped the Throttling Algorithm section to better explain our thought process and the token bucket inspiration. I have also added a chapter with a few guidelines about how to define the quota. There is no magic formula for this, but I give a few insights. I don't have specific numbers that can be used out of the box, so I think that it is better not to put any for the time being. We can always complement later on in the documentation. Please, take a look and let me know what you think. Cheers, David On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > Hi David and Jun, > > I dug a bit deeper into the Rate implementation, and wanted to confirm that > I do believe that the token bucket behavior is better for the reasons we > already discussed but wanted to summarize. The main difference between Rate > and token bucket is that the Rate implementation allows a burst by > borrowing from the future, whereas a token bucket allows a burst by using > accumulated tokens from the previous idle period. Using accumulated tokens > smoothes out the rate measurement in general. Configuring a large burst > requires configuring a large quota window, which causes long delays for > bursty workload, due to borrowing credits from the future. Perhaps it is > useful to add a summary in the beginning of the Throttling Algorithm > section? > > In my previous email, I mentioned the issue we observed with the bandwidth > quota, where a low quota (1MB/s per broker) was limiting bandwidth visibly > below the quota. I thought it was strictly the issue with the Rate > implementation as well, but I found a root cause to be different but > amplified by the Rate implementation (long throttle delays of requests in a > burst). 
I will describe it here for completeness using the following > example: > >- > >Quota = 1MB/s, default window size and number of samples >- > >Suppose there are 6 connections (maximum 6 outstanding requests), and >each produce request is 5MB. If all requests arrive in a burst, the > last 4 >requests (20MB over 10MB allowed in a window) may get the same throttle >time if they are processed concurrently. We record the rate under the > lock, >but then calculate throttle time separately after that. So, for each >request, the observed rate could be 3MB/s, and each request gets > throttle >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay is >longer than the total rate window, which results in lower bandwidth than >the quota. Since all requests got the same delay, they will also arrive > in >a burst, which may also result in longer delay than necessary. It looks >pretty easy to fix, so I will open a separate JIRA for it. This can be >additionally mitigated by token bucket behavior. > > > For the algorithm "So instead of having one sample equal to 560 in the last > window, we will have 100 samples equal to 5.6.", I agree with Jun. I would > allocate 5 per each old sample that is still in the overall window. It > would be a bit larger granularity than the pure token bucket (we lose 5 > units / mutation once we move past the sample window), but it is better > than the long delay. > > Thanks, > > Anna > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the discussion and the updated wiki. > > > > 11. If we believe the token bucket behavior is better in terms of > handling > > the burst behavior, we probably don't need a separate KIP since it's just > > an implementation detail. > > > > Regarding "So instead of having one sample equal to 560 in the last > window, > > we will have 100 samples equal to 5.6.", I was thinking that we will > > allocate 5 to each of the first 99 samples and 65 to the last sample. 
> Then, > > 6 new samples have to come before the balance becomes 0 again. > Intuitively, > > we are accumulating credits in each sample. If a usage comes in, we first > > use all existing credits to offset that. If we can't, the remaining usage > > will be recorded in the last sample, which will be offset by future > > credits. That seems to match the token bucket behavior the closest. > > > > 20. Could you provide some guidelines on the typical rate that an admin > > should set? > > > > Jun > > > > On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > > > > > Hi all, > > > > > > I just published an updated version of the KIP which includes: > > > * Using a slightly modified version of our Rate. I have tried to > > formalize > > > it based on our discussion. As Anna suggested, we may find a better way > > to > > > implement it. > > > * Handling of ValidateOnly as pointed out by Tom. > > > > > > Please, check it out and let me know what you think. > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley > wrote: > > > > > > > Hi David, > > > > > > > > As a user I might expect the
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David and Jun, I dug a bit deeper into the Rate implementation, and want to confirm that I believe the token bucket behavior is better, for the reasons we already discussed, but wanted to summarize them. The main difference between Rate and token bucket is that the Rate implementation allows a burst by borrowing from the future, whereas a token bucket allows a burst by using accumulated tokens from the previous idle period. Using accumulated tokens smoothes out the rate measurement in general. Configuring a large burst requires configuring a large quota window, which causes long delays for bursty workloads, due to borrowing credits from the future. Perhaps it is useful to add a summary in the beginning of the Throttling Algorithm section?

In my previous email, I mentioned the issue we observed with the bandwidth quota, where a low quota (1MB/s per broker) was limiting bandwidth visibly below the quota. I thought it was strictly an issue with the Rate implementation as well, but I found the root cause to be different, though amplified by the Rate implementation (long throttle delays of requests in a burst). I will describe it here for completeness using the following example:

- Quota = 1MB/s, default window size and number of samples
- Suppose there are 6 connections (maximum 6 outstanding requests), and each produce request is 5MB. If all requests arrive in a burst, the last 4 requests (20MB over the 10MB allowed in a window) may get the same throttle time if they are processed concurrently. We record the rate under the lock, but then calculate the throttle time separately after that. So, for each request, the observed rate could be 3MB/s, and each request gets throttle delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay is longer than the total rate window, which results in lower bandwidth than the quota. Since all requests got the same delay, they will also arrive in a burst, which may also result in a longer delay than necessary. 
It looks pretty easy to fix, so I will open a separate JIRA for it. This can be additionally mitigated by token bucket behavior. For the algorithm "So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6.", I agree with Jun. I would allocate 5 to each old sample that is still in the overall window. It would be a bit coarser granularity than the pure token bucket (we lose 5 units / mutation once we move past the sample window), but it is better than the long delay. Thanks, Anna On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > Hi, David, Anna, > > Thanks for the discussion and the updated wiki. > > 11. If we believe the token bucket behavior is better in terms of handling > the burst behavior, we probably don't need a separate KIP since it's just > an implementation detail. > > Regarding "So instead of having one sample equal to 560 in the last window, > we will have 100 samples equal to 5.6.", I was thinking that we will > allocate 5 to each of the first 99 samples and 65 to the last sample. Then, > 6 new samples have to come before the balance becomes 0 again. Intuitively, > we are accumulating credits in each sample. If a usage comes in, we first > use all existing credits to offset that. If we can't, the remaining usage > will be recorded in the last sample, which will be offset by future > credits. That seems to match the token bucket behavior the closest. > > 20. Could you provide some guidelines on the typical rate that an admin > should set? > > Jun > > On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > > > Hi all, > > > > I just published an updated version of the KIP which includes: > > * Using a slightly modified version of our Rate. I have tried to > formalize > > it based on our discussion. As Anna suggested, we may find a better way > to > > implement it. > > * Handling of ValidateOnly as pointed out by Tom. > > > > Please, check it out and let me know what you think. 
> > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > > > > > Hi David, > > > > > > As a user I might expect the validateOnly option to do everything > except > > > actually make the changes. That interpretation would imply the quota > > should > > > be checked, but the check should obviously be side-effect free. I think > > > this interpretation could be useful because it gives the caller either > > some > > > confidence that they're not going to hit the quota, or tell them, via > the > > > exception, when they can expect the call to work. But for this to be > > useful > > > it would require the retry logic to not retry the request when > > validateOnly > > > was set. > > > > > > On the other hand, if validateOnly is really about validating only some > > > aspects of the request (which maybe is what the name implies), then we > > > should clarify in the Javadoc that the quota is not included in the > > > validation. > > > > > > On balance, I agree with what
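[Editorial note] The token-bucket behavior Anna describes can be illustrated with a minimal sketch (hypothetical, not Kafka's Rate implementation) applied to her example: quota = 1MB/s with a 10MB burst allowance, and six 5MB requests arriving together after an idle period. Because each request is charged individually against the bucket, the debt grows stepwise, and the last four requests get delays of 5, 10, 15, and 20 seconds rather than all getting 20:

```python
class TokenBucket:
    """Toy token bucket: tokens accumulate at `rate` up to `burst`; a request
    may drive the balance negative, and the throttle delay is the time needed
    to pay back that debt at the refill rate."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (MB) added per second
        self.burst = burst    # maximum accumulated tokens (MB)
        self.tokens = burst   # start full, i.e. after an idle period
        self.last = 0.0       # timestamp of the last update

    def throttle_seconds(self, now, amount):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= amount  # charge the request; balance may go negative
        return max(0.0, -self.tokens / self.rate)


# Anna's example: 1MB/s quota, 10MB window, six concurrent 5MB requests at t=0.
bucket = TokenBucket(rate=1.0, burst=10.0)
delays = [bucket.throttle_seconds(now=0.0, amount=5.0) for _ in range(6)]
print(delays)  # [0.0, 0.0, 5.0, 10.0, 15.0, 20.0]
```

The total delay never exceeds the time needed to pay back the burst, which avoids the "delay longer than the total rate window" problem described above.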
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Anna, Thanks for the discussion and the updated wiki. 11. If we believe the token bucket behavior is better in terms of handling the burst behavior, we probably don't need a separate KIP since it's just an implementation detail. Regarding "So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6.", I was thinking that we will allocate 5 to each of the first 99 samples and 65 to the last sample. Then, 6 new samples have to come before the balance becomes 0 again. Intuitively, we are accumulating credits in each sample. If a usage comes in, we first use all existing credits to offset that. If we can't, the remaining usage will be recorded in the last sample, which will be offset by future credits. That seems to match the token bucket behavior the closest. 20. Could you provide some guidelines on the typical rate that an admin should set? Jun On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > Hi all, > > I just published an updated version of the KIP which includes: > * Using a slightly modified version of our Rate. I have tried to formalize > it based on our discussion. As Anna suggested, we may find a better way to > implement it. > * Handling of ValidateOnly as pointed out by Tom. > > Please, check it out and let me know what you think. > > Best, > David > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > > > Hi David, > > > > As a user I might expect the validateOnly option to do everything except > > actually make the changes. That interpretation would imply the quota > should > > be checked, but the check should obviously be side-effect free. I think > > this interpretation could be useful because it gives the caller either > some > > confidence that they're not going to hit the quota, or tell them, via the > > exception, when they can expect the call to work. But for this to be > useful > > it would require the retry logic to not retry the request when > validateOnly > > was set. 
> > > > On the other hand, if validateOnly is really about validating only some > > aspects of the request (which maybe is what the name implies), then we > > should clarify in the Javadoc that the quota is not included in the > > validation. > > > > On balance, I agree with what you're proposing, since the extra utility > of > > including the quota in the validation seems to be not worth the extra > > complication for the retry. > > > > Thanks, > > > > Tom > > > > > > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > > > > > Hi Tom, > > > > > > That's a good question. As the validation does not create any load on > the > > > controller, I was planning to do it without checking the quota at all. > > Does > > > that > > > sound reasonable? > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot > wrote: > > > > > > > Hi Jun and Anna, > > > > > > > > Thank you both for your replies. > > > > > > > > Based on our recent discussion, I agree that using a rate is better > to > > > > remain > > > > consistent with other quotas. As you both suggested, it seems that > > > changing > > > > the way we compute the rate to better handle spiky workloads and > > behave a > > > > bit more similarly to the token bucket algorithm makes sense for all > > > > quotas as > > > > well. > > > > > > > > I will update the KIP to reflect this. > > > > > > > > Anna, I think that we can explain this in this KIP. We can't just say > > > that > > > > the Rate > > > > will be updated in this KIP. I think that we need to give a bit more > > > info. > > > > > > > > Best, > > > > David > > > > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner > wrote: > > > > > > > >> Hi Jun and David, > > > >> > > > >> Regarding token bucket vs, Rate behavior. We recently observed a > > couple > > > of > > > >> cases where a bursty workload behavior would result in long-ish > pauses > > > in > > > >> between, resulting in lower overall bandwidth than the quota. 
I will > > > need > > > >> to debug this a bit more to be 100% sure, but it does look like the > > case > > > >> described by David earlier in this thread. So, I agree with Jun -- I > > > think > > > >> we should make all quota rate behavior consistent, and probably > > similar > > > to > > > >> the token bucket one. > > > >> > > > >> Looking at KIP-13, it doesn't describe rate calculation in enough > > > detail, > > > >> but does mention window size. So, we could keep "window size" and > > > "number > > > >> of samples" configs and change Rate implementation to be more > similar > > to > > > >> token bucket: > > > >> * number of samples define our burst size > > > >> * Change the behavior, which could be described as: If a burst > happens > > > >> after an idle period, the burst would effectively spread evenly over > > the > > > >> (now - window) time period, where window is ( - > 1)* > > > >> . Which basically describes a token bucket, while > keeping > > > the > > > >> current quota configs. I think we can
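[Editorial note] Jun's allocation scheme above (a burst of 560 mutations recorded against a 5-per-sample quota over 100 samples: 5 to each of the first 99 samples, and the remaining 65 in the last) can be sketched as below. The function name and shape are illustrative only, not the eventual Kafka implementation:

```python
def spread_usage(amount, num_samples, quota_per_sample):
    """Fill each earlier sample up to the per-sample quota; whatever remains
    is recorded in the last sample, to be offset by future credits."""
    samples = [0.0] * num_samples
    for i in range(num_samples - 1):
        take = min(quota_per_sample, amount)
        samples[i] = take
        amount -= take
    samples[-1] = amount  # remaining unpaid usage
    return samples


samples = spread_usage(560, num_samples=100, quota_per_sample=5)
# First 99 samples each hold 5 (99 * 5 = 495); the last holds 560 - 495 = 65.
print(samples[0], samples[98], samples[99])
```

As Jun notes, 6 new samples (6 * 5 = 30 > 65 remaining minus accrued credits) then have to pass before the balance returns to 0, which is the closest match to token-bucket behavior under the existing window/sample configs.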
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Thanks David for looking into this important issue On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote: > > +1 (non binding). > > Thanks! > > On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram > wrote: > > > +1 (binding) > > > > Thanks for the KIP, David! > > > > Regards, > > > > Rajini > > > > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > > > > > +1 (binding) > > > > > > Looks great. Thank you for the in-depth design and discussion. > > > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > > > > > Hi folks, > > > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > > > throttle create topic, create partition, and delete topics operations > > to > > > > protect the Kafka controller: > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > > > Please, let me know what you think. > > > > > > > > Cheers, > > > > David > > > > > > > > > > > > > -- > > > Gwen Shapira > > > Engineering Manager | Confluent > > > 650.450.2760 | @gwenshap > > > Follow us: Twitter | blog > > > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all, I just published an updated version of the KIP which includes: * Using a slightly modified version of our Rate. I have tried to formalize it based on our discussion. As Anna suggested, we may find a better way to implement it. * Handling of ValidateOnly as pointed out by Tom. Please, check it out and let me know what you think. Best, David On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > Hi David, > > As a user I might expect the validateOnly option to do everything except > actually make the changes. That interpretation would imply the quota should > be checked, but the check should obviously be side-effect free. I think > this interpretation could be useful because it gives the caller either some > confidence that they're not going to hit the quota, or tell them, via the > exception, when they can expect the call to work. But for this to be useful > it would require the retry logic to not retry the request when validateOnly > was set. > > On the other hand, if validateOnly is really about validating only some > aspects of the request (which maybe is what the name implies), then we > should clarify in the Javadoc that the quota is not included in the > validation. > > On balance, I agree with what you're proposing, since the extra utility of > including the quota in the validation seems to be not worth the extra > complication for the retry. > > Thanks, > > Tom > > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > > > Hi Tom, > > > > That's a good question. As the validation does not create any load on the > > controller, I was planning to do it without checking the quota at all. > Does > > that > > sound reasonable? > > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > > > > > Hi Jun and Anna, > > > > > > Thank you both for your replies. > > > > > > Based on our recent discussion, I agree that using a rate is better to > > > remain > > > consistent with other quotas. 
As you both suggested, it seems that > > changing > > > the way we compute the rate to better handle spiky workloads and > behave a > > > bit more similarly to the token bucket algorithm makes sense for all > > > quotas as > > > well. > > > > > > I will update the KIP to reflect this. > > > > > > Anna, I think that we can explain this in this KIP. We can't just say > > that > > > the Rate > > > will be updated in this KIP. I think that we need to give a bit more > > info. > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > > > > > >> Hi Jun and David, > > >> > > >> Regarding token bucket vs. Rate behavior. We recently observed a > couple > > of > > >> cases where a bursty workload behavior would result in long-ish pauses > > in > > >> between, resulting in lower overall bandwidth than the quota. I will > > need > > >> to debug this a bit more to be 100% sure, but it does look like the > case > > >> described by David earlier in this thread. So, I agree with Jun -- I > > think > > >> we should make all quota rate behavior consistent, and probably > similar > > to > > >> the token bucket one. > > >> > > >> Looking at KIP-13, it doesn't describe rate calculation in enough > > detail, > > >> but does mention window size. So, we could keep "window size" and > > "number > > >> of samples" configs and change Rate implementation to be more similar > to > > >> token bucket: > > >> * number of samples define our burst size > > >> * Change the behavior, which could be described as: If a burst happens > > >> after an idle period, the burst would effectively spread evenly over > the > > >> (now - window) time period, where window is (<number of samples> - 1)* > > >> <window size>. Which basically describes a token bucket, while keeping > > the > > >> current quota configs. I think we can even implement this by changing > > the > > >> way we record the last sample or lastWindowMs.
> > >> > > >> Jun, if we would be changing Rate calculation behavior in bandwidth > and > > >> request quotas, would we need a separate KIP? Shouldn't need to if we > > >> keep window size and number of samples configs, right? > > >> > > >> Thanks, > > >> Anna > > >> > > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > > >> > > >> > Hi, David, > > >> > > > >> > Thanks for the reply. > > >> > > > >> > 11. To match the behavior in the Token bucket approach, I was > thinking > > >> that > > >> > requests that don't fit in the previous time windows will be > > >> accumulated in > > >> > the current time window. So, the 60 extra requests will be > accumulated > > >> in > > >> > the latest window. Then, the client also has to wait for 12 more > secs > > >> > before throttling is removed. I agree that this is probably a better > > >> > behavior and it's reasonable to change the existing behavior to this > > >> one. > > >> > > > >> > To me, it seems that sample_size * num_windows is the same as max > > burst > > >> > balance. The latter seems a bit better to configure. The thing is > that > > >> the > > >> > existing
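The token bucket behavior Jun describes above (extra requests accumulate, and the client waits until the balance recovers) can be sketched like this (a toy Python illustration under the thread's own numbers, not the Kafka implementation):

```python
class TokenBucket:
    """Toy token bucket: credits accrue at `rate` per second up to `burst`;
    a negative balance translates into a throttle time."""

    def __init__(self, rate, burst, now_s=0.0):
        self.rate = rate        # sustained quota R (credits per second)
        self.burst = burst      # maximum burst B (bucket capacity)
        self.credits = burst    # the bucket starts full
        self.last_s = now_s

    def record(self, cost, now_s):
        # Refill credits for the time elapsed, capped at the burst size.
        self.credits = min(self.burst,
                           self.credits + (now_s - self.last_s) * self.rate)
        self.last_s = now_s
        # Deduct the request's cost; a negative balance means throttling.
        self.credits -= cost
        return max(0.0, -self.credits / self.rate)  # seconds to wait
```

Using the numbers from the quoted example (a quota of 5/sec, a burst capacity of 500, and 7 * 80 = 560 mutations arriving at once), the 60 excess mutations leave the balance at -60, so the client is throttled for 60 / 5 = 12 seconds, matching the "12 more secs" above.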
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (non binding). Thanks! On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote: > +1 (binding) > > Thanks for the KIP, David! > > Regards, > > Rajini > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > > > +1 (binding) > > > > Looks great. Thank you for the in-depth design and discussion. > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > > > Hi folks, > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > > throttle create topic, create partition, and delete topics operations > to > > > protect the Kafka controller: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > Please, let me know what you think. > > > > > > Cheers, > > > David > > > > > > > > > -- > > Gwen Shapira > > Engineering Manager | Confluent > > 650.450.2760 | @gwenshap > > Follow us: Twitter | blog > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, As a user I might expect the validateOnly option to do everything except actually make the changes. That interpretation would imply the quota should be checked, but the check should obviously be side-effect free. I think this interpretation could be useful because it gives the caller either some confidence that they're not going to hit the quota, or tell them, via the exception, when they can expect the call to work. But for this to be useful it would require the retry logic to not retry the request when validateOnly was set. On the other hand, if validateOnly is really about validating only some aspects of the request (which maybe is what the name implies), then we should clarify in the Javadoc that the quota is not included in the validation. On balance, I agree with what you're proposing, since the extra utility of including the quota in the validation seems to be not worth the extra complication for the retry. Thanks, Tom On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > Hi Tom, > > That's a good question. As the validation does not create any load on the > controller, I was planning to do it without checking the quota at all. Does > that > sound reasonable? > > Best, > David > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > > > Hi Jun and Anna, > > > > Thank you both for your replies. > > > > Based on our recent discussion, I agree that using a rate is better to > > remain > > consistent with other quotas. As you both suggested, it seems that > changing > > the way we compute the rate to better handle spiky workloads and behave a > > bit more similarly to the token bucket algorithm makes sense for all > > quotas as > > well. > > > > I will update the KIP to reflect this. > > > > Anna, I think that we can explain this in this KIP. We can't just say > that > > the Rate > > will be updated in this KIP. I think that we need to give a bit more > info. 
> > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > > > >> Hi Jun and David, > >> > >> Regarding token bucket vs. Rate behavior. We recently observed a couple > of > >> cases where a bursty workload behavior would result in long-ish pauses > in > >> between, resulting in lower overall bandwidth than the quota. I will > need > >> to debug this a bit more to be 100% sure, but it does look like the case > >> described by David earlier in this thread. So, I agree with Jun -- I > think > >> we should make all quota rate behavior consistent, and probably similar > to > >> the token bucket one. > >> > >> Looking at KIP-13, it doesn't describe rate calculation in enough > detail, > >> but does mention window size. So, we could keep "window size" and > "number > >> of samples" configs and change Rate implementation to be more similar to > >> token bucket: > >> * number of samples define our burst size > >> * Change the behavior, which could be described as: If a burst happens > >> after an idle period, the burst would effectively spread evenly over the > >> (now - window) time period, where window is (<number of samples> - 1)* > >> <window size>. Which basically describes a token bucket, while keeping > the > >> current quota configs. I think we can even implement this by changing > the > >> way we record the last sample or lastWindowMs. > >> > >> Jun, if we would be changing Rate calculation behavior in bandwidth and > >> request quotas, would we need a separate KIP? Shouldn't need to if we > >> keep window size and number of samples configs, right? > >> > >> Thanks, > >> Anna > >> > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > >> > >> > Hi, David, > >> > > >> > Thanks for the reply. > >> > > >> > 11. To match the behavior in the Token bucket approach, I was thinking > >> that > >> > requests that don't fit in the previous time windows will be > >> accumulated in > >> > the current time window.
So, the 60 extra requests will be accumulated > >> in > >> > the latest window. Then, the client also has to wait for 12 more secs > >> > before throttling is removed. I agree that this is probably a better > >> > behavior and it's reasonable to change the existing behavior to this > >> one. > >> > > >> > To me, it seems that sample_size * num_windows is the same as max > burst > >> > balance. The latter seems a bit better to configure. The thing is that > >> the > >> > existing quota system has already been used in quite a few places and > >> if we > >> > want to change the configuration in the future, there is the migration > >> > cost. Given that, do you feel it's better to adopt the new token > bucket > >> > terminology or just adopt the behavior somehow into our existing > >> system? If > >> > it's the former, it would be useful to document this in the rejected > >> > section and add a future plan on migrating existing quota > >> configurations. > >> > > >> > Jun > >> > > >> > > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot > wrote: > >> > > >> > > Hi Jun, > >> > > > >> > > Thanks for your reply. > >> > > > >> > > 10. I think
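One way to read Jun's equivalence above ("sample_size * num_windows is the same as max burst balance") is the following sketch, with hypothetical names; it only restates the observation, not any actual Kafka config:

```python
def implicit_burst(quota_rate, num_samples, sample_len_s):
    # The existing quota configs (a rate plus a number of samples of a given
    # length) implicitly define a burst: at most this many units can be
    # absorbed across the whole window before throttling kicks in, which is
    # the same role the token bucket's maximum burst balance B plays.
    return quota_rate * num_samples * sample_len_s
```

With a quota of 5/sec and 100 one-second samples, the implicit burst is 500, the same capacity as a token bucket with B = 500.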
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, That's a good question. As the validation does not create any load on the controller, I was planning to do it without checking the quota at all. Does that sound reasonable? Best, David On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > Hi Jun and Anna, > > Thank you both for your replies. > > Based on our recent discussion, I agree that using a rate is better to > remain > consistent with other quotas. As you both suggested, it seems that changing > the way we compute the rate to better handle spiky workloads and behave a > bit more similarly to the token bucket algorithm makes sense for all > quotas as > well. > > I will update the KIP to reflect this. > > Anna, I think that we can explain this in this KIP. We can't just say that > the Rate > will be updated in this KIP. I think that we need to give a bit more info. > > Best, > David > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > >> Hi Jun and David, >> >> Regarding token bucket vs. Rate behavior. We recently observed a couple of >> cases where a bursty workload behavior would result in long-ish pauses in >> between, resulting in lower overall bandwidth than the quota. I will need >> to debug this a bit more to be 100% sure, but it does look like the case >> described by David earlier in this thread. So, I agree with Jun -- I think >> we should make all quota rate behavior consistent, and probably similar to >> the token bucket one. >> >> Looking at KIP-13, it doesn't describe rate calculation in enough detail, >> but does mention window size. So, we could keep "window size" and "number >> of samples" configs and change Rate implementation to be more similar to >> token bucket: >> * number of samples define our burst size >> * Change the behavior, which could be described as: If a burst happens >> after an idle period, the burst would effectively spread evenly over the >> (now - window) time period, where window is (<number of samples> - 1)* >> <window size>.
Which basically describes a token bucket, while keeping the >> current quota configs. I think we can even implement this by changing the >> way we record the last sample or lastWindowMs. >> >> Jun, if we would be changing Rate calculation behavior in bandwidth and >> request quotas, would we need a separate KIP? Shouldn't need to if we >> keep window size and number of samples configs, right? >> >> Thanks, >> Anna >> >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: >> >> > Hi, David, >> > >> > Thanks for the reply. >> > >> > 11. To match the behavior in the Token bucket approach, I was thinking >> that >> > requests that don't fit in the previous time windows will be >> accumulated in >> > the current time window. So, the 60 extra requests will be accumulated >> in >> > the latest window. Then, the client also has to wait for 12 more secs >> > before throttling is removed. I agree that this is probably a better >> > behavior and it's reasonable to change the existing behavior to this >> one. >> > >> > To me, it seems that sample_size * num_windows is the same as max burst >> > balance. The latter seems a bit better to configure. The thing is that >> the >> > existing quota system has already been used in quite a few places and >> if we >> > want to change the configuration in the future, there is the migration >> > cost. Given that, do you feel it's better to adopt the new token bucket >> > terminology or just adopt the behavior somehow into our existing >> system? If >> > it's the former, it would be useful to document this in the rejected >> > section and add a future plan on migrating existing quota >> configurations. >> > >> > Jun >> > >> > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: >> > >> > > Hi Jun, >> > > >> > > Thanks for your reply. >> > > >> > > 10. I think that both options are likely equivalent from an accuracy >> > point >> > > of >> > > view. 
If we put the implementation aside, conceptually, I am not >> > convinced >> > > by the usage-based approach because resources don't have a clear owner >> > > in AK at the moment. A topic can be created by (Principal A, no client >> > id), >> > > then can be extended by (no principal, Client B), and finally deleted >> by >> > > (Principal C, Client C). This does not sound right to me and I fear >> that >> > it >> > > is not going to be easy to grasp for our users. >> > > >> > > Regarding the naming, I do agree that we can make it more future >> proof. >> > > I propose `controller_mutations_rate`. I think that differentiating >> the >> > > mutations >> > > from the reads is still a good thing for the future. >> > > >> > > 11. I am not convinced by your proposal for the following reasons: >> > > >> > > First, in my toy example, I used 101 windows and 7 * 80 requests. We >> > could >> > > effectively allocate 5 * 100 requests to the previous windows assuming >> > that >> > > they are empty. What shall we do with the remaining 60 requests? >> Shall we >> > > allocate them to the current window or shall we divide it among all >> the >> >
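David's toy example above can be made concrete with a small sketch (Python, hypothetical helper name): filling the previous empty windows up to the per-window quota absorbs 5 * 100 = 500 requests, leaving 60 with no obvious home, which is exactly the open question he raises.

```python
def spill_over(requests, quota_per_window, prev_empty_windows):
    # Allocate as much of the burst as the previous (empty) windows can
    # absorb at the per-window quota; return the remainder that still has
    # to go somewhere (the current window? spread over all windows?).
    absorbed = min(requests, quota_per_window * prev_empty_windows)
    return requests - absorbed
```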
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun and Anna, Thank you both for your replies. Based on our recent discussion, I agree that using a rate is better to remain consistent with other quotas. As you both suggested, it seems that changing the way we compute the rate to better handle spiky workloads and behave a bit more similarly to the token bucket algorithm makes sense for all quotas as well. I will update the KIP to reflect this. Anna, I think that we can explain this in this KIP. We can't just say that the Rate will be updated in this KIP. I think that we need to give a bit more info. Best, David On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > Hi Jun and David, > > Regarding token bucket vs. Rate behavior. We recently observed a couple of > cases where a bursty workload behavior would result in long-ish pauses in > between, resulting in lower overall bandwidth than the quota. I will need > to debug this a bit more to be 100% sure, but it does look like the case > described by David earlier in this thread. So, I agree with Jun -- I think > we should make all quota rate behavior consistent, and probably similar to > the token bucket one. > > Looking at KIP-13, it doesn't describe rate calculation in enough detail, > but does mention window size. So, we could keep "window size" and "number > of samples" configs and change Rate implementation to be more similar to > token bucket: > * number of samples define our burst size > * Change the behavior, which could be described as: If a burst happens > after an idle period, the burst would effectively spread evenly over the > (now - window) time period, where window is (<number of samples> - 1)* > <window size>. Which basically describes a token bucket, while keeping the > current quota configs. I think we can even implement this by changing the > way we record the last sample or lastWindowMs. > > Jun, if we would be changing Rate calculation behavior in bandwidth and > request quotas, would we need a separate KIP?
Shouldn't need to if we > keep window size and number of samples configs, right? > > Thanks, > Anna > > On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the reply. > > > > 11. To match the behavior in the Token bucket approach, I was thinking > that > > requests that don't fit in the previous time windows will be accumulated > in > > the current time window. So, the 60 extra requests will be accumulated in > > the latest window. Then, the client also has to wait for 12 more secs > > before throttling is removed. I agree that this is probably a better > > behavior and it's reasonable to change the existing behavior to this one. > > > > To me, it seems that sample_size * num_windows is the same as max burst > > balance. The latter seems a bit better to configure. The thing is that > the > > existing quota system has already been used in quite a few places and if > we > > want to change the configuration in the future, there is the migration > > cost. Given that, do you feel it's better to adopt the new token bucket > > terminology or just adopt the behavior somehow into our existing system? > If > > it's the former, it would be useful to document this in the rejected > > section and add a future plan on migrating existing quota configurations. > > > > Jun > > > > > > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > > > > > Hi Jun, > > > > > > Thanks for your reply. > > > > > > 10. I think that both options are likely equivalent from an accuracy > > point > > > of > > > view. If we put the implementation aside, conceptually, I am not > > convinced > > > by the used based approach because resources don't have a clear owner > > > in AK at the moment. A topic can be created by (Principal A, no client > > id), > > > then can be extended by (no principal, Client B), and finally deleted > by > > > (Principal C, Client C). This does not sound right to me and I fear > that > > it > > > is not going to be easy to grasp for our users. 
> > > > > > Regarding the naming, I do agree that we can make it more future proof. > > > I propose `controller_mutations_rate`. I think that differentiating the > > > mutations > > > from the reads is still a good thing for the future. > > > > > > 11. I am not convinced by your proposal for the following reasons: > > > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > > could > > > effectively allocate 5 * 100 requests to the previous windows assuming > > that > > > they are empty. What shall we do with the remaining 60 requests? Shall > we > > > allocate them to the current window or shall we divide it among all the > > > windows? > > > > > > Second, I don't think that we can safely change the behavior of all the > > > existing > > > rates used because it actually changes the computation of the rate as > > > values > > > allocated to past windows would expire before they would today. > > > > > > Overall, while trying to fit in the current rate, we are going to > build a > > > slightly > > > different version of the
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun and David, Regarding token bucket vs. Rate behavior. We recently observed a couple of cases where a bursty workload behavior would result in long-ish pauses in between, resulting in lower overall bandwidth than the quota. I will need to debug this a bit more to be 100% sure, but it does look like the case described by David earlier in this thread. So, I agree with Jun -- I think we should make all quota rate behavior consistent, and probably similar to the token bucket one. Looking at KIP-13, it doesn't describe rate calculation in enough detail, but does mention window size. So, we could keep "window size" and "number of samples" configs and change Rate implementation to be more similar to token bucket: * number of samples define our burst size * Change the behavior, which could be described as: If a burst happens after an idle period, the burst would effectively spread evenly over the (now - window) time period, where window is (<number of samples> - 1)* <window size>. Which basically describes a token bucket, while keeping the current quota configs. I think we can even implement this by changing the way we record the last sample or lastWindowMs. Jun, if we would be changing Rate calculation behavior in bandwidth and request quotas, would we need a separate KIP? Shouldn't need to if we keep window size and number of samples configs, right? Thanks, Anna On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > Hi, David, > > Thanks for the reply. > > 11. To match the behavior in the Token bucket approach, I was thinking that > requests that don't fit in the previous time windows will be accumulated in > the current time window. So, the 60 extra requests will be accumulated in > the latest window. Then, the client also has to wait for 12 more secs > before throttling is removed. I agree that this is probably a better > behavior and it's reasonable to change the existing behavior to this one. > > To me, it seems that sample_size * num_windows is the same as max burst > balance.
The latter seems a bit better to configure. The thing is that the > existing quota system has already been used in quite a few places and if we > want to change the configuration in the future, there is the migration > cost. Given that, do you feel it's better to adopt the new token bucket > terminology or just adopt the behavior somehow into our existing system? If > it's the former, it would be useful to document this in the rejected > section and add a future plan on migrating existing quota configurations. > > Jun > > > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > > > Hi Jun, > > > > Thanks for your reply. > > > > 10. I think that both options are likely equivalent from an accuracy > point > > of > > view. If we put the implementation aside, conceptually, I am not > convinced > > by the used based approach because resources don't have a clear owner > > in AK at the moment. A topic can be created by (Principal A, no client > id), > > then can be extended by (no principal, Client B), and finally deleted by > > (Principal C, Client C). This does not sound right to me and I fear that > it > > is not going to be easy to grasp for our users. > > > > Regarding the naming, I do agree that we can make it more future proof. > > I propose `controller_mutations_rate`. I think that differentiating the > > mutations > > from the reads is still a good thing for the future. > > > > 11. I am not convinced by your proposal for the following reasons: > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > could > > effectively allocate 5 * 100 requests to the previous windows assuming > that > > they are empty. What shall we do with the remaining 60 requests? Shall we > > allocate them to the current window or shall we divide it among all the > > windows? 
> > > > Second, I don't think that we can safely change the behavior of all the > > existing > > rates used because it actually changes the computation of the rate as > > values > > allocated to past windows would expire before they would today. > > > > Overall, while trying to fit in the current rate, we are going to build a > > slightly > > different version of the rate which will be even more confusing for > users. > > > > Instead, I think that we should embrace the notion of burst as it could > > also > > be applied to other quotas in the future. Users don't have to know that > we > > use the Token Bucket or a special rate inside at the end of the day. It > is > > an > > implementation detail. > > > > Users would be able to define: > > - a rate R; and > > - a maximum burst B. > > > > If we change the metrics to be as follows: > > - the actual rate; > > - the burst balance in %, 0 meaning that the user is throttled; > > It remains detached from the algorithm. > > > > I personally prefer this over having to define a rate and a number of > > windows > > while having to understand that the number of windows implicitly defines > > the > > allowed burst. I think that it is clearer and easier to
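The burst-balance metric David proposes above could look like the following sketch (Python; the exact metric name and scale in the KIP may differ from this illustration):

```python
def burst_balance_pct(credits, burst):
    # Remaining burst balance as a percentage of the maximum burst B;
    # 0 means the principal/client-id is currently throttled.
    return max(0.0, credits) / burst * 100.0
```

Exposing the balance this way keeps the metric detached from the algorithm: users see a rate and a remaining-burst percentage, whether or not a token bucket sits underneath.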
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the reply. 11. To match the behavior in the Token bucket approach, I was thinking that requests that don't fit in the previous time windows will be accumulated in the current time window. So, the 60 extra requests will be accumulated in the latest window. Then, the client also has to wait for 12 more secs before throttling is removed. I agree that this is probably a better behavior and it's reasonable to change the existing behavior to this one. To me, it seems that sample_size * num_windows is the same as max burst balance. The latter seems a bit better to configure. The thing is that the existing quota system has already been used in quite a few places and if we want to change the configuration in the future, there is the migration cost. Given that, do you feel it's better to adopt the new token bucket terminology or just adopt the behavior somehow into our existing system? If it's the former, it would be useful to document this in the rejected section and add a future plan on migrating existing quota configurations. Jun On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > Hi Jun, > > Thanks for your reply. > > 10. I think that both options are likely equivalent from an accuracy point > of > view. If we put the implementation aside, conceptually, I am not convinced > by the used based approach because resources don't have a clear owner > in AK at the moment. A topic can be created by (Principal A, no client id), > then can be extended by (no principal, Client B), and finally deleted by > (Principal C, Client C). This does not sound right to me and I fear that it > is not going to be easy to grasp for our users. > > Regarding the naming, I do agree that we can make it more future proof. > I propose `controller_mutations_rate`. I think that differentiating the > mutations > from the reads is still a good thing for the future. > > 11. 
I am not convinced by your proposal for the following reasons: > > First, in my toy example, I used 101 windows and 7 * 80 requests. We could > effectively allocate 5 * 100 requests to the previous windows assuming that > they are empty. What shall we do with the remaining 60 requests? Shall we > allocate them to the current window or shall we divide it among all the > windows? > > Second, I don't think that we can safely change the behavior of all the > existing > rates used because it actually changes the computation of the rate as > values > allocated to past windows would expire before they would today. > > Overall, while trying to fit in the current rate, we are going to build a > slightly > different version of the rate which will be even more confusing for users. > > Instead, I think that we should embrace the notion of burst as it could > also > be applied to other quotas in the future. Users don't have to know that we > use the Token Bucket or a special rate inside at the end of the day. It is > an > implementation detail. > > Users would be able to define: > - a rate R; and > - a maximum burst B. > > If we change the metrics to be as follows: > - the actual rate; > - the burst balance in %, 0 meaning that the user is throttled; > It remains detached from the algorithm. > > I personally prefer this over having to define a rate and a number of > windows > while having to understand that the number of windows implicitly defines > the > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > Best, > David > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the response. Sorry for the late reply. > > > > 10. Regarding exposing rate or usage as quota. Your argument is that > usage > > is not very accurate anyway and is harder to implement. So, let's just > be a > > bit loose and expose rate. I am sort of neutral on that.
(1) It seems to > me > > that overall usage will be relatively more accurate than rate. All the > > issues that Anna brought up seem to exist for rate too. Rate has the > > additional problem that the cost of each request may not be uniform. (2) > In > > terms of implementation, a usage based approach requires tracking the > user > > info through the life cycle of an operation. However, as you mentioned, > > things like topic creation can generate additional > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > > associate those cost to the user who initiated the operation. If we do > > that, we sort of need to track the user for the full life cycle of the > > processing of an operation anyway. (3) If you prefer rate strongly, I > don't > > have strong objections. However, I do feel that the new quota name should > > be able to cover all controller related cost longer term. This KIP > > currently only covers topic creation/deletion. It would not be ideal if > in > > the future, we have to add yet another type of quota for some other > > controller related operations. The quota name in the KIP has partition > > mutation. In the future, if we
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, One quick question about the implementation (I don't think it's spelled out in the KIP): Presumably if you make, for example, a create topics request with validate only it will check for quota violation, but not count towards quota violation, right? Many thanks, Tom On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote: > Hi David, > > 2) sorry, that was my mistake. > > Regards, > > Rajini > > > On Wed, Jun 3, 2020 at 3:08 PM David Jacot wrote: > > > Hi Rajini, > > > > Thanks for your prompt response. > > 1) Good catch, fixed. > > 2) The retry mechanism will be in the client so a new field is not > > required in the requests. > > > > Regards, > > David > > > > On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram > > wrote: > > > > > Hi David, > > > > > > Thanks for the updates, looks good. Just a couple of minor comments: > > > 1) There is a typo in "*The channel will be mutated as well when > > > `throttle_time_ms > 0`." * Should be *muted*? > > > 2) Since the three requests will need a new field for ` > > > *retryQuotaViolatedException*`, we should perhaps add that change to > the > > > KIP. > > > > > > Regards, > > > > > > Rajini > > > > > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot > wrote: > > > > > > > Hi all, > > > > > > > > I have updated the KIP based on our recent discussions. I have mainly > > > > changed the > > > > following points: > > > > * I have renamed the quota as suggested by Jun. > > > > * I have changed the metrics to be "token bucket" agnostic. The idea > is > > > to > > > > report the > > > > burst and the rate per principal/clientid. > > > > * I have removed the `retry.quota.violation.exception` configuration > > and > > > > replaced it > > > > with options in the respective methods' options. > > > > > > > > Now, the public interfaces are not tight to the algorithm that we use > > > > internally to throttle > > > > the requests but keep the notion of burst. 
I hope that this will help > > to > > > > alleviate the > > > > tokens bucket vs rate discussions. > > > > > > > > Please, have a look and let me know your thoughts. > > > > > > > > Bests, > > > > David > > > > > > > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot > > wrote: > > > > > > > > > Hi Rajini, > > > > > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > > > > > 1) Our main goal is to protect the controller from the extreme > users > > > > > (DDoS). We want > > > > > to protect it from large requests or repetitive requests coming > from > > a > > > > > single user. > > > > > That user could be used by multiple apps as you pointed out which > > makes > > > > it > > > > > even > > > > > worst. For instance, a buggy application could continuously create > > and > > > > > delete > > > > > topics in a tight loop. > > > > > > > > > > The idea of the burst is to still allow creating or deleting topics > > in > > > > > batch because > > > > > this is how applications tend to do it. However, we want to keep > the > > > > batch > > > > > size > > > > > under control with the burst. The burst does not allow requests of > > any > > > > > size. Topics > > > > > are accepted until the burst is passed. All the others are > rejected. > > > > > Ideally, regular > > > > > and well behaving applications should never or rarely be throttled. > > > > > > > > > > 2) No, there is no explicit bound on the maximum throttle time. > > Having > > > a > > > > > maximum > > > > > is straightforward here because the throttling time depends on the > > > actual > > > > > size of > > > > > the request. > > > > > > > > > > 3) That's a very good question that I haven't thought of. I was > > > thinking > > > > > about doing > > > > > it for the new quota only. I think that your suggestion of having a > > per > > > > > method argument > > > > > makes sense. I will update the KIP. > > > > > > > > > > 4) Indeed, it is outdated text. Let me update that. 
> > > > > > > > > > Regards, > > > > > David > > > > > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > > > rajinisiva...@gmail.com> > > > > > wrote: > > > > > > > > > >> Hi David, > > > > >> > > > > >> Thanks for the KIP. A few questions below: > > > > >> > > > > >> 1) The KIP says: *`Typically, applications tend to send one > request > > to > > > > >> create all the topics that they need`*. What would the point of > > > > throttling > > > > >> be in this case? If there was a user quota for the principal used > by > > > > that > > > > >> application, wouldn't we just allow the request due to the burst > > > value? > > > > Is > > > > >> the KIP specifically for the case where multiple apps with the > same > > > user > > > > >> principal are started all at once? > > > > >> > > > > >> 2) Will there be a bound on the maximum throttle time? > > > > >> > > > > >> 3) Will *retry.quota.violation.exception* have any impact on > request > > > > quota > > > > >> throttling or is it limited to the three requests where the new > > quota > > > is > > > > >> applied?
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Thanks for the KIP, David! Regards, Rajini On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > +1 (binding) > > Looks great. Thank you for the in-depth design and discussion. > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > Hi folks, > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > throttle create topic, create partition, and delete topics operations to > > protect the Kafka controller: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > Please, let me know what you think. > > > > Cheers, > > David > > > > > -- > Gwen Shapira > Engineering Manager | Confluent > 650.450.2760 | @gwenshap > Follow us: Twitter | blog >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, 2) sorry, that was my mistake. Regards, Rajini On Wed, Jun 3, 2020 at 3:08 PM David Jacot wrote: > Hi Rajini, > > Thanks for your prompt response. > 1) Good catch, fixed. > 2) The retry mechanism will be in the client so a new field is not > required in the requests. > > Regards, > David > > On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram > wrote: > > > Hi David, > > > > Thanks for the updates, looks good. Just a couple of minor comments: > > 1) There is a typo in "*The channel will be mutated as well when > > `throttle_time_ms > 0`." * Should be *muted*? > > 2) Since the three requests will need a new field for ` > > *retryQuotaViolatedException*`, we should perhaps add that change to the > > KIP. > > > > Regards, > > > > Rajini > > > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > > > > > Hi all, > > > > > > I have updated the KIP based on our recent discussions. I have mainly > > > changed the > > > following points: > > > * I have renamed the quota as suggested by Jun. > > > * I have changed the metrics to be "token bucket" agnostic. The idea is > > to > > > report the > > > burst and the rate per principal/clientid. > > > * I have removed the `retry.quota.violation.exception` configuration > and > > > replaced it > > > with options in the respective methods' options. > > > > > > Now, the public interfaces are not tight to the algorithm that we use > > > internally to throttle > > > the requests but keep the notion of burst. I hope that this will help > to > > > alleviate the > > > tokens bucket vs rate discussions. > > > > > > Please, have a look and let me know your thoughts. > > > > > > Bests, > > > David > > > > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot > wrote: > > > > > > > Hi Rajini, > > > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > > > 1) Our main goal is to protect the controller from the extreme users > > > > (DDoS). 
We want > > > > to protect it from large requests or repetitive requests coming from > a > > > > single user. > > > > That user could be used by multiple apps as you pointed out which > makes > > > it > > > > even > > > > worst. For instance, a buggy application could continuously create > and > > > > delete > > > > topics in a tight loop. > > > > > > > > The idea of the burst is to still allow creating or deleting topics > in > > > > batch because > > > > this is how applications tend to do it. However, we want to keep the > > > batch > > > > size > > > > under control with the burst. The burst does not allow requests of > any > > > > size. Topics > > > > are accepted until the burst is passed. All the others are rejected. > > > > Ideally, regular > > > > and well behaving applications should never or rarely be throttled. > > > > > > > > 2) No, there is no explicit bound on the maximum throttle time. > Having > > a > > > > maximum > > > > is straightforward here because the throttling time depends on the > > actual > > > > size of > > > > the request. > > > > > > > > 3) That's a very good question that I haven't thought of. I was > > thinking > > > > about doing > > > > it for the new quota only. I think that your suggestion of having a > per > > > > method argument > > > > makes sense. I will update the KIP. > > > > > > > > 4) Indeed, it is outdated text. Let me update that. > > > > > > > > Regards, > > > > David > > > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > > rajinisiva...@gmail.com> > > > > wrote: > > > > > > > >> Hi David, > > > >> > > > >> Thanks for the KIP. A few questions below: > > > >> > > > >> 1) The KIP says: *`Typically, applications tend to send one request > to > > > >> create all the topics that they need`*. What would the point of > > > throttling > > > >> be in this case? If there was a user quota for the principal used by > > > that > > > >> application, wouldn't we just allow the request due to the burst > > value? 
> > > Is > > > >> the KIP specifically for the case where multiple apps with the same > > user > > > >> principal are started all at once? > > > >> > > > >> 2) Will there be a bound on the maximum throttle time? > > > >> > > > >> 3) Will *retry.quota.violation.exception* have any impact on request > > > quota > > > >> throttling or is it limited to the three requests where the new > quota > > is > > > >> applied? If it is only for the new quota, perhaps it would be better > > to > > > >> add > > > >> an option to the relevant requests rather than use an admin client > > > config. > > > >> > > > >> 4) Is this outdated text, wasn't sure what this refers to : *While > the > > > >> configuration could be set by broker, we envision to have it set > only > > > has > > > >> a > > > >> cluster wide default for two reasons.* > > > >> > > > >> Regards, > > > >> > > > >> Rajini > > > >> > > > >> > > > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot > > wrote: > > > >> > > > >> > Hi Jun, > > > >> > > > > >> > Thanks for your reply. > > > >> > > > > >> > 10. I think that
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Rajini, Thanks for your prompt response. 1) Good catch, fixed. 2) The retry mechanism will be in the client so a new field is not required in the requests. Regards, David On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram wrote: > Hi David, > > Thanks for the updates, looks good. Just a couple of minor comments: > 1) There is a typo in "*The channel will be mutated as well when > `throttle_time_ms > 0`." * Should be *muted*? > 2) Since the three requests will need a new field for ` > *retryQuotaViolatedException*`, we should perhaps add that change to the > KIP. > > Regards, > > Rajini > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > > > Hi all, > > > > I have updated the KIP based on our recent discussions. I have mainly > > changed the > > following points: > > * I have renamed the quota as suggested by Jun. > > * I have changed the metrics to be "token bucket" agnostic. The idea is > to > > report the > > burst and the rate per principal/clientid. > > * I have removed the `retry.quota.violation.exception` configuration and > > replaced it > > with options in the respective methods' options. > > > > Now, the public interfaces are not tight to the algorithm that we use > > internally to throttle > > the requests but keep the notion of burst. I hope that this will help to > > alleviate the > > tokens bucket vs rate discussions. > > > > Please, have a look and let me know your thoughts. > > > > Bests, > > David > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > > > > > Hi Rajini, > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > 1) Our main goal is to protect the controller from the extreme users > > > (DDoS). We want > > > to protect it from large requests or repetitive requests coming from a > > > single user. > > > That user could be used by multiple apps as you pointed out which makes > > it > > > even > > > worst. 
For instance, a buggy application could continuously create and > > > delete > > > topics in a tight loop. > > > > > > The idea of the burst is to still allow creating or deleting topics in > > > batch because > > > this is how applications tend to do it. However, we want to keep the > > batch > > > size > > > under control with the burst. The burst does not allow requests of any > > > size. Topics > > > are accepted until the burst is passed. All the others are rejected. > > > Ideally, regular > > > and well behaving applications should never or rarely be throttled. > > > > > > 2) No, there is no explicit bound on the maximum throttle time. Having > a > > > maximum > > > is straightforward here because the throttling time depends on the > actual > > > size of > > > the request. > > > > > > 3) That's a very good question that I haven't thought of. I was > thinking > > > about doing > > > it for the new quota only. I think that your suggestion of having a per > > > method argument > > > makes sense. I will update the KIP. > > > > > > 4) Indeed, it is outdated text. Let me update that. > > > > > > Regards, > > > David > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > rajinisiva...@gmail.com> > > > wrote: > > > > > >> Hi David, > > >> > > >> Thanks for the KIP. A few questions below: > > >> > > >> 1) The KIP says: *`Typically, applications tend to send one request to > > >> create all the topics that they need`*. What would the point of > > throttling > > >> be in this case? If there was a user quota for the principal used by > > that > > >> application, wouldn't we just allow the request due to the burst > value? > > Is > > >> the KIP specifically for the case where multiple apps with the same > user > > >> principal are started all at once? > > >> > > >> 2) Will there be a bound on the maximum throttle time? 
> > >> > > >> 3) Will *retry.quota.violation.exception* have any impact on request > > quota > > >> throttling or is it limited to the three requests where the new quota > is > > >> applied? If it is only for the new quota, perhaps it would be better > to > > >> add > > >> an option to the relevant requests rather than use an admin client > > config. > > >> > > >> 4) Is this outdated text, wasn't sure what this refers to : *While the > > >> configuration could be set by broker, we envision to have it set only > > has > > >> a > > >> cluster wide default for two reasons.* > > >> > > >> Regards, > > >> > > >> Rajini > > >> > > >> > > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot > wrote: > > >> > > >> > Hi Jun, > > >> > > > >> > Thanks for your reply. > > >> > > > >> > 10. I think that both options are likely equivalent from an accuracy > > >> point > > >> > of > > >> > view. If we put the implementation aside, conceptually, I am not > > >> convinced > > >> > by the used based approach because resources don't have a clear > owner > > >> > in AK at the moment. A topic can be created by (Principal A, no > client > > >> id), > > >> > then can be extended by (no principal, Client B), and finally > deleted > > by > > >> >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the updates, looks good. Just a couple of minor comments: 1) There is a typo in "*The channel will be mutated as well when `throttle_time_ms > 0`." * Should be *muted*? 2) Since the three requests will need a new field for ` *retryQuotaViolatedException*`, we should perhaps add that change to the KIP. Regards, Rajini On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > Hi all, > > I have updated the KIP based on our recent discussions. I have mainly > changed the > following points: > * I have renamed the quota as suggested by Jun. > * I have changed the metrics to be "token bucket" agnostic. The idea is to > report the > burst and the rate per principal/clientid. > * I have removed the `retry.quota.violation.exception` configuration and > replaced it > with options in the respective methods' options. > > Now, the public interfaces are not tight to the algorithm that we use > internally to throttle > the requests but keep the notion of burst. I hope that this will help to > alleviate the > tokens bucket vs rate discussions. > > Please, have a look and let me know your thoughts. > > Bests, > David > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > > > Hi Rajini, > > > > Thanks for your feedback. Please find my answers below: > > > > 1) Our main goal is to protect the controller from the extreme users > > (DDoS). We want > > to protect it from large requests or repetitive requests coming from a > > single user. > > That user could be used by multiple apps as you pointed out which makes > it > > even > > worst. For instance, a buggy application could continuously create and > > delete > > topics in a tight loop. > > > > The idea of the burst is to still allow creating or deleting topics in > > batch because > > this is how applications tend to do it. However, we want to keep the > batch > > size > > under control with the burst. The burst does not allow requests of any > > size. Topics > > are accepted until the burst is passed. 
All the others are rejected. > > Ideally, regular > > and well behaving applications should never or rarely be throttled. > > > > 2) No, there is no explicit bound on the maximum throttle time. Having a > > maximum > > is straightforward here because the throttling time depends on the actual > > size of > > the request. > > > > 3) That's a very good question that I haven't thought of. I was thinking > > about doing > > it for the new quota only. I think that your suggestion of having a per > > method argument > > makes sense. I will update the KIP. > > > > 4) Indeed, it is outdated text. Let me update that. > > > > Regards, > > David > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram > > wrote: > > > >> Hi David, > >> > >> Thanks for the KIP. A few questions below: > >> > >> 1) The KIP says: *`Typically, applications tend to send one request to > >> create all the topics that they need`*. What would the point of > throttling > >> be in this case? If there was a user quota for the principal used by > that > >> application, wouldn't we just allow the request due to the burst value? > Is > >> the KIP specifically for the case where multiple apps with the same user > >> principal are started all at once? > >> > >> 2) Will there be a bound on the maximum throttle time? > >> > >> 3) Will *retry.quota.violation.exception* have any impact on request > quota > >> throttling or is it limited to the three requests where the new quota is > >> applied? If it is only for the new quota, perhaps it would be better to > >> add > >> an option to the relevant requests rather than use an admin client > config. > >> > >> 4) Is this outdated text, wasn't sure what this refers to : *While the > >> configuration could be set by broker, we envision to have it set only > has > >> a > >> cluster wide default for two reasons.* > >> > >> Regards, > >> > >> Rajini > >> > >> > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > >> > >> > Hi Jun, > >> > > >> > Thanks for your reply. 
> >> > > >> > 10. I think that both options are likely equivalent from an accuracy > >> point > >> > of > >> > view. If we put the implementation aside, conceptually, I am not > >> convinced > >> > by the used based approach because resources don't have a clear owner > >> > in AK at the moment. A topic can be created by (Principal A, no client > >> id), > >> > then can be extended by (no principal, Client B), and finally deleted > by > >> > (Principal C, Client C). This does not sound right to me and I fear > >> that it > >> > is not going to be easy to grasp for our users. > >> > > >> > Regarding the naming, I do agree that we can make it more future > proof. > >> > I propose `controller_mutations_rate`. I think that differentiating > the > >> > mutations > >> > from the reads is still a good thing for the future. > >> > > >> > 11. I am not convinced by your proposal for the following reasons: > >> > > >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We > >> could > >> >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all, I have updated the KIP based on our recent discussions. I have mainly changed the following points: * I have renamed the quota as suggested by Jun. * I have changed the metrics to be "token bucket" agnostic. The idea is to report the burst and the rate per principal/clientid. * I have removed the `retry.quota.violation.exception` configuration and replaced it with options in the respective methods' options. Now, the public interfaces are not tied to the algorithm that we use internally to throttle the requests but keep the notion of burst. I hope that this will help to alleviate the token bucket vs rate discussions. Please, have a look and let me know your thoughts. Bests, David On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > Hi Rajini, > > Thanks for your feedback. Please find my answers below: > > 1) Our main goal is to protect the controller from the extreme users > (DDoS). We want > to protect it from large requests or repetitive requests coming from a > single user. > That user could be used by multiple apps as you pointed out which makes it > even > worst. For instance, a buggy application could continuously create and > delete > topics in a tight loop. > > The idea of the burst is to still allow creating or deleting topics in > batch because > this is how applications tend to do it. However, we want to keep the batch > size > under control with the burst. The burst does not allow requests of any > size. Topics > are accepted until the burst is passed. All the others are rejected. > Ideally, regular > and well behaving applications should never or rarely be throttled. > > 2) No, there is no explicit bound on the maximum throttle time. Having a > maximum > is straightforward here because the throttling time depends on the actual > size of > the request. > > 3) That's a very good question that I haven't thought of. I was thinking > about doing > it for the new quota only.
I think that your suggestion of having a per > method argument > makes sense. I will update the KIP. > > 4) Indeed, it is outdated text. Let me update that. > > Regards, > David > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram > wrote: > >> Hi David, >> >> Thanks for the KIP. A few questions below: >> >> 1) The KIP says: *`Typically, applications tend to send one request to >> create all the topics that they need`*. What would the point of throttling >> be in this case? If there was a user quota for the principal used by that >> application, wouldn't we just allow the request due to the burst value? Is >> the KIP specifically for the case where multiple apps with the same user >> principal are started all at once? >> >> 2) Will there be a bound on the maximum throttle time? >> >> 3) Will *retry.quota.violation.exception* have any impact on request quota >> throttling or is it limited to the three requests where the new quota is >> applied? If it is only for the new quota, perhaps it would be better to >> add >> an option to the relevant requests rather than use an admin client config. >> >> 4) Is this outdated text, wasn't sure what this refers to : *While the >> configuration could be set by broker, we envision to have it set only has >> a >> cluster wide default for two reasons.* >> >> Regards, >> >> Rajini >> >> >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: >> >> > Hi Jun, >> > >> > Thanks for your reply. >> > >> > 10. I think that both options are likely equivalent from an accuracy >> point >> > of >> > view. If we put the implementation aside, conceptually, I am not >> convinced >> > by the used based approach because resources don't have a clear owner >> > in AK at the moment. A topic can be created by (Principal A, no client >> id), >> > then can be extended by (no principal, Client B), and finally deleted by >> > (Principal C, Client C). This does not sound right to me and I fear >> that it >> > is not going to be easy to grasp for our users. 
>> > >> > Regarding the naming, I do agree that we can make it more future proof. >> > I propose `controller_mutations_rate`. I think that differentiating the >> > mutations >> > from the reads is still a good thing for the future. >> > >> > 11. I am not convinced by your proposal for the following reasons: >> > >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We >> could >> > effectively allocate 5 * 100 requests to the previous windows assuming >> that >> > they are empty. What shall we do with the remaining 60 requests? Shall >> we >> > allocate them to the current window or shall we divide it among all the >> > windows? >> > >> > Second, I don't think that we can safely change the behavior of all the >> > existing >> > rates used because it actually changes the computation of the rate as >> > values >> > allocated to past windows would expire before they would today. >> > >> > Overall, while trying to fit in the current rate, we are going to build >> a >> > slightly >> > different version of the rate which will be even more
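To pin down the arithmetic in the toy example quoted above (a quota of 5 requests/sec, 101 one-second windows, and a burst of 7 * 80 = 560 requests): back-filling the 100 elapsed windows can absorb at most 5 * 100 = 500 requests, which is where the 60 leftover requests come from. A minimal check of that calculation, with hypothetical names invented for illustration:

```java
// Hypothetical helper illustrating the toy example's arithmetic only;
// this is not how Kafka's sampled Rate is implemented.
final class BackfillExample {
    /** Requests left over after back-filling `elapsedWindows` empty
     *  windows at `rate` requests per window. */
    static long leftover(long rate, long elapsedWindows, long burst) {
        long capacity = rate * elapsedWindows; // most the past windows can hold
        return Math.max(0, burst - capacity);
    }
}
```

It is exactly these leftover requests, with no principled window to land in, that motivate abandoning the back-fill approach.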
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Rajini, Thanks for your feedback. Please find my answers below: 1) Our main goal is to protect the controller from the extreme users (DDoS). We want to protect it from large requests or repetitive requests coming from a single user. That user could be used by multiple apps as you pointed out, which makes it even worse. For instance, a buggy application could continuously create and delete topics in a tight loop. The idea of the burst is to still allow creating or deleting topics in batch because this is how applications tend to do it. However, we want to keep the batch size under control with the burst. The burst does not allow requests of any size. Topics are accepted until the burst is passed. All the others are rejected. Ideally, regular and well-behaved applications should never or rarely be throttled. 2) No, there is no explicit bound on the maximum throttle time. Having a maximum is not straightforward here because the throttling time depends on the actual size of the request. 3) That's a very good question that I haven't thought of. I was thinking about doing it for the new quota only. I think that your suggestion of having a per-method argument makes sense. I will update the KIP. 4) Indeed, it is outdated text. Let me update that. Regards, David On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram wrote: > Hi David, > > Thanks for the KIP. A few questions below: > > 1) The KIP says: *`Typically, applications tend to send one request to > create all the topics that they need`*. What would the point of throttling > be in this case? If there was a user quota for the principal used by that > application, wouldn't we just allow the request due to the burst value? Is > the KIP specifically for the case where multiple apps with the same user > principal are started all at once? > > 2) Will there be a bound on the maximum throttle time?
> > 3) Will *retry.quota.violation.exception* have any impact on request quota > throttling or is it limited to the three requests where the new quota is > applied? If it is only for the new quota, perhaps it would be better to add > an option to the relevant requests rather than use an admin client config. > > 4) Is this outdated text, wasn't sure what this refers to : *While the > configuration could be set by broker, we envision to have it set only has a > cluster wide default for two reasons.* > > Regards, > > Rajini > > > On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > > > Hi Jun, > > > > Thanks for your reply. > > > > 10. I think that both options are likely equivalent from an accuracy > point > > of > > view. If we put the implementation aside, conceptually, I am not > convinced > > by the used based approach because resources don't have a clear owner > > in AK at the moment. A topic can be created by (Principal A, no client > id), > > then can be extended by (no principal, Client B), and finally deleted by > > (Principal C, Client C). This does not sound right to me and I fear that > it > > is not going to be easy to grasp for our users. > > > > Regarding the naming, I do agree that we can make it more future proof. > > I propose `controller_mutations_rate`. I think that differentiating the > > mutations > > from the reads is still a good thing for the future. > > > > 11. I am not convinced by your proposal for the following reasons: > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > could > > effectively allocate 5 * 100 requests to the previous windows assuming > that > > they are empty. What shall we do with the remaining 60 requests? Shall we > > allocate them to the current window or shall we divide it among all the > > windows? 
> > > > Second, I don't think that we can safely change the behavior of all the > > existing > > rates used because it actually changes the computation of the rate as > > values > > allocated to past windows would expire before they would today. > > > > Overall, while trying to fit in the current rate, we are going to build a > > slightly > > different version of the rate which will be even more confusing for > users. > > > > Instead, I think that we should embrace the notion of burst as it could > > also > > be applied to other quotas in the future. Users don't have to know that > we > > use the Token Bucket or a special rate inside at the end of the day. It > is > > an > > implementation detail. > > > > Users would be able to define: > > - a rate R; and > > - a maximum burst B. > > > > If we change the metrics to be as follow: > > - the actual rate; > > - the burst balance in %, 0 meaning that the user is throttled; > > It remains disattach from the algorithm. > > > > I personally prefer this over having to define a rate and a number of > > windows > > while having to understand that the number of windows implicitly defines > > the > > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > > > Best, > > David > > > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > > >
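David's answer above explains why the throttle time has no fixed bound: it depends on the size of the request. Once a batch drives the credit balance into deficit, the implied wait is simply the deficit divided by the replenishment rate. A hypothetical sketch of that relationship (names invented for illustration, not Kafka's actual code):

```java
// Hypothetical helper: given a credit deficit after accepting a burst,
// how long until the balance recovers at `rate` credits per second?
final class ThrottleTimeExample {
    static long throttleTimeMs(double deficit, double rate) {
        if (deficit <= 0) return 0L; // no deficit, no throttling
        return (long) Math.ceil(deficit / rate * 1000.0);
    }
}
```

For instance, at R = 5 mutations/sec, a batch that leaves the bucket 10 credits short implies a 2000 ms throttle; a larger batch implies a proportionally longer wait, which is why no a priori maximum can be stated.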
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the KIP. A few questions below: 1) The KIP says: *`Typically, applications tend to send one request to create all the topics that they need`*. What would the point of throttling be in this case? If there was a user quota for the principal used by that application, wouldn't we just allow the request due to the burst value? Is the KIP specifically for the case where multiple apps with the same user principal are started all at once? 2) Will there be a bound on the maximum throttle time? 3) Will *retry.quota.violation.exception* have any impact on request quota throttling or is it limited to the three requests where the new quota is applied? If it is only for the new quota, perhaps it would be better to add an option to the relevant requests rather than use an admin client config. 4) Is this outdated text, wasn't sure what this refers to : *While the configuration could be set by broker, we envision to have it set only has a cluster wide default for two reasons.* Regards, Rajini On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > Hi Jun, > > Thanks for your reply. > > 10. I think that both options are likely equivalent from an accuracy point > of > view. If we put the implementation aside, conceptually, I am not convinced > by the used based approach because resources don't have a clear owner > in AK at the moment. A topic can be created by (Principal A, no client id), > then can be extended by (no principal, Client B), and finally deleted by > (Principal C, Client C). This does not sound right to me and I fear that it > is not going to be easy to grasp for our users. > > Regarding the naming, I do agree that we can make it more future proof. > I propose `controller_mutations_rate`. I think that differentiating the > mutations > from the reads is still a good thing for the future. > > 11. I am not convinced by your proposal for the following reasons: > > First, in my toy example, I used 101 windows and 7 * 80 requests. 
We could > effectively allocate 5 * 100 requests to the previous windows assuming that > they are empty. What shall we do with the remaining 60 requests? Shall we > allocate them to the current window or shall we divide it among all the > windows? > > Second, I don't think that we can safely change the behavior of all the > existing > rates used because it actually changes the computation of the rate as > values > allocated to past windows would expire before they would today. > > Overall, while trying to fit in the current rate, we are going to build a > slightly > different version of the rate which will be even more confusing for users. > > Instead, I think that we should embrace the notion of burst as it could > also > be applied to other quotas in the future. Users don't have to know that we > use the Token Bucket or a special rate inside at the end of the day. It is > an > implementation detail. > > Users would be able to define: > - a rate R; and > - a maximum burst B. > > If we change the metrics to be as follow: > - the actual rate; > - the burst balance in %, 0 meaning that the user is throttled; > It remains disattach from the algorithm. > > I personally prefer this over having to define a rate and a number of > windows > while having to understand that the number of windows implicitly defines > the > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > Best, > David > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the response. Sorry for the late reply. > > > > 10. Regarding exposing rate or usage as quota. Your argument is that > usage > > is not very accurate anyway and is harder to implement. So, let's just > be a > > bit loose and expose rate. I am sort of neutral on that. (1) It seems to > me > > that overall usage will be relatively more accurate than rate. All the > > issues that Anna brought up seem to exist for rate too. 
Rate has the > > additional problem that the cost of each request may not be uniform. (2) > In > > terms of implementation, a usage based approach requires tracking the > user > > info through the life cycle of an operation. However, as you mentioned, > > things like topic creation can generate additional > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > > associate those cost to the user who initiated the operation. If we do > > that, we sort of need to track the user for the full life cycle of the > > processing of an operation anyway. (3) If you prefer rate strongly, I > don't > > have strong objections. However, I do feel that the new quota name should > > be able to cover all controller related cost longer term. This KIP > > currently only covers topic creation/deletion. It would not be ideal if > in > > the future, we have to add yet another type of quota for some other > > controller related operations. The quota name in the KIP has partition > > mutation. In the future, if we allow things like topic renaming, it may > not
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Thanks for your reply. 10. I think that both options are likely equivalent from an accuracy point of view. If we put the implementation aside, conceptually, I am not convinced by the usage based approach because resources don't have a clear owner in AK at the moment. A topic can be created by (Principal A, no client id), then can be extended by (no principal, Client B), and finally deleted by (Principal C, Client C). This does not sound right to me and I fear that it is not going to be easy to grasp for our users. Regarding the naming, I do agree that we can make it more future proof. I propose `controller_mutations_rate`. I think that differentiating the mutations from the reads is still a good thing for the future. 11. I am not convinced by your proposal for the following reasons: First, in my toy example, I used 101 windows and 7 * 80 requests. We could effectively allocate 5 * 100 requests to the previous windows assuming that they are empty. What shall we do with the remaining 60 requests? Shall we allocate them to the current window or shall we divide them among all the windows? Second, I don't think that we can safely change the behavior of all the existing rates used because it actually changes the computation of the rate, as values allocated to past windows would expire sooner than they do today. Overall, while trying to fit in the current rate, we are going to build a slightly different version of the rate which will be even more confusing for users. Instead, I think that we should embrace the notion of burst as it could also be applied to other quotas in the future. Users don't have to know that we use the Token Bucket or a special rate inside at the end of the day. It is an implementation detail. Users would be able to define:
- a rate R; and
- a maximum burst B.
If we change the metrics to be as follows:
- the actual rate;
- the burst balance in %, 0 meaning that the user is throttled;
it remains detached from the algorithm.
I personally prefer this over having to define a rate and a number of windows while having to understand that the number of windows implicitly defines the allowed burst. I think that it is clearer and easier to grasp. Don't you? Best, David On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > Hi, David, Anna, > > Thanks for the response. Sorry for the late reply. > > 10. Regarding exposing rate or usage as quota. Your argument is that usage > is not very accurate anyway and is harder to implement. So, let's just be a > bit loose and expose rate. I am sort of neutral on that. (1) It seems to me > that overall usage will be relatively more accurate than rate. All the > issues that Anna brought up seem to exist for rate too. Rate has the > additional problem that the cost of each request may not be uniform. (2) In > terms of implementation, a usage based approach requires tracking the user > info through the life cycle of an operation. However, as you mentioned, > things like topic creation can generate additional > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > associate those cost to the user who initiated the operation. If we do > that, we sort of need to track the user for the full life cycle of the > processing of an operation anyway. (3) If you prefer rate strongly, I don't > have strong objections. However, I do feel that the new quota name should > be able to cover all controller related cost longer term. This KIP > currently only covers topic creation/deletion. It would not be ideal if in > the future, we have to add yet another type of quota for some other > controller related operations. The quota name in the KIP has partition > mutation. In the future, if we allow things like topic renaming, it may not > be related to partition mutation directly and it would be trickier to fit > those operations in the current quota. So, maybe sth more general like > controller_operations_quota will be more future proof. > > 11. 
Regarding the difference between the token bucket algorithm and our > current quota mechanism. That's a good observation. It seems that we can > make a slight change to our current quota mechanism to achieve a similar > thing. As you said, the issue was that we allocate all 7 * 80 requests in > the last 1 sec window in our current mechanism. This is a bit unintuitive > since each sec only has a quota capacity of 5. An alternative way is to > allocate the 7 * 80 requests to all previous windows, each up to the > remaining quota capacity left. So, you will fill the current 1 sec window > and all previous ones, each with 5. Then, it seems this will give the same > behavior as token bucket? The reason that I keep asking this is that from > an operational perspective, it's simpler if all types of quotas work in the > same way. > > Jun > > On Tue, May 26, 2020 at 9:52 AM David Jacot wrote: > > > Hi folks, > > > > I have updated the KIP. As mentioned by Jun, I have made the > > quota per principal/clientid similarly to the
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Looks great. Thank you for the in-depth design and discussion. On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > Hi folks, > > I'd like to start the vote for KIP-599 which proposes a new quota to > throttle create topic, create partition, and delete topics operations to > protect the Kafka controller: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > Please, let me know what you think. > > Cheers, > David > -- Gwen Shapira Engineering Manager | Confluent 650.450.2760 | @gwenshap Follow us: Twitter | blog
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Anna, Thanks for the response. Sorry for the late reply. 10. Regarding exposing rate or usage as quota. Your argument is that usage is not very accurate anyway and is harder to implement. So, let's just be a bit loose and expose rate. I am sort of neutral on that. (1) It seems to me that overall usage will be relatively more accurate than rate. All the issues that Anna brought up seem to exist for rate too. Rate has the additional problem that the cost of each request may not be uniform. (2) In terms of implementation, a usage based approach requires tracking the user info through the life cycle of an operation. However, as you mentioned, things like topic creation can generate additional LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to associate those costs to the user who initiated the operation. If we do that, we sort of need to track the user for the full life cycle of the processing of an operation anyway. (3) If you prefer rate strongly, I don't have strong objections. However, I do feel that the new quota name should be able to cover all controller related costs longer term. This KIP currently only covers topic creation/deletion. It would not be ideal if in the future, we have to add yet another type of quota for some other controller related operations. The quota name in the KIP has partition mutation. In the future, if we allow things like topic renaming, it may not be related to partition mutation directly and it would be trickier to fit those operations in the current quota. So, maybe something more general like controller_operations_quota will be more future proof. 11. Regarding the difference between the token bucket algorithm and our current quota mechanism. That's a good observation. It seems that we can make a slight change to our current quota mechanism to achieve a similar thing. As you said, the issue was that we allocate all 7 * 80 requests in the last 1 sec window in our current mechanism.
This is a bit unintuitive since each sec only has a quota capacity of 5. An alternative way is to allocate the 7 * 80 requests to all previous windows, each up to the remaining quota capacity left. So, you will fill the current 1 sec window and all previous ones, each with 5. Then, it seems this will give the same behavior as token bucket? The reason that I keep asking this is that from an operational perspective, it's simpler if all types of quotas work in the same way. Jun On Tue, May 26, 2020 at 9:52 AM David Jacot wrote: > Hi folks, > > I have updated the KIP. As mentioned by Jun, I have made the > quota per principal/clientid similarly to the other quotas. I have > also explained how this will work in conjunction with the auto > topics creation. > > Please, take a look at it. I plan to call a vote for it in the next few > days if there are no comments in this thread. > > Best, > David > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley wrote: > > > Hi David, > > > > Thanks for the explanation and confirmation that evolving the APIs is not > > off the table in the longer term. > > > > Kind regards, > > > > Tom > > >
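The two allocation strategies being compared above can be sketched numerically. This is an illustrative Python sketch, not Kafka's actual metrics code; the constants come from the toy example discussed in the thread (101 one-second windows, a quota of 5 mutations/sec, and a burst of 7 * 80 mutations), and the exact handling of the leftover mutations is the open question David raises.

```python
QUOTA_PER_SEC = 5
NUM_WINDOWS = 101          # samples kept; the window set spans ~100 seconds

burst = 7 * 80             # 560 partition mutations arriving at once

# Current mechanism: the whole burst is recorded in the newest sample, so the
# observed rate jumps to 560 / 100 = 5.6/sec and stays there until that single
# large sample expires, roughly 101 seconds later.
samples = [0] * NUM_WINDOWS
samples[-1] = burst
observed_rate = sum(samples) / (NUM_WINDOWS - 1)
print(observed_rate)       # 5.6

# Jun's variant: spread the burst across the windows, filling each one only up
# to its per-second capacity.
spread = [0] * NUM_WINDOWS
remaining = burst
for i in range(NUM_WINDOWS):
    take = min(QUOTA_PER_SEC, remaining)
    spread[i] = take
    remaining -= take
print(remaining)           # mutations left over once every window is full --
                           # the remainder the thread leaves unresolved
```

With all 101 windows filled to capacity, 505 mutations fit and 55 are left over, which is essentially the "remaining requests" question David asks in his reply.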
[VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks, I'd like to start the vote for KIP-599 which proposes a new quota to throttle create topic, create partition, and delete topics operations to protect the Kafka controller: https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations Please, let me know what you think. Cheers, David
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks, I have updated the KIP. As mentioned by Jun, I have made the quota per principal/clientid similarly to the other quotas. I have also explained how this will work in conjunction with the auto topics creation. Please, take a look at it. I plan to call a vote for it in the next few days if there are no comments in this thread. Best, David On Wed, May 13, 2020 at 10:57 AM Tom Bentley wrote: > Hi David, > > Thanks for the explanation and confirmation that evolving the APIs is not > off the table in the longer term. > > Kind regards, > > Tom >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the explanation and confirmation that evolving the APIs is not off the table in the longer term. Kind regards, Tom
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, >> What exactly is the problem with having a huge backlog of pending >> operations? I can see that the backlog would need persisting so that the >> controller could change without losing track of the topics to be mutated, >> and the mutations would need to be submitted in batches to the controller >> event queue (thus allowing other controller requests to be interleaved). I >> realise this is not feasible right now, I'm just trying to understand if >> it's feasible at all and if there's any appetite for making the requisite >> API changes in the future in order to prevent these problems even for large >> single requests. It is definitely feasible. My concern with the approach is about the way our current API works. Let me try to illustrate it with an example. When the admin client sends a CreateTopicsRequest to the controller, the request goes to the purgatory, waiting until all the topics are created or the timeout specified in the request is reached. If the timeout is reached, a RequestTimeoutException is returned to the client. This is used to fail the future of the caller. In conjunction, the admin client fails any pending request with a TimeoutException after the request timeout is reached (30s by default). In the former case, the caller will likely retry. In the latter case, the admin client will automatically retry. In both cases, the broker will respond with a TopicExistsException. Having a huge backlog of pending operations will amplify this weird behavior. Clients will tend to get TopicExistsException errors when they create topics for the first time, which is really weird. I think that our current API is not well suited for this. An asynchronous API workflow with one API to create/delete and another one to query the status of completion would be better suited. We can definitely evolve our API towards this but we need to figure out a compatibility story for existing clients. Another aspect is the fairness among the clients.
Imagine a case where one client continuously creates and deletes topics in a tight loop. This would flood the queue and delay the creations and the deletions of the other clients. Throttling at admission time mitigates this directly. Throttling at execution would need to take this into account to ensure fairness among the clients. It is a little harder to do this in the controller as the controller is completely agnostic from the principals and the client ids. These reasons made me lean towards the current proposal. Does that make sense? Best, David On Wed, May 13, 2020 at 10:05 AM David Jacot wrote: > Hi Jun, > > Coming back to your question regarding the differences between the token > bucket algorithm and our current quota mechanism. I did some tests and > they confirmed my first intuition that our current mechanism does not work > well with a bursty workload. Let me try to illustrate the difference with > an > example. One important aspect to keep in mind is that we don't want to > reject requests when the quota is exhausted. > > Let's say that we want to guarantee an average rate R=5 partitions/sec > while > allowing a burst B=500 partitions. > > With our current mechanism, this translates to following parameters: > - Quota = 5 > - Samples = B / R + 1 = 101 (to allow the burst) > - Time Window = 1s (the default) > > Now, let's say that a client wants to create 7 topics with 80 partitions > each at > the time T. It brings the rate to 5.6 (7 * 80 / 100) which is above the > quota so > any new request is rejected until the quota gets back above R. In theory, > the > client must wait 12 secs ((5.6 - 5) / 5 * 100) to get it back to R. In > practice, due > to the sparse samples (one sample worth 560), the rate won't decrease until > that sample is dropped and it will be after 101 secs. It gets worse if the > burst > is increased. 
> > With the token bucket algorithm, this translate to the following > parameters: > - Rate = 5 > - Tokens = 500 > > The same request decreases the number of available tokens to -60 which is > below 0 so any new request is rejected until the number of available tokens > gets back above 0. This takes 12 secs ((60+1) / 5). > > The token bucket algorithm is more suited for bursty workloads which is our > case here. I hope that this example helps to clarify the choice. > > Best, > David > > On Tue, May 12, 2020 at 3:19 PM Tom Bentley wrote: > >> Hi David, >> >> Thanks for the reply. >> >> >> If I understand the proposed throttling algorithm, an initial request >> > would >> > >> be allowed (possibly making K negative) and only subsequent requests >> > >> (before K became positive) would receive the QUOTA_VIOLATED. That >> would >> > >> mean it was still possible to block the controller from handling >> other >> > >> events – you just need to do so via making one big request. >> > >> > That is correct. One could still create one big topic (not request) and >> > that would >> > create some load on the controller. All options suffer from
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Coming back to your question regarding the differences between the token bucket algorithm and our current quota mechanism. I did some tests and they confirmed my first intuition that our current mechanism does not work well with a bursty workload. Let me try to illustrate the difference with an example. One important aspect to keep in mind is that we don't want to reject requests when the quota is exhausted. Let's say that we want to guarantee an average rate R=5 partitions/sec while allowing a burst B=500 partitions. With our current mechanism, this translates to the following parameters:
- Quota = 5
- Samples = B / R + 1 = 101 (to allow the burst)
- Time Window = 1s (the default)
Now, let's say that a client wants to create 7 topics with 80 partitions each at time T. It brings the rate to 5.6 (7 * 80 / 100), which is above the quota, so any new request is rejected until the rate gets back to R. In theory, the client must wait 12 secs ((5.6 - 5) / 5 * 100) to get it back to R. In practice, due to the sparse samples (one sample worth 560), the rate won't decrease until that sample is dropped, which will only happen after 101 secs. It gets worse if the burst is increased. With the token bucket algorithm, this translates to the following parameters:
- Rate = 5
- Tokens = 500
The same request decreases the number of available tokens to -60, which is below 0, so any new request is rejected until the number of available tokens gets back above 0. This takes 12 secs ((60+1) / 5). The token bucket algorithm is more suited for bursty workloads, which is our case here. I hope that this example helps to clarify the choice. Best, David On Tue, May 12, 2020 at 3:19 PM Tom Bentley wrote: > Hi David, > > Thanks for the reply. > > >> If I understand the proposed throttling algorithm, an initial request > > would > > >> be allowed (possibly making K negative) and only subsequent requests > > >> (before K became positive) would receive the QUOTA_VIOLATED.
That > would > > >> mean it was still possible to block the controller from handling other > > >> events – you just need to do so via making one big request. > > > > That is correct. One could still create one big topic (not request) and > > that would > > create some load on the controller. All options suffer from this issue as > > we can > > stop clients from creating a very large topic. At least, when it happens, > > the client > > will have to wait to pay back its credits which guarantee that we control > > the average > > load on the controller. > > > > I can see that the admission throttling is better than nothing. It's just > that it doesn't fully solve the problem described in the KIP's motivation. > That doesn't mean it's not worth doing. I certainly prefer this approach > over that taken by KIP-578 (which cites the effects of deleting a single > topic with many partitions). > > > > >> While the reasons for rejecting execution throttling make sense given > > the > > >> RPCs we have today that seems to be at the cost of still allowing harm > > to > > >> the cluster, or did I misunderstand? > > > > Execution throttling would also suffer from large topics being created. > We > > have > > rejected it due to the current RPCs and also because it does not limit > the > > amount > > of work queued up in the controller. Imagine a low quota, that would > result > > in a huge > > backlog of pending operations. > > > > What exactly is the problem with having a huge backlog of pending > operations? I can see that the backlog would need persisting so that the > controller could change without losing track of the topics to be mutated, > and the mutations would need to be submitted in batches to the controller > event queue (thus allowing other controller requests to be interleaved). 
I > realise this is not feasible right now, I'm just trying to understand if > it's feasible at all and if there's any appetite for making the requisite > API changes in the future in order to prevent these problems even for large > single requests. > > Kind regards, > > Tom >
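The worked example in David's message can be sketched as follows. This is an illustrative Python sketch of a token bucket with the class and method names invented for the illustration; it is not Kafka's actual implementation, but it reproduces the R=5, B=500 numbers from the thread.

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate        # R: tokens credited per second
        self.burst = burst      # B: bucket capacity
        self.tokens = burst     # the bucket starts full
        self.last = 0.0

    def record(self, amount, now):
        # Credit tokens accrued since the last observation, capped at B,
        # then deduct unconditionally: the request is admitted even if the
        # balance goes negative, which is what allows the burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= amount

    def throttle_seconds(self):
        # Seconds until the balance climbs back above zero; only then are
        # new requests accepted again.
        return max(0.0, -self.tokens / self.rate)

bucket = TokenBucket(rate=5.0, burst=500.0)
bucket.record(7 * 80, now=0.0)    # the 7 topics x 80 partitions example
print(bucket.tokens)              # -60.0
print(bucket.throttle_seconds())  # 12.0
```

Note how the throttle time falls directly out of the negative balance (60 / 5 = 12 secs), whereas with the sampled rate the oversized sample keeps the rate above the quota until it expires (~101 secs in the example).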
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the reply. >> If I understand the proposed throttling algorithm, an initial request > would > >> be allowed (possibly making K negative) and only subsequent requests > >> (before K became positive) would receive the QUOTA_VIOLATED. That would > >> mean it was still possible to block the controller from handling other > >> events – you just need to do so via making one big request. > > That is correct. One could still create one big topic (not request) and > that would > create some load on the controller. All options suffer from this issue as > we can > stop clients from creating a very large topic. At least, when it happens, > the client > will have to wait to pay back its credits which guarantee that we control > the average > load on the controller. > I can see that the admission throttling is better than nothing. It's just that it doesn't fully solve the problem described in the KIP's motivation. That doesn't mean it's not worth doing. I certainly prefer this approach over that taken by KIP-578 (which cites the effects of deleting a single topic with many partitions). > >> While the reasons for rejecting execution throttling make sense given > the > >> RPCs we have today that seems to be at the cost of still allowing harm > to > >> the cluster, or did I misunderstand? > > Execution throttling would also suffer from large topics being created. We > have > rejected it due to the current RPCs and also because it does not limit the > amount > of work queued up in the controller. Imagine a low quota, that would result > in a huge > backlog of pending operations. > What exactly is the problem with having a huge backlog of pending operations? I can see that the backlog would need persisting so that the controller could change without losing track of the topics to be mutated, and the mutations would need to be submitted in batches to the controller event queue (thus allowing other controller requests to be interleaved). 
I realise this is not feasible right now, I'm just trying to understand if it's feasible at all and if there's any appetite for making the requisite API changes in the future in order to prevent these problems even for large single requests. Kind regards, Tom
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, Thanks for the feedback. >> If I understand the proposed throttling algorithm, an initial request would >> be allowed (possibly making K negative) and only subsequent requests >> (before K became positive) would receive the QUOTA_VIOLATED. That would >> mean it was still possible to block the controller from handling other >> events – you just need to do so via making one big request. That is correct. One could still create one big topic (not request) and that would create some load on the controller. All options suffer from this issue as we cannot stop clients from creating a very large topic. At least, when it happens, the client will have to wait to pay back its credits, which guarantees that we control the average load on the controller. >> While the reasons for rejecting execution throttling make sense given the >> RPCs we have today that seems to be at the cost of still allowing harm to >> the cluster, or did I misunderstand? Execution throttling would also suffer from large topics being created. We have rejected it due to the current RPCs and also because it does not limit the amount of work queued up in the controller. Imagine a low quota; that would result in a huge backlog of pending operations. Best, David On Mon, May 11, 2020 at 4:57 PM David Jacot wrote: > Hi Anna and Jun, > > Anna, thanks for your thoughtful feedback. Overall, I agree with what you > said. If I summarize, you said that using time on server threads is not > easier > to tune than a rate based approach and it does not really capture all the > load neither as the control requests are not taken into account. > > I have been thinking about using an approach similar to the request quota > and > here are my thoughts: > > 1. With the current architecture of the controller, the ZK writes > resulting from a > topic being created, expanded or deleted are spread among the api handler > threads and the controller thread. This is problematic to measure the real > thread > usage.
I suppose that this will change with KIP-500 though. > > 2. I do agree with Anna that we don't only care about requests tying up > the controller > thread but also care about the control requests. A rate based approach > would allow > us to define a value which copes with both dimensions. > > 3. Doing the accounting in the controller requires to attach a principal > and a client id > to each topic as the quotas are expressed per principal and/or client id. > I find this a > little odd. These information would have to be updated whenever a topic is > expanded > or deleted and any subsequent operations for that topic in the controller > would be > accounted for the last principal and client id. As an example, imagine a > client that > deletes topics which have partitions on a dead broker. In this case, the > deletion would > be retried until it succeeds. If that client has a low quota, that may > prevent it from > doing any new operations until the delete succeeds. This is a strange > behavior. > > 4. One important aspect of the proposal is that we want to be able to > reject requests > when the quota is exhausted. With a usage based approach, it is difficult > to compute > the time to wait before being able to create the topics. The only that we > could do is > to ask the client to wait until the measure time drops to the quota to > ensure that a new > operations can be accepted. With a rate based approach, we can > precisely compute > the time to wait. > > 5. I find the experience for users slightly better with a rate based > approach. When I say > users here, I mean the users of the admin client or the cli. When > throttled, they would > get an error saying: "can't create because you have reached X partition > mutations/sec". > With the other approach, we could only say: "quota exceeded". > > 6. I do agree that the rate based approach is less generic in the long > term, especially if > other resource types are added in the controller. 
> > Altogether, I am not convinced by a usage based approach and I would > rather lean > towards keeping the current proposal. > > Best, > David > > On Thu, May 7, 2020 at 2:11 AM Anna Povzner wrote: > >> Hi David and Jun, >> >> I wanted to add to the discussion about using requests/sec vs. time on >> server threads (similar to request quota) for expressing quota for topic >> ops. >> >> I think request quota does not protect the brokers from overload by itself >> -- it still requires tuning and sometimes re-tuning, because it depends on >> the workload behavior of all users (like a relative share of requests >> exempt from throttling). This makes it not that easy to set. Let me give >> you more details: >> >>1. >> >>The amount of work that the user can get from the request quota depends >>on the load from other users. We measure and enforce user's clock time >> on >>threads---the time between 2 timestamps, one when the operation starts >> and >>one when the operation ends. If the user is the only load
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Anna and Jun, Anna, thanks for your thoughtful feedback. Overall, I agree with what you said. If I summarize, you said that using time on server threads is not easier to tune than a rate based approach and it does not really capture all the load either, as the control requests are not taken into account. I have been thinking about using an approach similar to the request quota and here are my thoughts:
1. With the current architecture of the controller, the ZK writes resulting from a topic being created, expanded or deleted are spread among the API handler threads and the controller thread. This makes it problematic to measure the real thread usage. I suppose that this will change with KIP-500 though.
2. I do agree with Anna that we don't only care about requests tying up the controller thread but also care about the control requests. A rate based approach would allow us to define a value which copes with both dimensions.
3. Doing the accounting in the controller requires attaching a principal and a client id to each topic as the quotas are expressed per principal and/or client id. I find this a little odd. This information would have to be updated whenever a topic is expanded or deleted, and any subsequent operations for that topic in the controller would be accounted to the last principal and client id. As an example, imagine a client that deletes topics which have partitions on a dead broker. In this case, the deletion would be retried until it succeeds. If that client has a low quota, that may prevent it from doing any new operations until the delete succeeds. This is a strange behavior.
4. One important aspect of the proposal is that we want to be able to reject requests when the quota is exhausted. With a usage based approach, it is difficult to compute the time to wait before being able to create the topics. The only thing that we could do is to ask the client to wait until the measured time drops below the quota to ensure that new operations can be accepted.
With a rate based approach, we can precisely compute the time to wait.

5. I find the experience for users slightly better with a rate based approach. When I say users here, I mean the users of the admin client or the cli. When throttled, they would get an error saying: "can't create because you have reached X partition mutations/sec". With the other approach, we could only say: "quota exceeded".

6. I do agree that the rate based approach is less generic in the long term, especially if other resource types are added in the controller.

Altogether, I am not convinced by a usage based approach and I would rather lean towards keeping the current proposal.

Best,
David

On Thu, May 7, 2020 at 2:11 AM Anna Povzner wrote:

> Hi David and Jun,
>
> I wanted to add to the discussion about using requests/sec vs. time on
> server threads (similar to request quota) for expressing quota for topic
> ops.
>
> I think request quota does not protect the brokers from overload by itself
> -- it still requires tuning and sometimes re-tuning, because it depends on
> the workload behavior of all users (like a relative share of requests
> exempt from throttling). This makes it not that easy to set. Let me give
> you more details:
>
> 1. The amount of work that the user can get from the request quota depends
>    on the load from other users. We measure and enforce the user's clock
>    time on threads---the time between 2 timestamps, one when the operation
>    starts and one when the operation ends. If the user is the only load on
>    the broker, it is less likely that their operation will be interrupted
>    by the kernel to switch to another thread, and time away from the
>    thread still counts.
>    1. Pros: this makes it more work-conserving; the user is less limited
>       when there are more resources available.
>    2. Cons: harder to capacity plan for the user, and it could be
>       confusing when the broker suddenly stops supporting the load which
>       it was supporting before.
> 2. For the above reason, it makes most sense to maximize the user's quota
>    and set it as a percent of the maximum thread capacity (1100 with
>    default broker config).
> 3. However, the actual maximum thread capacity is not really 1100:
>    1. Some of it will be taken by requests exempt from throttling, and the
>       amount depends on the workload. We have seen cases (somewhat rare)
>       where requests exempt from throttling take like ⅔ of the time on
>       threads.
>    2. We have also seen cases of an overloaded cluster (full queues,
>       timeouts, etc.) due to high request rate while the time used on
>       threads was way below the max (1100), like 600 or 700 (total exempt
>       + non-exempt usage). Basically, when a broker is close to 100% CPU,
>       it takes more and more time for the "unaccounted" work like a thread
>       getting a chance to pick up a request from the queue and get a
>       timestamp.
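David's point 4 above argues that a rate based quota makes the wait time exactly computable, unlike a usage based quota. A minimal sketch of that computation (function and parameter names are illustrative, not Kafka's actual API):

```python
# Hypothetical sketch of point 4: under a rate based (token bucket style)
# quota, the credit deficit refills linearly, so the exact wait time is
# simply deficit / rate. Names are illustrative, not Kafka's actual API.

def throttle_time_seconds(requested_mutations, available_credits, rate_per_sec):
    """Time a client must wait before the requested mutations fit the quota."""
    deficit = requested_mutations - available_credits
    if deficit <= 0:
        return 0.0  # enough accumulated credits; no throttling needed
    # Credits accumulate at rate_per_sec, so the deficit clears after:
    return deficit / rate_per_sec
```

For example, a request for 30 partition mutations against 10 remaining credits, refilled at 5 mutations/sec, would wait (30 - 10) / 5 = 4 seconds; a usage based quota offers no comparable closed form.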
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David and Jun,

I wanted to add to the discussion about using requests/sec vs. time on server threads (similar to request quota) for expressing quota for topic ops.

I think request quota does not protect the brokers from overload by itself -- it still requires tuning and sometimes re-tuning, because it depends on the workload behavior of all users (like a relative share of requests exempt from throttling). This makes it not that easy to set. Let me give you more details:

1. The amount of work that the user can get from the request quota depends on the load from other users. We measure and enforce the user's clock time on threads---the time between 2 timestamps, one when the operation starts and one when the operation ends. If the user is the only load on the broker, it is less likely that their operation will be interrupted by the kernel to switch to another thread, and time away from the thread still counts.
   1. Pros: this makes it more work-conserving; the user is less limited when there are more resources available.
   2. Cons: harder to capacity plan for the user, and it could be confusing when the broker suddenly stops supporting the load which it was supporting before.
2. For the above reason, it makes most sense to maximize the user's quota and set it as a percent of the maximum thread capacity (1100 with default broker config).
3. However, the actual maximum thread capacity is not really 1100:
   1. Some of it will be taken by requests exempt from throttling, and the amount depends on the workload. We have seen cases (somewhat rare) where requests exempt from throttling take like ⅔ of the time on threads.
   2. We have also seen cases of an overloaded cluster (full queues, timeouts, etc.) due to high request rate while the time used on threads was way below the max (1100), like 600 or 700 (total exempt + non-exempt usage). Basically, when a broker is close to 100% CPU, it takes more and more time for the "unaccounted" work like a thread getting a chance to pick up a request from the queue and get a timestamp.
4. As a result, there will be some tuning to decide on a safe value for total thread capacity, from which users can carve out their quotas. Some changes in users' workloads may require re-tuning if, for example, a change dramatically shifts the relative share of non-exempt load.

I think request quota works well for client request load in the sense that it ensures that different users get a fair/proportional share of resources during high broker load. If the user cannot get enough resources from their quota to support their request rate anymore, they can monitor their load and expand the cluster if needed (or rebalance).

However, I think using time on threads for topic ops could be even more difficult than simple request rate (as proposed):

1. I understand that we don't only care about topic requests tying up the controller thread; we also care that they do not create a large extra load on the cluster due to LeaderAndIsr and other related requests (this is more important for small clusters).
2. For that reason, tuning the quota in terms of time on threads can be harder, because there is no easy way to say how this quota would translate to a number of operations (because that would depend on other broker load).

Since tuning would be required anyway, I see the following workflow if we express the controller quota in terms of partition mutations per second:

1. Run the topic workload in isolation (the most expensive one, like create topic vs. add partitions) and see how much load it adds based on the incoming rate. Choose the quota depending on how much extra load your cluster can sustain in addition to its normal load.
2. It could be useful to publish some experimental results to give some ballpark numbers to make this sizing easier.

I am interested to see if you agree with the listed assumptions here.
I may have missed something, especially if there is an easier workflow for setting quota based on time on threads.

Thanks,
Anna

On Thu, Apr 30, 2020 at 8:13 AM Tom Bentley wrote:

> Hi David,
>
> Thanks for the KIP.
>
> If I understand the proposed throttling algorithm, an initial request
> would be allowed (possibly making K negative) and only subsequent requests
> (before K became positive) would receive the QUOTA_VIOLATED. That would
> mean it was still possible to block the controller from handling other
> events – you just need to do so via making one big request.
>
> While the reasons for rejecting execution throttling make sense given the
> RPCs we have today, that seems to be at the cost of still allowing harm to
> the cluster, or did I misunderstand?
>
> Kind regards,
>
> Tom
>
> On Tue, Apr 28, 2020 at 1:49 AM Jun Rao wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. A few more
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David,

Thanks for the KIP.

If I understand the proposed throttling algorithm, an initial request would be allowed (possibly making K negative) and only subsequent requests (before K became positive) would receive the QUOTA_VIOLATED. That would mean it was still possible to block the controller from handling other events – you just need to do so via making one big request.

While the reasons for rejecting execution throttling make sense given the RPCs we have today, that seems to be at the cost of still allowing harm to the cluster, or did I misunderstand?

Kind regards,

Tom

On Tue, Apr 28, 2020 at 1:49 AM Jun Rao wrote:

> Hi, David,
>
> Thanks for the reply. A few more comments.
>
> 1. I am actually not sure if a quota based on request rate is easier for
> the users. For context, in KIP-124, we started with a request rate quota,
> but ended up not choosing it. The main issues are (a) requests are not
> equal; some are more expensive than others; (b) the users typically don't
> know how expensive each type of request is. For example, a big part of the
> controller cost is ZK write. To create a new topic with 1 partition, the
> number of ZK writes is 4 (1 for each segment in
> /brokers/topics/[topic]/partitions/[partitionId]/state). The cost of
> adding one partition to an existing topic requires 2 ZK writes. The cost
> of deleting a topic with 1 partition requires 6 to 7 ZK writes. It's
> unlikely for a user to know the exact cost associated with those
> implementation details. If users don't know the cost, it's not clear if
> they can set the rate properly.
>
> 2. I think that depends on the goal. To me, the common problem is that you
> have many applications running on a shared Kafka cluster and one of the
> applications abuses the broker by issuing too many requests. In this case,
> a global quota will end up throttling every application. However, what we
> really want in this case is to only throttle the application that causes
> the problem. A user level quota solves this problem more effectively. We
> may still need some sort of global quota when the total usage from all
> applications exceeds the broker resource. But that seems to be secondary
> since it's uncommon for all applications' usage to go up at the same time.
> We can also solve this problem by reducing the per user quota for every
> application if there is a user level quota.
>
> 3. Not sure that I fully understand the difference in burst balance. The
> current throttling logic works as follows. Each quota is measured over a
> number of time windows. Suppose the quota is X/sec. If time passes and the
> quota is not being used, we are accumulating credit at the rate of X/sec.
> If a quota is being used, we are reducing that credit based on the usage.
> The credit expires when the corresponding window rolls out. The max credit
> that can be accumulated is X * number of windows * window size. So, in
> some sense, the current logic also supports burst and a way to cap the
> burst. Could you explain the difference with Token Bucket a bit more?
> Also, the current quota system always executes the current request even if
> it's being throttled. It just informs the client to back off a throttled
> amount of time before sending another request.
>
> Jun
>
> On Mon, Apr 27, 2020 at 5:15 AM David Jacot wrote:
>
> > Hi Jun,
> >
> > Thank you for the feedback.
> >
> > 1. You are right. In the end, we do care about the percentage of time
> > that an operation ties up the controller thread. I thought about this
> > but I was not entirely convinced by it for the following reasons:
> >
> > 1.1. While I do agree that setting up a rate and a burst is a bit harder
> > than allocating a percentage for the administrator of the cluster, I
> > believe that a rate and a burst are way easier to understand for the
> > users of the cluster.
> >
> > 1.2. Measuring the time that a request ties up the controller thread is
> > not as straightforward as it sounds because the controller reacts to ZK
> > TopicChange and TopicDeletion events in lieu of handling requests
> > directly. These events carry neither the client id nor the user
> > information, so the best would be to refactor the controller to accept
> > requests instead of reacting to the events. This will be possible with
> > KIP-590. It obviously has other side effects in the controller (e.g.
> > batching).
> >
> > I leaned towards the current proposal mainly due to 1.1. as 1.2. can be
> > (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?
> >
> > 2. It is not in the current proposal. I thought that a global quota
> > would be enough to start with. We can definitely make it work like the
> > other quotas.
> >
> > 3. The main difference is that the Token Bucket algorithm defines an
> > explicit burst B while guaranteeing an average rate R whereas our
> > existing quota guarantees an average rate R as well but starts to
> > throttle as soon as the rate
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for the reply. A few more comments.

1. I am actually not sure if a quota based on request rate is easier for the users. For context, in KIP-124, we started with a request rate quota, but ended up not choosing it. The main issues are (a) requests are not equal; some are more expensive than others; (b) the users typically don't know how expensive each type of request is. For example, a big part of the controller cost is ZK write. To create a new topic with 1 partition, the number of ZK writes is 4 (1 for each segment in /brokers/topics/[topic]/partitions/[partitionId]/state). The cost of adding one partition to an existing topic requires 2 ZK writes. The cost of deleting a topic with 1 partition requires 6 to 7 ZK writes. It's unlikely for a user to know the exact cost associated with those implementation details. If users don't know the cost, it's not clear if they can set the rate properly.

2. I think that depends on the goal. To me, the common problem is that you have many applications running on a shared Kafka cluster and one of the applications abuses the broker by issuing too many requests. In this case, a global quota will end up throttling every application. However, what we really want in this case is to only throttle the application that causes the problem. A user level quota solves this problem more effectively. We may still need some sort of global quota when the total usage from all applications exceeds the broker resource. But that seems to be secondary since it's uncommon for all applications' usage to go up at the same time. We can also solve this problem by reducing the per user quota for every application if there is a user level quota.

3. Not sure that I fully understand the difference in burst balance. The current throttling logic works as follows. Each quota is measured over a number of time windows. Suppose the quota is X/sec. If time passes and the quota is not being used, we are accumulating credit at the rate of X/sec. If a quota is being used, we are reducing that credit based on the usage. The credit expires when the corresponding window rolls out. The max credit that can be accumulated is X * number of windows * window size. So, in some sense, the current logic also supports burst and a way to cap the burst. Could you explain the difference with Token Bucket a bit more? Also, the current quota system always executes the current request even if it's being throttled. It just informs the client to back off a throttled amount of time before sending another request.

Jun

On Mon, Apr 27, 2020 at 5:15 AM David Jacot wrote:

> Hi Jun,
>
> Thank you for the feedback.
>
> 1. You are right. In the end, we do care about the percentage of time that
> an operation ties up the controller thread. I thought about this but I was
> not entirely convinced by it for the following reasons:
>
> 1.1. While I do agree that setting up a rate and a burst is a bit harder
> than allocating a percentage for the administrator of the cluster, I
> believe that a rate and a burst are way easier to understand for the users
> of the cluster.
>
> 1.2. Measuring the time that a request ties up the controller thread is
> not as straightforward as it sounds because the controller reacts to ZK
> TopicChange and TopicDeletion events in lieu of handling requests
> directly. These events carry neither the client id nor the user
> information, so the best would be to refactor the controller to accept
> requests instead of reacting to the events. This will be possible with
> KIP-590. It obviously has other side effects in the controller (e.g.
> batching).
>
> I leaned towards the current proposal mainly due to 1.1. as 1.2. can be
> (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?
>
> 2. It is not in the current proposal. I thought that a global quota would
> be enough to start with. We can definitely make it work like the other
> quotas.
>
> 3. The main difference is that the Token Bucket algorithm defines an
> explicit burst B while guaranteeing an average rate R whereas our existing
> quota guarantees an average rate R as well but starts to throttle as soon
> as the rate goes above the defined quota.
>
> Creating and deleting topics is bursty by nature. Applications create or
> delete topics occasionally, usually by sending one request with multiple
> topics. The reasoning behind allowing a burst is to allow such requests
> with a reasonable size to pass without being throttled whereas our current
> quota mechanism would reject topics as soon as the rate is above the
> quota, requiring the applications to send subsequent requests to create or
> to delete all the topics.
>
> Best,
> David
>
> On Fri, Apr 24, 2020 at 9:03 PM Jun Rao wrote:
>
> > Hi, David,
> >
> > Thanks for the KIP. A few quick comments.
> >
> > 1. About quota.partition.mutations.rate. I am not sure if it's very easy
> > for the user to set the quota as a rate. For example, each partition
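Jun's point 3 above describes the existing sampled quota: usage is recorded into rolling time windows, and the implicit burst is the quota times the number of windows times the window size. A rough model of that accounting (illustrative names, not Kafka's actual Rate/Sensor implementation):

```python
class WindowedRateQuota:
    """Rough model of the sampled-rate quota described above (illustrative,
    not Kafka's actual implementation)."""

    def __init__(self, quota_per_sec, num_windows=11, window_size_sec=1.0):
        self.quota = quota_per_sec
        self.num_windows = num_windows
        self.window_size = window_size_sec
        self.samples = [0.0] * num_windows  # usage recorded per window

    def record(self, window_index, amount):
        # A sample's credit "expires" when its window rolls out of range.
        self.samples[window_index % self.num_windows] += amount

    def measured_rate(self):
        # Total usage averaged over the whole sampled span.
        return sum(self.samples) / (self.num_windows * self.window_size)

    def max_burst(self):
        # Max accumulable credit: quota * number of windows * window size.
        return self.quota * self.num_windows * self.window_size

    def violated(self):
        return self.measured_rate() > self.quota
```

With a quota of 5/sec over 11 one-second windows, up to 55 mutations can land in a single burst before the measured rate exceeds the quota, which is exactly the implicit burst Jun derives; the Token Bucket instead makes the burst an explicit, independently tunable parameter.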
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun,

Thank you for the feedback.

1. You are right. In the end, we do care about the percentage of time that an operation ties up the controller thread. I thought about this but I was not entirely convinced by it for the following reasons:

1.1. While I do agree that setting up a rate and a burst is a bit harder than allocating a percentage for the administrator of the cluster, I believe that a rate and a burst are way easier to understand for the users of the cluster.

1.2. Measuring the time that a request ties up the controller thread is not as straightforward as it sounds because the controller reacts to ZK TopicChange and TopicDeletion events in lieu of handling requests directly. These events carry neither the client id nor the user information, so the best would be to refactor the controller to accept requests instead of reacting to the events. This will be possible with KIP-590. It obviously has other side effects in the controller (e.g. batching).

I leaned towards the current proposal mainly due to 1.1. as 1.2. can be (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?

2. It is not in the current proposal. I thought that a global quota would be enough to start with. We can definitely make it work like the other quotas.

3. The main difference is that the Token Bucket algorithm defines an explicit burst B while guaranteeing an average rate R, whereas our existing quota guarantees an average rate R as well but starts to throttle as soon as the rate goes above the defined quota.

Creating and deleting topics is bursty by nature. Applications create or delete topics occasionally, usually by sending one request with multiple topics. The reasoning behind allowing a burst is to let such requests with a reasonable size pass without being throttled, whereas our current quota mechanism would reject topics as soon as the rate is above the quota, requiring the applications to send subsequent requests to create or to delete all the topics.

Best,
David

On Fri, Apr 24, 2020 at 9:03 PM Jun Rao wrote:

> Hi, David,
>
> Thanks for the KIP. A few quick comments.
>
> 1. About quota.partition.mutations.rate. I am not sure if it's very easy
> for the user to set the quota as a rate. For example, each partition
> mutation could take a different number of ZK operations (depending on
> things like retry). The time to process each ZK operation may also vary
> from cluster to cluster. An alternative way to model this is to do
> something similar to the request (CPU) quota, which exposes quota as a
> percentage of the server threads that can be used. The current request
> quota doesn't include the controller thread. We could add something that
> measures/exposes the percentage of time that a request ties up the
> controller thread, which seems to be what we really care about.
>
> 2. Is the new quota per user? Intuitively, we want to only penalize
> applications that overuse the broker resources, but not others. Also, in
> existing types of quotas (request, bandwidth), there is a hierarchy among
> clientId vs user and default vs customized (see
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> ). Does the new quota fit into the existing hierarchy?
>
> 3. It seems that you are proposing a new quota mechanism based on the
> Token Bucket algorithm. Could you describe its tradeoff with the existing
> quota mechanism? Ideally, it would be better if we have a single quota
> mechanism within Kafka.
>
> Jun
>
> On Fri, Apr 24, 2020 at 9:52 AM David Jacot wrote:
>
> > Hi folks,
> >
> > I'd like to start the discussion for KIP-599:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> >
> > It proposes to introduce quotas for the create topics, create partitions
> > and delete topics operations. Let me know what you think, thanks.
> >
> > Best,
> > David
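David's point 3 above contrasts the Token Bucket with the existing mechanism: an explicit burst B, an average rate R, and (as discussed with Tom earlier in the thread) a balance that may go negative so that one large batch is admitted and only subsequent requests are rejected. A minimal, deterministic sketch under those assumptions (illustrative, not the actual Kafka code; time is passed in explicitly to keep the example testable):

```python
class TokenBucket:
    """Token Bucket with average rate R and explicit burst B (a sketch of
    the mechanism described in the KIP discussion, not Kafka's code)."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate      # R: credits refilled per second
        self.burst = burst    # B: cap on accumulated credits
        self.credits = burst  # bucket starts full
        self.last = now

    def try_acquire(self, cost, now):
        # Refill linearly for the elapsed time, capped at the burst B.
        self.credits = min(self.burst, self.credits + (now - self.last) * self.rate)
        self.last = now
        if self.credits <= 0:
            return False  # bucket exhausted: reject (QUOTA_VIOLATED)
        # Admit while any credit remains, letting the balance go negative,
        # so a single large CreateTopics batch is not rejected outright.
        self.credits -= cost
        return True
```

With R=5 and B=10, a batch of 30 mutations is admitted (driving the balance to -20) and further requests are rejected until the balance refills above zero, i.e. for the next 4 seconds; the windowed quota described earlier would instead throttle as soon as the measured rate exceeds R, with the burst only implicit in the window configuration.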
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for the KIP. A few quick comments.

1. About quota.partition.mutations.rate. I am not sure if it's very easy for the user to set the quota as a rate. For example, each partition mutation could take a different number of ZK operations (depending on things like retry). The time to process each ZK operation may also vary from cluster to cluster. An alternative way to model this is to do something similar to the request (CPU) quota, which exposes quota as a percentage of the server threads that can be used. The current request quota doesn't include the controller thread. We could add something that measures/exposes the percentage of time that a request ties up the controller thread, which seems to be what we really care about.

2. Is the new quota per user? Intuitively, we want to only penalize applications that overuse the broker resources, but not others. Also, in existing types of quotas (request, bandwidth), there is a hierarchy among clientId vs user and default vs customized (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users ). Does the new quota fit into the existing hierarchy?

3. It seems that you are proposing a new quota mechanism based on the Token Bucket algorithm. Could you describe its tradeoff with the existing quota mechanism? Ideally, it would be better if we have a single quota mechanism within Kafka.

Jun

On Fri, Apr 24, 2020 at 9:52 AM David Jacot wrote:

> Hi folks,
>
> I'd like to start the discussion for KIP-599:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
>
> It proposes to introduce quotas for the create topics, create partitions
> and delete topics operations. Let me know what you think, thanks.
>
> Best,
> David
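Jun's point 2 refers to the KIP-55 quota hierarchy. If the new controller quota follows it, resolution would pick the most specific (user, client-id) override and fall back through the defaults. A hedged sketch of that lookup order (the precedence follows the user/client-id order documented for existing Kafka quotas; the data structure here is an illustrative stand-in for the entity configs in ZooKeeper):

```python
# Most specific match first, per the documented user/client-id hierarchy.
PRECEDENCE = [
    ("user", "client"),          # /config/users/<user>/clients/<client-id>
    ("user", "<default>"),       # /config/users/<user>/clients/<default>
    ("user", None),              # /config/users/<user>
    ("<default>", "client"),     # /config/users/<default>/clients/<client-id>
    ("<default>", "<default>"),  # /config/users/<default>/clients/<default>
    ("<default>", None),         # /config/users/<default>
    (None, "client"),            # /config/clients/<client-id>
    (None, "<default>"),         # /config/clients/<default>
]

def resolve_quota(configs, user, client_id):
    """Return the first configured quota walking down the hierarchy.

    `configs` maps (user_key, client_key) tuples to quota values; it is an
    illustrative stand-in for the configured quota entities.
    """
    for user_key, client_key in PRECEDENCE:
        key = (
            user if user_key == "user" else user_key,
            client_id if client_key == "client" else client_key,
        )
        if key in configs:
            return configs[key]
    return None  # no quota configured at any level
```

For example, with {("alice", None): 30.0, ("<default>", None): 10.0}, user "alice" resolves to 30.0 while any other user falls back to the default of 10.0.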
KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks,

I'd like to start the discussion for KIP-599:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations

It proposes to introduce quotas for the create topics, create partitions and delete topics operations. Let me know what you think, thanks.

Best,
David