Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

I would like to inform you that we have slightly changed our thinking about the implementation of the Token Bucket algorithm. Our initial idea was to change our existing Rate to behave like a Token Bucket. That works as we expected, but we have realized that the value of the Rate is then no longer really a sampled rate. From an observability perspective, it is not good to change the behavior of our Rate.

Instead, we propose to implement a new MeasurableStat that implements the Token Bucket algorithm and to use it alongside the Rate in the Sensor. With this, the Rate remains there for observability purposes and the Token Bucket is used for enforcing the quota. The Token Bucket is exposed via a new metric named "credits" that represents the remaining amount of credits in the bucket. As a first step, this will only be used for the controller quota.

I have updated the KIP to reflect this change. We hope this change is fine for everyone. Please raise your concerns if not.

Best,
David
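The token-bucket behavior described above — a bucket of "credits" that refills at the quota rate up to a burst capacity, is drained by mutations, and whose remaining credits are exposed as a metric — can be sketched as follows. This is a minimal illustration only, not Kafka's actual MeasurableStat implementation; the class and method names here are hypothetical.

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: `rate` credits accrue per second,
    up to a burst capacity; each operation spends credits."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = float(rate)      # refill rate (credits per second)
        self.burst = float(burst)    # maximum credits the bucket can hold
        self.now = now               # injectable clock, for testing
        self.credits = self.burst    # start with a full bucket
        self.last_refill = self.now()

    def _refill(self):
        t = self.now()
        self.credits = min(self.burst,
                           self.credits + (t - self.last_refill) * self.rate)
        self.last_refill = t

    def try_acquire(self, cost=1.0):
        """Spend `cost` credits if available; return the throttle time in
        seconds (0 means the operation is admitted immediately)."""
        self._refill()
        if self.credits >= cost:
            self.credits -= cost
            return 0.0
        # Not enough credits: tell the caller how long until there are.
        return (cost - self.credits) / self.rate
```

For example, a quota of 1 mutation/sec with a burst of 5 admits a batch of 5 topic creations at once, then throttles further requests until credits accrue again.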
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

Just a quick update. We have made good progress on the implementation of this KIP. The major parts are already in trunk, modulo the new "rate" implementation which is still under development.

I would like to change the type of the `controller_mutations_rate` from a Long to a Double. I have made various experiments and the rate can be quite low, especially in clusters with a lot of tenants. Using a Long is quite limiting when fine-tuning small rates. I hope that this change is fine for everyone.

Best,
David
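A fractional quota of the kind motivating the Long-to-Double change would be set with the standard `kafka-configs` tool, since the quota is a dynamic config on user/client entities. A hedged sketch — the broker address and entity name are placeholders, and the exact syntax should be checked against the KIP's "Kafka Config Command" chapter:

```
# Set a fractional controller mutation quota for a tenant
# (0.2 mutations/sec, which a Long-typed quota could not express).
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'controller_mutations_rate=0.2' \
  --entity-type users --entity-name user1

# Describe the entity to verify the quota.
bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe \
  --entity-type users --entity-name user1
```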
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all,

The vote has passed with 5 binding votes (Gwen, Rajini, Mickael, Jun and Colin) and 2 non-binding votes (Tom, Anna).

Thank you all for the fruitful discussion! I'd like to particularly thank Anna, who has heavily contributed to the design of this KIP.

Regards,
David
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1. Thanks, David!

best,
Colin
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for making those changes. They look fine to me. +1

Jun
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Colin, Jun,

Do the proposed error code and the updated KIP look good to you guys? I’d like to wrap up and close the vote.

Thanks,
David
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Colin and Jun, I have no problem if we have to rewrite part of it when the new controller comes out. I will be more than happy to help out. Regarding KIP-590, I think that we can cope with a principal as a string when the time comes. The user entity name is defined with a string already. Regarding the name of the error, you have made a good point. I do agree that it is important to differentiate the two cases. I propose the following two errors: - THROTTLING_QUOTA_EXCEEDED - Throttling is slightly better than rate as we have quotas which are not rate (e.g. request quota). This one is retryable once the throttle time is passed. - LIMIT_QUOTA_EXCEEDED - This one would indicate that the limit has been reached and is a final error. We only need the former in this KIP. What do you think? Jun, I have added a few examples in the KIP. The new name works exactly like the existing one once it is added to the accepted dynamic configs for the user and the client entities. I have added a "Kafka Config Command" chapter in the KIP. I will also open a Jira to not forget updating the AK documentation once the KIP gets merged. Thanks, David On Wed, Jun 10, 2020 at 3:03 AM Jun Rao wrote: > Hi, Colin, > > Good point. Maybe sth like THROTTLING_QUOTA_VIOLATED will make this clear. > > Hi, David, > > We added a new quota name in the KIP. You chose not to bump up the version > of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which seems ok since the > quota name is represented as a string. However, the new quota name can be > used in client tools for setting and listing the quota ( > https://kafka.apache.org/documentation/#quotas). Could you document how > the > new name will be used in those tools? > > Thanks, > > Jun > > On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe wrote: > > > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > > > Hi Colin, > > > > > > Thank you for your feedback. > > > > > > Jun has summarized the situation pretty well. Thanks Jun! 
I would like > to > > > complement it with the following points: > > > > > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > > > creations, partition creations and topic deletions that exceed the quota > > > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > > be populated accordingly to let the client know how long it must wait. > > > > > > 2. I do agree that we actually want a mechanism to apply back pressure > > > to the clients. The KIP basically proposes a mechanism to control and > to > > > limit the rate of operations before entering the controller. I think > that > > > it is similar to your thinking but is enforced based on a defined quota > > > instead of relying on the number of pending items in the controller. > > > > > > 3. You mentioned an alternative idea in your comments that, if I > > understood > > > correctly, would bound the queue to limit the overload and reject work > if > > > the queue is full. I have been thinking about this as well but I don't > > think > > > that it works well in our case. > > > - The first reason is the one mentioned by Jun. We actually want to be > > able > > > to limit specific clients (the misbehaving ones) in a multi-tenant > > > environment. > > > - The second reason is that, at least in our current implementation, > the > > > length of the queue is not really a good characteristic to estimate the > > load. > > > Coming back to your example of the CreateTopicsRequest. They create a > path > > > in ZK for each newly created topic, which triggers a ChangeTopic event > > in > > > the controller. That single event could be for a single topic in some > > cases or > > > for a thousand topics in others. > > > These two reasons aside, bounding the queue also introduces a knob > which > > > requires some tuning and thus suffers from all the points you mentioned > > as > > > well, doesn't it? The value may be true for one version but not for > > another. > > > > > > 4. 
Regarding the integration with KIP-500. The implementation of this > KIP > > > will span across the ApiLayer and the AdminManager. To be honest, we > > > can influence the implementation to work best with what you have in > mind > > > for the future controller. If you could shed some more light on the > > future > > > internal architecture, I can take this into account during the > > > implementation. > > > > > > > Hi David, > > > > Good question. The approach we are thinking of is that in the future, > > topic creation will be a controller RPC. In other words, rather than > > changing ZK and waiting for the controller code to notice, we'll go > through > > the controller code (which may change ZK, or may do something else in the > > ZK-free environment). > > > > I don't think there is a good way to write this in the short term that > > avoids having to rewrite in the long term. That's probably OK though, as > > long as we keep in mind that we'll need to. > > > > > > > > 5. Regarding KIP-590. For the create topics request, create
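The rate-based admission described in point 2 above — accepting or rejecting mutations against a defined quota with a burst allowance, instead of bounding the controller queue — can be sketched as a token bucket. This is an illustrative model only, not Kafka's actual implementation; the class and method names here are invented for the example:

```python
class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start full
        self.last = 0.0

    def check(self, n, now):
        """Try to spend n tokens at time `now` (seconds).

        Returns (accepted, throttle_ms): on rejection, throttle_ms tells
        the client how long until enough tokens will have accumulated.
        """
        # Refill based on elapsed time, capped at the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n <= self.tokens:
            self.tokens -= n
            return True, 0.0
        throttle_ms = (n - self.tokens) / self.rate * 1000
        return False, throttle_ms
```

A burst of up to `burst` mutations is admitted at once, after which requests are rejected with a computed wait time — which is exactly the shape of the QUOTA_VIOLATED response with a populated ThrottleTimeMs field.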
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Colin, Good point. Maybe something like THROTTLING_QUOTA_VIOLATED will make this clear. Hi, David, We added a new quota name in the KIP. You chose not to bump up the version of DESCRIBE_CLIENT_QUOTAS and ALTER_CLIENT_QUOTAS, which seems ok since the quota name is represented as a string. However, the new quota name can be used in client tools for setting and listing the quota ( https://kafka.apache.org/documentation/#quotas). Could you document how the new name will be used in those tools? Thanks, Jun On Tue, Jun 9, 2020 at 3:37 PM Colin McCabe wrote: > On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > > Hi Colin, > > > > Thank you for your feedback. > > > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > > complement it with the following points: > > > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > > creations, partition creations and topic deletions that exceed the quota > > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > be populated accordingly to let the client know how long it must wait. > > > > 2. I do agree that we actually want a mechanism to apply back pressure > > to the clients. The KIP basically proposes a mechanism to control and to > > limit the rate of operations before entering the controller. I think that > > it is similar to your thinking but is enforced based on a defined quota > > instead of relying on the number of pending items in the controller. > > > > 3. You mentioned an alternative idea in your comments that, if I > understood > > correctly, would bound the queue to limit the overload and reject work if > > the queue is full. I have been thinking about this as well but I don't > think > > that it works well in our case. > > - The first reason is the one mentioned by Jun. We actually want to be > able > > to limit specific clients (the misbehaving ones) in a multi-tenant > > environment. 
> > - The second reason is that, at least in our current implementation, the > > length of the queue is not really a good characteristic to estimate the > load. > > Coming back to your example of the CreateTopicsRequest. They create a path > > in ZK for each newly created topic, which triggers a ChangeTopic event > in > > the controller. That single event could be for a single topic in some > cases or > > for a thousand topics in others. > > These two reasons aside, bounding the queue also introduces a knob which > > requires some tuning and thus suffers from all the points you mentioned > as > > well, doesn't it? The value may be true for one version but not for > another. > > > > 4. Regarding the integration with KIP-500. The implementation of this KIP > > will span across the ApiLayer and the AdminManager. To be honest, we > > can influence the implementation to work best with what you have in mind > > for the future controller. If you could shed some more light on the > future > > internal architecture, I can take this into account during the > > implementation. > > > > Hi David, > > Good question. The approach we are thinking of is that in the future, > topic creation will be a controller RPC. In other words, rather than > changing ZK and waiting for the controller code to notice, we'll go through > the controller code (which may change ZK, or may do something else in the > ZK-free environment). > > I don't think there is a good way to write this in the short term that > avoids having to rewrite in the long term. That's probably OK though, as > long as we keep in mind that we'll need to. > > > > > 5. Regarding KIP-590. For the create topics request, create partitions > > request, and delete topics request, we are lucky enough to have directed > > all of them to the controller already so we are fine with doing the > admission > > in the broker which hosts the controller. 
If we were to throttle more > operations > > in the future, > > I believe that we can continue to do it centrally while leveraging the > > principal that is forwarded to account for the right tenant. The reason > > why I would like to keep it central is that we are talking about sparse > (or bursty) > > workloads here so distributing the quota among all the brokers makes > little sense. > > > > Right. The main requirement here is that the quota must operate based on > principal names rather than KafkaPrincipal objects. We had a long > discussion about KIP-590 and eventually concluded that we didn't want to > make KafkaPrincipal serializable (at least not yet) so what would be > forwarded is just a string, not the principal object. > > > > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > > me so I would rather prefer a specific error code. The main reason being > > that we may be able to reuse it in the future for other quotas. That > being said, > > we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly > about > > the naming. I wonder what the others think about this. > > > > Think about if you're
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
On Tue, Jun 9, 2020, at 05:06, David Jacot wrote: > Hi Colin, > > Thank you for your feedback. > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > complement it with the following points: > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > creations, partition creations and topic deletions that exceed the quota > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > be populated accordingly to let the client know how long it must wait. > > 2. I do agree that we actually want a mechanism to apply back pressure > to the clients. The KIP basically proposes a mechanism to control and to > limit the rate of operations before entering the controller. I think that > it is similar to your thinking but is enforced based on a defined quota > instead of relying on the number of pending items in the controller. > > 3. You mentioned an alternative idea in your comments that, if I understood > correctly, would bound the queue to limit the overload and reject work if > the queue is full. I have been thinking about this as well but I don't think > that it works well in our case. > - The first reason is the one mentioned by Jun. We actually want to be able > to limit specific clients (the misbehaving ones) in a multi-tenant > environment. > - The second reason is that, at least in our current implementation, the > length of the queue is not really a good characteristic to estimate the load. > Coming back to your example of the CreateTopicsRequest. They create a path > in ZK for each newly created topic, which triggers a ChangeTopic event in > the controller. That single event could be for a single topic in some cases or > for a thousand topics in others. > These two reasons aside, bounding the queue also introduces a knob which > requires some tuning and thus suffers from all the points you mentioned as > well, doesn't it? The value may be true for one version but not for another. > > 4. Regarding the integration with KIP-500. 
The implementation of this KIP > will span across the ApiLayer and the AdminManager. To be honest, we > can influence the implementation to work best with what you have in mind > for the future controller. If you could shed some more light on the future > internal architecture, I can take this into account during the > implementation. > Hi David, Good question. The approach we are thinking of is that in the future, topic creation will be a controller RPC. In other words, rather than changing ZK and waiting for the controller code to notice, we'll go through the controller code (which may change ZK, or may do something else in the ZK-free environment). I don't think there is a good way to write this in the short term that avoids having to rewrite in the long term. That's probably OK though, as long as we keep in mind that we'll need to. > > 5. Regarding KIP-590. For the create topics request, create partitions > request, and delete topics request, we are lucky enough to have directed > all of them to the controller already so we are fine with doing the admission > in the broker which hosts the controller. If we were to throttle more > operations > in the future, > I believe that we can continue to do it centrally while leveraging the > principal that is forwarded to account for the right tenant. The reason > why I would like to keep it central is that we are talking about sparse (or > bursty) > workloads here so distributing the quota among all the brokers makes little > sense. > Right. The main requirement here is that the quota must operate based on principal names rather than KafkaPrincipal objects. We had a long discussion about KIP-590 and eventually concluded that we didn't want to make KafkaPrincipal serializable (at least not yet) so what would be forwarded is just a string, not the principal object. > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > me so I would rather prefer a specific error code. 
The main reason being > that we may be able to reuse it in the future for other quotas. That being > said, > we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > the naming. I wonder what the others think about this. > Imagine you're someone writing software that uses the Kafka client. Let's say you try to create some topics and get back QUOTA_VIOLATED. What does it mean? Maybe it means that you tried to create too many topics in too short a time (violating the controller throttle). Or maybe it means that you exceeded your quota on the number of partitions that you are allowed to create (let's assume that someone did a follow-on KIP that reuses this error code.) But now you have a dilemma. If the controller was just busy, then trying again is the right thing to do. If there is some other quota you violated (number of partitions, number of topics, etc.) then retrying is wasteful and will not accomplish
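Colin's distinction matters most on the client side: a retryable throttling error should be retried after the advertised throttle time, while a final quota error should surface immediately. A hedged sketch of that client loop — the error names and the throttle-time field follow the response format discussed in this thread, but `send_request` and the control flow are hypothetical, not the real Kafka client:

```python
import time

RETRYABLE_ERRORS = {"THROTTLING_QUOTA_EXCEEDED"}  # retry after throttle time
FINAL_ERRORS = {"LIMIT_QUOTA_EXCEEDED"}           # retrying cannot succeed

def create_topics_with_retry(send_request, request, max_retries=5):
    """send_request is a hypothetical callable returning
    (error_code_or_None, throttle_time_ms)."""
    for _ in range(max_retries):
        error, throttle_ms = send_request(request)
        if error is None:
            return True
        if error in RETRYABLE_ERRORS:
            # Wait out the quota window the broker advertised, then retry.
            time.sleep(throttle_ms / 1000.0)
            continue
        # Final (or unknown) errors are not worth retrying.
        raise RuntimeError(f"topic creation failed: {error}")
    raise RuntimeError("retries exhausted")
```

With a single ambiguous error code, this loop cannot be written safely — which is the dilemma described above.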
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Sounds good then. Thanks, Jun On Tue, Jun 9, 2020 at 10:59 AM David Jacot wrote: > Hi Jun, > > Both are already in the KIP, see "New Broker Configurations" chapter. I > think > that we need them in order to be able to define different burst for the new > quota. > > Best, > David > > On Tue, Jun 9, 2020 at 7:48 PM Jun Rao wrote: > > > Hi, David, > > > > Another thing. Should we add controller.quota.window.size.seconds and > > controller.quota.window.num > > or just reuse the existing quota.window.size.seconds and quota.window.num > > that are used for other types of quotas? > > > > Thanks, > > > > Jun > > > > On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > > > > > Hi, David, > > > > > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 > > on > > > the KIP. > > > > > > Jun > > > > > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot > wrote: > > > > > >> Hi Colin, > > >> > > >> Thank you for your feedback. > > >> > > >> Jun has summarized the situation pretty well. Thanks Jun! I would like > > to > > >> complement it with the following points: > > >> > > >> 1. Indeed, when the quota is exceeded, the broker will reject the > topic > > >> creations, partition creations and topics deletions that are exceeding > > >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > > >> be populated accordingly to let the client know how long it must wait. > > >> > > >> 2. I do agree that we actually want a mechanism to apply back pressure > > >> to the clients. The KIP basically proposes a mechanism to control and > to > > >> limit the rate of operations before entering the controller. I think > > that > > >> it is > > >> similar to your thinking but is enforced based on a defined quota > > instead > > >> of relying on the number of pending items in the controller. > > >> > > >> 3. 
You mentioned an alternative idea in your comments that, if I > > >> understood > > >> correctly, would bound the queue to limit the overload and reject work > > if > > >> the > > >> queue is full. I have been thinking about this as well but I don't > think > > >> that it > > >> works well in our case. > > >> - The first reason is the one mentioned by Jun. We actually want to be > > >> able > > >> to limit specific clients (the misbehaving ones) in a multi-tenant > > >> environment. > > >> - The second reason is that, at least in our current implementation, > the > > >> length of > > >> the queue is not really a good characteristic to estimate the load. > > >> Coming back > > >> to your example of the CreateTopicsRequest. They create path in ZK for > > >> each > > >> newly created topics which trigger a ChangeTopic event in the > > controller. > > >> That > > >> single event could be for a single topic in some cases or for a > thousand > > >> topics > > >> in others. > > >> These two reasons aside, bounding the queue also introduces a knob > which > > >> requires some tuning and thus suffers from all the points you > mentioned > > as > > >> well, isn't it? The value may be true for one version but not for > > another. > > >> > > >> 4. Regarding the integration with KIP-500. The implementation of this > > KIP > > >> will span across the ApiLayer and the AdminManager. To be honest, we > > >> can influence the implementation to work best with what you have in > mind > > >> for the future controller. If you could shed some more light on the > > future > > >> internal architecture, I can take this into account during the > > >> implementation. > > >> > > >> 5. Regarding KIP-590. For the create topics request, create partitions > > >> request, > > >> and delete topics request, we are lucky enough to have directed all of > > >> them > > >> to > > >> the controller already so we are fine with doing the admission in the > > >> broker > > >> which hosts the controller. 
If we were to throttle more operations in > > the > > >> future, > > >> I believe that we can continue to do it centrally while leveraging the > > >> principal > > >> that is forwarded to account for the right tenant. The reason why I > > would > > >> like > > >> to keep it central is that we are talking about sparse (or bursty) > > >> workloads here > > >> so distributing the quota among all the brokers makes little sense. > > >> > > >> 6. Regarding the naming of the new error code. BUSY sounds too generic > > to > > >> me so I would rather prefer a specific error code. The main reason > being > > >> that > > >> we may be able to reuse it in the future for other quotas. That being > > >> said, > > >> we > > >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly > about > > >> the naming. I wonder what the others think about this. > > >> > > >> Voilà. I hope that I have addressed all your questions/points and I am > > >> sorry for > > >> the lengthy email. > > >> > > >> Regards, > > >> David > > >> > > >> > > >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe > wrote: > > >> > > >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Both are already in the KIP, see "New Broker Configurations" chapter. I think that we need them in order to be able to define different burst for the new quota. Best, David On Tue, Jun 9, 2020 at 7:48 PM Jun Rao wrote: > Hi, David, > > Another thing. Should we add controller.quota.window.size.seconds and > controller.quota.window.num > or just reuse the existing quota.window.size.seconds and quota.window.num > that are used for other types of quotas? > > Thanks, > > Jun > > On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 > on > > the KIP. > > > > Jun > > > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > > > >> Hi Colin, > >> > >> Thank you for your feedback. > >> > >> Jun has summarized the situation pretty well. Thanks Jun! I would like > to > >> complement it with the following points: > >> > >> 1. Indeed, when the quota is exceeded, the broker will reject the topic > >> creations, partition creations and topics deletions that are exceeding > >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > >> be populated accordingly to let the client know how long it must wait. > >> > >> 2. I do agree that we actually want a mechanism to apply back pressure > >> to the clients. The KIP basically proposes a mechanism to control and to > >> limit the rate of operations before entering the controller. I think > that > >> it is > >> similar to your thinking but is enforced based on a defined quota > instead > >> of relying on the number of pending items in the controller. > >> > >> 3. You mentioned an alternative idea in your comments that, if I > >> understood > >> correctly, would bound the queue to limit the overload and reject work > if > >> the > >> queue is full. I have been thinking about this as well but I don't think > >> that it > >> works well in our case. > >> - The first reason is the one mentioned by Jun. 
We actually want to be > >> able > >> to limit specific clients (the misbehaving ones) in a multi-tenant > >> environment. > >> - The second reason is that, at least in our current implementation, the > >> length of > >> the queue is not really a good characteristic to estimate the load. > >> Coming back > >> to your example of the CreateTopicsRequest. They create path in ZK for > >> each > >> newly created topics which trigger a ChangeTopic event in the > controller. > >> That > >> single event could be for a single topic in some cases or for a thousand > >> topics > >> in others. > >> These two reasons aside, bounding the queue also introduces a knob which > >> requires some tuning and thus suffers from all the points you mentioned > as > >> well, isn't it? The value may be true for one version but not for > another. > >> > >> 4. Regarding the integration with KIP-500. The implementation of this > KIP > >> will span across the ApiLayer and the AdminManager. To be honest, we > >> can influence the implementation to work best with what you have in mind > >> for the future controller. If you could shed some more light on the > future > >> internal architecture, I can take this into account during the > >> implementation. > >> > >> 5. Regarding KIP-590. For the create topics request, create partitions > >> request, > >> and delete topics request, we are lucky enough to have directed all of > >> them > >> to > >> the controller already so we are fine with doing the admission in the > >> broker > >> which hosts the controller. If we were to throttle more operations in > the > >> future, > >> I believe that we can continue to do it centrally while leveraging the > >> principal > >> that is forwarded to account for the right tenant. The reason why I > would > >> like > >> to keep it central is that we are talking about sparse (or bursty) > >> workloads here > >> so distributing the quota among all the brokers makes little sense. > >> > >> 6. 
Regarding the naming of the new error code. BUSY sounds too generic > to > >> me so I would rather prefer a specific error code. The main reason being > >> that > >> we may be able to reuse it in the future for other quotas. That being > >> said, > >> we > >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > >> the naming. I wonder what the others think about this. > >> > >> Voilà. I hope that I have addressed all your questions/points and I am > >> sorry for > >> the lengthy email. > >> > >> Regards, > >> David > >> > >> > >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > >> > >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > >> > > Hi, Colin, > >> > > > >> > > Thanks for the comment. You brought up several points. > >> > > > >> > > 1. Should we set up a per user quota? To me, it does seem we need > some > >> > sort > >> > > of a quota. When the controller runs out of resources, ideally, we > >> only > >> > > want to penalize the bad behaving applications, instead of every > >> > > application. To do that,
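The point about needing dedicated `controller.quota.window.*` settings can be made concrete: with a sampled-rate quota, the burst a client can consume before the measured rate trips the quota grows with the total window length, so reusing the generic `quota.window.*` values would tie the controller quota's burst tolerance to unrelated quotas. Roughly — this is a back-of-the-envelope approximation, not the broker's exact formula:

```python
def approx_burst(rate, window_size_seconds, window_num):
    # A sampled rate averages over window_num samples of
    # window_size_seconds each, so a client can consume roughly a full
    # window's worth of budget at once before the average exceeds the quota.
    return rate * window_size_seconds * window_num

# The same 2 mutations/sec quota tolerates very different bursts
# depending on how the window is configured:
short_window_burst = approx_burst(2, 1, 11)
long_window_burst = approx_burst(2, 10, 11)
```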
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Another thing. Should we add controller.quota.window.size.seconds and controller.quota.window.num or just reuse the existing quota.window.size.seconds and quota.window.num that are used for other types of quotas? Thanks, Jun On Tue, Jun 9, 2020 at 10:30 AM Jun Rao wrote: > Hi, David, > > Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on > the KIP. > > Jun > > On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > >> Hi Colin, >> >> Thank you for your feedback. >> >> Jun has summarized the situation pretty well. Thanks Jun! I would like to >> complement it with the following points: >> >> 1. Indeed, when the quota is exceeded, the broker will reject the topic >> creations, partition creations and topics deletions that are exceeding >> with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will >> be populated accordingly to let the client know how long it must wait. >> >> 2. I do agree that we actually want a mechanism to apply back pressure >> to the clients. The KIP basically proposes a mechanism to control and to >> limit the rate of operations before entering the controller. I think that >> it is >> similar to your thinking but is enforced based on a defined quota instead >> of relying on the number of pending items in the controller. >> >> 3. You mentioned an alternative idea in your comments that, if I >> understood >> correctly, would bound the queue to limit the overload and reject work if >> the >> queue is full. I have been thinking about this as well but I don't think >> that it >> works well in our case. >> - The first reason is the one mentioned by Jun. We actually want to be >> able >> to limit specific clients (the misbehaving ones) in a multi-tenant >> environment. >> - The second reason is that, at least in our current implementation, the >> length of >> the queue is not really a good characteristic to estimate the load. >> Coming back >> to your example of the CreateTopicsRequest. 
They create path in ZK for >> each >> newly created topics which trigger a ChangeTopic event in the controller. >> That >> single event could be for a single topic in some cases or for a thousand >> topics >> in others. >> These two reasons aside, bounding the queue also introduces a knob which >> requires some tuning and thus suffers from all the points you mentioned as >> well, isn't it? The value may be true for one version but not for another. >> >> 4. Regarding the integration with KIP-500. The implementation of this KIP >> will span across the ApiLayer and the AdminManager. To be honest, we >> can influence the implementation to work best with what you have in mind >> for the future controller. If you could shed some more light on the future >> internal architecture, I can take this into account during the >> implementation. >> >> 5. Regarding KIP-590. For the create topics request, create partitions >> request, >> and delete topics request, we are lucky enough to have directed all of >> them >> to >> the controller already so we are fine with doing the admission in the >> broker >> which hosts the controller. If we were to throttle more operations in the >> future, >> I believe that we can continue to do it centrally while leveraging the >> principal >> that is forwarded to account for the right tenant. The reason why I would >> like >> to keep it central is that we are talking about sparse (or bursty) >> workloads here >> so distributing the quota among all the brokers makes little sense. >> >> 6. Regarding the naming of the new error code. BUSY sounds too generic to >> me so I would rather prefer a specific error code. The main reason being >> that >> we may be able to reuse it in the future for other quotas. That being >> said, >> we >> can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about >> the naming. I wonder what the others think about this. >> >> Voilà. 
I hope that I have addressed all your questions/points and I am >> sorry for >> the lengthy email. >> >> Regards, >> David >> >> >> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: >> >> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: >> > > Hi, Colin, >> > > >> > > Thanks for the comment. You brought up several points. >> > > >> > > 1. Should we set up a per user quota? To me, it does seem we need some >> > sort >> > > of a quota. When the controller runs out of resources, ideally, we >> only >> > > want to penalize the bad behaving applications, instead of every >> > > application. To do that, we will need to know what each application is >> > > entitled to and the per user quota is intended to capture that. >> > > >> > > 2. How easy is it to configure a quota? The following is how an admin >> > > typically sets up a quota in our existing systems. Pick a generous >> > default >> > > per user quota works for most applications. For the few resource >> > intensive >> > > applications, customize a higher quota for them. Reserve enough >> resources >> > > in anticipation that a single
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the KIP. The name QUOTA_VIOLATED sounds reasonable to me. +1 on the KIP. Jun On Tue, Jun 9, 2020 at 5:07 AM David Jacot wrote: > Hi Colin, > > Thank you for your feedback. > > Jun has summarized the situation pretty well. Thanks Jun! I would like to > complement it with the following points: > > 1. Indeed, when the quota is exceeded, the broker will reject the topic > creations, partition creations and topics deletions that are exceeding > with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will > be populated accordingly to let the client know how long it must wait. > > 2. I do agree that we actually want a mechanism to apply back pressure > to the clients. The KIP basically proposes a mechanism to control and to > limit the rate of operations before entering the controller. I think that > it is > similar to your thinking but is enforced based on a defined quota instead > of relying on the number of pending items in the controller. > > 3. You mentioned an alternative idea in your comments that, if I understood > correctly, would bound the queue to limit the overload and reject work if > the > queue is full. I have been thinking about this as well but I don't think > that it > works well in our case. > - The first reason is the one mentioned by Jun. We actually want to be able > to limit specific clients (the misbehaving ones) in a multi-tenant > environment. > - The second reason is that, at least in our current implementation, the > length of > the queue is not really a good characteristic to estimate the load. > Coming back > to your example of the CreateTopicsRequest. They create path in ZK for each > newly created topics which trigger a ChangeTopic event in the controller. > That > single event could be for a single topic in some cases or for a thousand > topics > in others. 
> These two reasons aside, bounding the queue also introduces a knob which > requires some tuning and thus suffers from all the points you mentioned as > well, isn't it? The value may be true for one version but not for another. > > 4. Regarding the integration with KIP-500. The implementation of this KIP > will span across the ApiLayer and the AdminManager. To be honest, we > can influence the implementation to work best with what you have in mind > for the future controller. If you could shed some more light on the future > internal architecture, I can take this into account during the > implementation. > > 5. Regarding KIP-590. For the create topics request, create partitions > request, > and delete topics request, we are lucky enough to have directed all of them > to > the controller already so we are fine with doing the admission in the > broker > which hosts the controller. If we were to throttle more operations in the > future, > I believe that we can continue to do it centrally while leveraging the > principal > that is forwarded to account for the right tenant. The reason why I would > like > to keep it central is that we are talking about sparse (or bursty) > workloads here > so distributing the quota among all the brokers makes little sense. > > 6. Regarding the naming of the new error code. BUSY sounds too generic to > me so I would rather prefer a specific error code. The main reason being > that > we may be able to reuse it in the future for other quotas. That being said, > we > can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about > the naming. I wonder what the others think about this. > > Voilà. I hope that I have addressed all your questions/points and I am > sorry for > the lengthy email. > > Regards, > David > > > On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > > > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > > > Hi, Colin, > > > > > > Thanks for the comment. You brought up several points. > > > > > > 1. 
Should we set up a per user quota? To me, it does seem we need some > > sort > > > of a quota. When the controller runs out of resources, ideally, we only > > > want to penalize the badly behaving applications, instead of every > > > application. To do that, we will need to know what each application is > > > entitled to and the per user quota is intended to capture that. > > > > > > 2. How easy is it to configure a quota? The following is how an admin > > > typically sets up a quota in our existing systems. Pick a generous > > default > > > per user quota that works for most applications. For the few resource > > intensive > > > applications, customize a higher quota for them. Reserve enough > resources > > > in anticipation that a single (or a few) application will exceed the > > quota > > > at a given time. > > > > > > > Hi Jun, > > > > Thanks for the response. > > > > Maybe I was too pessimistic about the ability of admins to configure a > > useful quota here. I do agree that it would be nice to have the ability > to > > set different quotas for different users, as you mentioned. > > > > > > > > 3. How should the quota be defined? In the discussion
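The per-user quota scheme Jun describes above (a generous default for most applications, with explicit overrides for the few resource-intensive ones) might look like this in outline. All names and numbers are hypothetical; this is not Kafka's quota-resolution code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-user quota resolution: a generous default
// rate, with higher overrides for resource-intensive applications.
public class QuotaResolver {
    private final double defaultRate;
    private final Map<String, Double> overrides = new HashMap<>();

    public QuotaResolver(double defaultRate) {
        this.defaultRate = defaultRate;
    }

    // Customize a higher quota for a specific (heavy) user.
    public void override(String user, double rate) {
        overrides.put(user, rate);
    }

    // Returns the controller mutation rate this user is entitled to.
    public double rateFor(String user) {
        return overrides.getOrDefault(user, defaultRate);
    }

    public static void main(String[] args) {
        QuotaResolver quotas = new QuotaResolver(10.0);     // generous default
        quotas.override("heavy-app", 50.0);                 // resource-intensive tenant
        System.out.println(quotas.rateFor("ordinary-app")); // 10.0
        System.out.println(quotas.rateFor("heavy-app"));    // 50.0
    }
}
```

The point of the shape is that the operator only tunes the handful of overrides; every other tenant falls through to the default.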
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the explanation. The KIP looks good to me now. Jun On Tue, Jun 9, 2020 at 4:27 AM David Jacot wrote: > Hi Jun, > > 40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated. > This > is required to let the client know how long it must wait. This is explained > in the > "Handling of new/old clients". > > Best, > David > > On Mon, Jun 8, 2020 at 9:29 PM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the updated KIP. Another minor comment below. > > > > 40. For the new `QUOTA_VIOLATED` error in the response to > > CreateTopics/CreatePartitions/DeleteTopics, could you clarify > > whether ThrottleTimeMs is set when the error code is set to > QUOTA_VIOLATED? > > > > Jun > > > > On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > > > > > Hi Jun, > > > > > > 30. The rate is accumulated at the partition level. Let me clarify this > > in > > > the KIP. > > > > > > Best, > > > David > > > > > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > > > > > Hi David, > > > > > > > > The KIP looks good to me. I am going to the voting thread... > > > > > > > > Hi Jun, > > > > > > > > Yes, exactly. That's a separate thing from this KIP, so working on > the > > > fix. > > > > > > > > Thanks, > > > > Anna > > > > > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > > > > > Hi, Anna, > > > > > > > > > > Thanks for the comment. For the problem that you described, perhaps > > we > > > > need > > > > > to make the quota checking and recording more atomic? > > > > > > > > > > Hi, David, > > > > > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > > > comment > > > > > below. > > > > > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is > > the > > > > rate > > > > > accumulated at the topic or partition level? It would be useful to > > make > > > > it > > > > > clear in the wiki. 
> > > > > > > > > > Jun > > > > > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > > > wrote: > > > > > > > > > > > Hi Anna and Jun, > > > > > > > > > > > > You are right. We should allocate up to the quota for each old > > > sample. > > > > > > > > > > > > I have revamped the Throttling Algorithm section to better > explain > > > our > > > > > > thought process and the token bucket inspiration. > > > > > > > > > > > > I have also added a chapter with few guidelines about how to > define > > > > > > the quota. There is no magic formula for this but I give few > > > insights. > > > > > > I don't have specific numbers that can be used out of the box so > I > > > > > > think that it is better to not put any for the time being. We can > > > > always > > > > > > complement later on in the documentation. > > > > > > > > > > > > Please, take a look and let me know what you think. > > > > > > > > > > > > Cheers, > > > > > > David > > > > > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > > > wrote: > > > > > > > > > > > > > Hi David and Jun, > > > > > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > > > confirm > > > > > > that > > > > > > > I do believe that the token bucket behavior is better for the > > > reasons > > > > > we > > > > > > > already discussed but wanted to summarize. The main difference > > > > between > > > > > > Rate > > > > > > > and token bucket is that the Rate implementation allows a burst > > by > > > > > > > borrowing from the future, whereas a token bucket allows a > burst > > by > > > > > using > > > > > > > accumulated tokens from the previous idle period. Using > > accumulated > > > > > > tokens > > > > > > > smoothes out the rate measurement in general. Configuring a > large > > > > burst > > > > > > > requires configuring a large quota window, which causes long > > delays > > > > for > > > > > > > bursty workload, due to borrowing credits from the future. 
> > Perhaps > > > it > > > > > is > > > > > > > useful to add a summary in the beginning of the Throttling > > > Algorithm > > > > > > > section? > > > > > > > > > > > > > > In my previous email, I mentioned the issue we observed with > the > > > > > > bandwidth > > > > > > > quota, where a low quota (1MB/s per broker) was limiting > > bandwidth > > > > > > visibly > > > > > > > below the quota. I thought it was strictly the issue with the > > Rate > > > > > > > implementation as well, but I found a root cause to be > different > > > but > > > > > > > amplified by the Rate implementation (long throttle delays of > > > > requests > > > > > > in a > > > > > > > burst). I will describe it here for completeness using the > > > following > > > > > > > example: > > > > > > > > > > > > > >- > > > > > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > > > >- > > > > > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > > > requests), > > > > > and > > > > > > >each produce request is 5MB. If all requests arrive in a > > burst, > > > > the > > >
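The token-bucket behavior discussed in this thread can be sketched as follows. This is a hypothetical illustration, not Kafka's actual MeasurableStat implementation, and the class and method names are invented. The key property under discussion: an idle period accumulates credits (the "credits" metric), so a burst spends previously saved credits rather than borrowing from future windows the way the sampled Rate does.

```java
// Hypothetical token bucket for a mutation quota; not Kafka source code.
public class TokenBucket {
    private final double ratePerSec; // refill rate, e.g. the mutation quota
    private final double burst;      // bucket capacity (max accumulated credits)
    private double credits;          // remaining credits (the "credits" metric)
    private long lastMs;

    public TokenBucket(double ratePerSec, double burst, long nowMs) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.credits = burst;        // start full: an idle period allows a burst
        this.lastMs = nowMs;
    }

    private void refill(long nowMs) {
        // Credits accrue at the quota rate, capped at the bucket capacity.
        credits = Math.min(burst, credits + (nowMs - lastMs) / 1000.0 * ratePerSec);
        lastMs = nowMs;
    }

    // Records 'amount' mutations; returns throttle time in ms (0 if within quota).
    public long record(double amount, long nowMs) {
        refill(nowMs);
        credits -= amount; // may go negative: the deficit must be paid back over time
        if (credits >= 0) return 0;
        return (long) (-credits / ratePerSec * 1000.0);
    }

    public static void main(String[] args) {
        TokenBucket tb = new TokenBucket(10.0, 50.0, 0); // 10 mutations/s, burst of 50
        System.out.println(tb.record(50, 0)); // burst covered by accumulated credits -> 0
        System.out.println(tb.record(10, 0)); // 10 over -> 1000 ms throttle
    }
}
```

Note the contrast with the Rate: here a large burst after idleness incurs no delay at all, and the penalty for exceeding the quota is proportional to the deficit, not to a window-wide average.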
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Colin, Thank you for your feedback. Jun has summarized the situation pretty well. Thanks Jun! I would like to complement it with the following points: 1. Indeed, when the quota is exceeded, the broker will reject the topic creations, partition creations and topic deletions that exceed the quota with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will be populated accordingly to let the client know how long it must wait. 2. I do agree that we actually want a mechanism to apply back pressure to the clients. The KIP basically proposes a mechanism to control and to limit the rate of operations before entering the controller. I think that it is similar to your thinking but is enforced based on a defined quota instead of relying on the number of pending items in the controller. 3. You mentioned an alternative idea in your comments that, if I understood correctly, would bound the queue to limit the overload and reject work if the queue is full. I have been thinking about this as well but I don't think that it works well in our case. - The first reason is the one mentioned by Jun. We actually want to be able to limit specific clients (the misbehaving ones) in a multi-tenant environment. - The second reason is that, at least in our current implementation, the length of the queue is not really a good characteristic to estimate the load. Coming back to your example of the CreateTopicsRequest. They create a path in ZK for each newly created topic, which triggers a ChangeTopic event in the controller. That single event could be for a single topic in some cases or for a thousand topics in others. These two reasons aside, bounding the queue also introduces a knob which requires some tuning and thus suffers from all the points you mentioned as well, doesn't it? The value may be right for one version but not for another. 4. Regarding the integration with KIP-500. The implementation of this KIP will span across the ApiLayer and the AdminManager. 
To be honest, we can influence the implementation to work best with what you have in mind for the future controller. If you could shed some more light on the future internal architecture, I can take this into account during the implementation. 5. Regarding KIP-590. For the create topics request, create partitions request, and delete topics request, we are lucky enough to have directed all of them to the controller already so we are fine with doing the admission in the broker which hosts the controller. If we were to throttle more operations in the future, I believe that we can continue to do it centrally while leveraging the principal that is forwarded to account for the right tenant. The reason why I would like to keep it central is that we are talking about sparse (or bursty) workloads here so distributing the quota among all the brokers makes little sense. 6. Regarding the naming of the new error code. BUSY sounds too generic to me so I would rather prefer a specific error code. The main reason being that we may be able to reuse it in the future for other quotas. That being said, we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about the naming. I wonder what the others think about this. Voilà. I hope that I have addressed all your questions/points and I am sorry for the lengthy email. Regards, David On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe wrote: > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > > Hi, Colin, > > > > Thanks for the comment. You brought up several points. > > > > 1. Should we set up a per user quota? To me, it does seem we need some > sort > > of a quota. When the controller runs out of resources, ideally, we only > > want to penalize the bad behaving applications, instead of every > > application. To do that, we will need to know what each application is > > entitled to and the per user quota is intended to capture that. > > > > 2. How easy is it to configure a quota? 
The following is how an admin > > typically sets up a quota in our existing systems. Pick a generous > default > > per user quota works for most applications. For the few resource > intensive > > applications, customize a higher quota for them. Reserve enough resources > > in anticipation that a single (or a few) application will exceed the > quota > > at a given time. > > > > Hi Jun, > > Thanks for the response. > > Maybe I was too pessimistic about the ability of admins to configure a > useful quota here. I do agree that it would be nice to have the ability to > set different quotas for different users, as you mentioned. > > > > > 3. How should the quota be defined? In the discussion thread, we debated > > between a usage based model vs a rate based model. Dave and Anna argued > for > > the rate based model mostly because it's simpler to implement. > > > > I'm trying to think more about how this integrates with our plans for > KIP-500. When we get rid of ZK, we will have to handle this in the > controller itself, rather than in the AdminManager. That
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, 40. Yes, ThrottleTimeMs is set when the error code is set to QuotaViolated. This is required to let the client know how long it must wait. This is explained in the "Handling of new/old clients". Best, David On Mon, Jun 8, 2020 at 9:29 PM Jun Rao wrote: > Hi, David, > > Thanks for the updated KIP. Another minor comment below. > > 40. For the new `QUOTA_VIOLATED` error in the response to > CreateTopics/CreatePartitions/DeleteTopics, could you clarify > whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED? > > Jun > > On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > > > Hi Jun, > > > > 30. The rate is accumulated at the partition level. Let me clarify this > in > > the KIP. > > > > Best, > > David > > > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > > > Hi David, > > > > > > The KIP looks good to me. I am going to the voting thread... > > > > > > Hi Jun, > > > > > > Yes, exactly. That's a separate thing from this KIP, so working on the > > fix. > > > > > > Thanks, > > > Anna > > > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > > > Hi, Anna, > > > > > > > > Thanks for the comment. For the problem that you described, perhaps > we > > > need > > > > to make the quota checking and recording more atomic? > > > > > > > > Hi, David, > > > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > > comment > > > > below. > > > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is > the > > > rate > > > > accumulated at the topic or partition level? It would be useful to > make > > > it > > > > clear in the wiki. > > > > > > > > Jun > > > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > > wrote: > > > > > > > > > Hi Anna and Jun, > > > > > > > > > > You are right. We should allocate up to the quota for each old > > sample. 
> > > > > > > > > > I have revamped the Throttling Algorithm section to better explain > > our > > > > > thought process and the token bucket inspiration. > > > > > > > > > > I have also added a chapter with few guidelines about how to define > > > > > the quota. There is no magic formula for this but I give few > > insights. > > > > > I don't have specific numbers that can be used out of the box so I > > > > > think that it is better to not put any for the time being. We can > > > always > > > > > complement later on in the documentation. > > > > > > > > > > Please, take a look and let me know what you think. > > > > > > > > > > Cheers, > > > > > David > > > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > > wrote: > > > > > > > > > > > Hi David and Jun, > > > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > > confirm > > > > > that > > > > > > I do believe that the token bucket behavior is better for the > > reasons > > > > we > > > > > > already discussed but wanted to summarize. The main difference > > > between > > > > > Rate > > > > > > and token bucket is that the Rate implementation allows a burst > by > > > > > > borrowing from the future, whereas a token bucket allows a burst > by > > > > using > > > > > > accumulated tokens from the previous idle period. Using > accumulated > > > > > tokens > > > > > > smoothes out the rate measurement in general. Configuring a large > > > burst > > > > > > requires configuring a large quota window, which causes long > delays > > > for > > > > > > bursty workload, due to borrowing credits from the future. > Perhaps > > it > > > > is > > > > > > useful to add a summary in the beginning of the Throttling > > Algorithm > > > > > > section? > > > > > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > > > bandwidth > > > > > > quota, where a low quota (1MB/s per broker) was limiting > bandwidth > > > > > visibly > > > > > > below the quota. 
I thought it was strictly the issue with the > Rate > > > > > > implementation as well, but I found a root cause to be different > > but > > > > > > amplified by the Rate implementation (long throttle delays of > > > requests > > > > > in a > > > > > > burst). I will describe it here for completeness using the > > following > > > > > > example: > > > > > > > > > > > >- > > > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > > >- > > > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > > requests), > > > > and > > > > > >each produce request is 5MB. If all requests arrive in a > burst, > > > the > > > > > > last 4 > > > > > >requests (20MB over 10MB allowed in a window) may get the same > > > > > throttle > > > > > >time if they are processed concurrently. We record the rate > > under > > > > the > > > > > > lock, > > > > > >but then calculate throttle time separately after that. So, > for > > > each > > > > > >request, the observed rate could be 3MB/s, and each request > gets > > > > > > throttle > > > > > >delay = 20 seconds
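The arithmetic in Anna's example above can be reconstructed with the rate-quota delay formula implied by her numbers, delay = (observedRate - quota) / quota * window. This sketch is for illustration only (names hypothetical, not Kafka's exact throttling code):

```java
// Illustrates the throttle-delay arithmetic from the 6-connection example:
// quota 1MB/s, 10s of rate window, six 5MB produce requests in a burst.
public class ThrottleDelayExample {
    // Delay formula consistent with the example: (rate - quota) / quota * window.
    public static long delayMs(double observedMBps, double quotaMBps, long windowMs) {
        if (observedMBps <= quotaMBps) return 0;
        return (long) ((observedMBps - quotaMBps) / quotaMBps * windowMs);
    }

    public static void main(String[] args) {
        double quota = 1.0;     // 1 MB/s
        long windowMs = 10_000; // e.g. 10 samples of 1s

        // If all six requests are recorded before any delay is computed, each
        // observes 30MB / 10s = 3MB/s and gets the same 20s delay:
        System.out.println(delayMs(3.0, quota, windowMs)); // 20000

        // Recorded and evaluated one at a time, the last four requests would
        // instead see increasing rates and get 5s, 10s, 15s, 20s:
        for (int k = 3; k <= 6; k++) {
            System.out.println(delayMs(5.0 * k / 10.0, quota, windowMs));
        }
    }
}
```

This reproduces the numbers in the email: the concurrent path hands every request the worst-case 20s delay, which is longer than the whole rate window and so pushes observed bandwidth below the quota.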
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote: > Hi, Colin, > > Thanks for the comment. You brought up several points. > > 1. Should we set up a per user quota? To me, it does seem we need some sort > of a quota. When the controller runs out of resources, ideally, we only > want to penalize the badly behaving applications, instead of every > application. To do that, we will need to know what each application is > entitled to and the per user quota is intended to capture that. > > 2. How easy is it to configure a quota? The following is how an admin > typically sets up a quota in our existing systems. Pick a generous default > per user quota that works for most applications. For the few resource intensive > applications, customize a higher quota for them. Reserve enough resources > in anticipation that a single (or a few) application will exceed the quota > at a given time. > Hi Jun, Thanks for the response. Maybe I was too pessimistic about the ability of admins to configure a useful quota here. I do agree that it would be nice to have the ability to set different quotas for different users, as you mentioned. > > 3. How should the quota be defined? In the discussion thread, we debated > between a usage based model vs a rate based model. Dave and Anna argued for > the rate based model mostly because it's simpler to implement. > I'm trying to think more about how this integrates with our plans for KIP-500. When we get rid of ZK, we will have to handle this in the controller itself, rather than in the AdminManager. That implies we'll have to rewrite the code. Maybe this is worth it if we want this feature now, though. Another wrinkle here is that as we discussed in KIP-590, controller operations will land on a random broker first, and only then be forwarded to the active controller. 
This implies that either admission control should happen on all brokers (needing some kind of distributed quota scheme), or be done on the controller after we've already done the work of forwarding the message. The second approach might not be that bad, but it would be nice to figure this out. > > 4. If a quota is exceeded, how is that enforced? My understanding of the > KIP is that, if a quota is exceeded, the broker immediately sends back > a QUOTA_VIOLATED error and a throttle time to the client, and the > client will wait for the throttle time before issuing the next request. > This seems to be the same as the BUSY error code you mentioned. > Yes, I agree, it sounds like we're thinking along the same lines. However, rather than QUOTA_VIOLATED, how about naming the error code BUSY? Then the error text could indicate the quota that we violated. This would be more generally useful as an error code and also avoid being confusingly similar to POLICY_VIOLATION. best, Colin > > I will let David chime in more on that. > > Thanks, > > Jun > > > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe wrote: > > > Hi David, > > > > Thanks for the KIP. > > > > I thought about this for a while and I actually think this approach is not > > quite right. The problem that I see here is that using an explicitly set > > quota here requires careful tuning by the cluster operator. Even worse, > > this tuning might be invalidated by changes in overall conditions or even > > more efficient controller software. 
> > > > So the system administrator gets a very obscure tunable (it's not clear to > > a non-Kafka-developer what "controller mutations" are or why they should > > care). And even worse, they will have to significantly "sandbag" the value > > that they set it to, so that even under the heaviest load and oldest > > deployed version of the software, the controller can still function. Even > > worse, this new quota adds a lot of complexity to the controller. > > > > What we really want is backpressure when the controller is overloaded. I > > believe this is the alternative you discuss in "Rejected Alternatives" > > under "Throttle the Execution instead of the Admission". Your reason for > > rejecting it is that the client error handling does not work well in this > > case. But actually, this is an artifact of our current implementation, > > rather than a fundamental issue with backpressure. > > > > Consider the example of a CreateTopicsRequest. The controller could > > return a special error code if the load was too high, and take the create > > topics event off the controller queue. Let's call that error code BUSY. > > Additionally, the controller could
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Colin, Thanks for the comment. You brought up several points. 1. Should we set up a per user quota? To me, it does seem we need some sort of a quota. When the controller runs out of resources, ideally, we only want to penalize the badly behaving applications, instead of every application. To do that, we will need to know what each application is entitled to and the per user quota is intended to capture that. 2. How easy is it to configure a quota? The following is how an admin typically sets up a quota in our existing systems. Pick a generous default per user quota that works for most applications. For the few resource intensive applications, customize a higher quota for them. Reserve enough resources in anticipation that a single (or a few) application will exceed the quota at a given time. 3. How should the quota be defined? In the discussion thread, we debated between a usage based model vs a rate based model. Dave and Anna argued for the rate based model mostly because it's simpler to implement. 4. If a quota is exceeded, how is that enforced? My understanding of the KIP is that, if a quota is exceeded, the broker immediately sends back a QUOTA_VIOLATED error and a throttle time to the client, and the client will wait for the throttle time before issuing the next request. This seems to be the same as the BUSY error code you mentioned. I will let David chime in more on that. Thanks, Jun On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe wrote: > Hi David, > > Thanks for the KIP. > > I thought about this for a while and I actually think this approach is not > quite right. The problem that I see here is that using an explicitly set > quota here requires careful tuning by the cluster operator. Even worse, > this tuning might be invalidated by changes in overall conditions or even > more efficient controller software. 
> > For example, if we empirically find that the controller can do 1000 topics > in a minute (or whatever), this tuning might actually be wrong if the next > version of the software can do 2000 topics in a minute because of > efficiency upgrades. Or, the broker that the controller is located on > might be experiencing heavy load from its non-controller operations, and so > it can only do 500 topics in a minute during this period. > > So the system administrator gets a very obscure tunable (it's not clear to > a non-Kafka-developer what "controller mutations" are or why they should > care). And even worse, they will have to significantly "sandbag" the value > that they set it to, so that even under the heaviest load and oldest > deployed version of the software, the controller can still function. Even > worse, this new quota adds a lot of complexity to the controller. > > What we really want is backpressure when the controller is overloaded. I > believe this is the alternative you discuss in "Rejected Alternatives" > under "Throttle the Execution instead of the Admission". Your reason for > rejecting it is that the client error handling does not work well in this > case. But actually, this is an artifact of our current implementation, > rather than a fundamental issue with backpressure. > > Consider the example of a CreateTopicsRequest. The controller could > return a special error code if the load was too high, and take the create > topics event off the controller queue. Let's call that error code BUSY. > Additionally, the controller could immediately refuse new events if the > queue had reached its maximum length, and simply return BUSY for that case > as well. > > Basically, the way we handle RPC timeouts in the controller right now is > not very good. As you know, we time out the RPC, so the client gets > TimeoutException, but then keep the event on the queue, so that it > eventually gets executed! There's no reason why we have to do that. 
We > could take the event off the queue if there is a timeout. This would > reduce load and mostly avoid the paradoxical situations you describe > (getting TopicExistsException for a CreateTopicsRequest retry, etc.) > > I say "mostly" because there are still cases where retries could go astray > (for example if we execute the topic creation but a network problem > prevents the response from being sent to the client). But this would still > be a very big improvement over what we have now. > > Sorry for commenting so late on this but I got distracted by some other > things. I hope we can figure this one out-- I feel like there is a chance > to significantly simplify this. > > best, > Colin > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote: > > Hi folks, > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > throttle create topic, create partition, and delete topics operations to > > protect the Kafka controller: > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > Please, let me know what you think. > > > >
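The bounded-queue alternative Colin describes, i.e. shedding load at admission with a BUSY error instead of keeping timed-out events queued, could be sketched like this. It is a hypothetical illustration, not actual controller code; the class, the error strings, and the capacity are all invented.

```java
import java.util.ArrayDeque;

// Sketch of the rejected alternative discussed in the thread: a bounded
// controller event queue that returns BUSY when full, rather than enqueueing
// work that may only execute after the client's RPC has already timed out.
public class BoundedControllerQueue {
    private final ArrayDeque<Runnable> events = new ArrayDeque<>();
    private final int maxLength;

    public BoundedControllerQueue(int maxLength) {
        this.maxLength = maxLength;
    }

    // Returns "NONE" if the event was accepted, "BUSY" if the queue is full.
    public synchronized String offer(Runnable event) {
        if (events.size() >= maxLength) return "BUSY";
        events.add(event);
        return "NONE";
    }

    public static void main(String[] args) {
        BoundedControllerQueue q = new BoundedControllerQueue(2);
        System.out.println(q.offer(() -> {})); // NONE
        System.out.println(q.offer(() -> {})); // NONE
        System.out.println(q.offer(() -> {})); // BUSY: queue at capacity
    }
}
```

David's objection above still applies to this shape: the queue length is a poor load signal when one queued event can represent one topic or a thousand, and the capacity is itself a knob that needs tuning.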
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the updated KIP. Another minor comment below. 40. For the new `QUOTA_VIOLATED` error in the response to CreateTopics/CreatePartitions/DeleteTopics, could you clarify whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED? Jun On Mon, Jun 8, 2020 at 9:32 AM David Jacot wrote: > Hi Jun, > > 30. The rate is accumulated at the partition level. Let me clarify this in > the KIP. > > Best, > David > > On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > > > Hi David, > > > > The KIP looks good to me. I am going to the voting thread... > > > > Hi Jun, > > > > Yes, exactly. That's a separate thing from this KIP, so working on the > fix. > > > > Thanks, > > Anna > > > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > > > Hi, Anna, > > > > > > Thanks for the comment. For the problem that you described, perhaps we > > need > > > to make the quota checking and recording more atomic? > > > > > > Hi, David, > > > > > > Thanks for the updated KIP. Looks good to me now. Just one minor > comment > > > below. > > > > > > 30. controller_mutations_rate: For topic creation and deletion, is the > > rate > > > accumulated at the topic or partition level? It would be useful to make > > it > > > clear in the wiki. > > > > > > Jun > > > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot > wrote: > > > > > > > Hi Anna and Jun, > > > > > > > > You are right. We should allocate up to the quota for each old > sample. > > > > > > > > I have revamped the Throttling Algorithm section to better explain > our > > > > thought process and the token bucket inspiration. > > > > > > > > I have also added a chapter with few guidelines about how to define > > > > the quota. There is no magic formula for this but I give few > insights. > > > > I don't have specific numbers that can be used out of the box so I > > > > think that it is better to not put any for the time being. We can > > always > > > > complement later on in the documentation. 
> > > > > > > > Please, take a look and let me know what you think. > > > > > > > > Cheers, > > > > David > > > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner > wrote: > > > > > > > > > Hi David and Jun, > > > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > > confirm > > > > that > > > > > I do believe that the token bucket behavior is better for the > reasons > > > we > > > > > already discussed but wanted to summarize. The main difference > > between > > > > Rate > > > > > and token bucket is that the Rate implementation allows a burst by > > > > > borrowing from the future, whereas a token bucket allows a burst by > > > using > > > > > accumulated tokens from the previous idle period. Using accumulated > > > > tokens > > > > > smoothes out the rate measurement in general. Configuring a large > > burst > > > > > requires configuring a large quota window, which causes long delays > > for > > > > > bursty workload, due to borrowing credits from the future. Perhaps > it > > > is > > > > > useful to add a summary in the beginning of the Throttling > Algorithm > > > > > section? > > > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > > bandwidth > > > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > > > visibly > > > > > below the quota. I thought it was strictly the issue with the Rate > > > > > implementation as well, but I found a root cause to be different > but > > > > > amplified by the Rate implementation (long throttle delays of > > requests > > > > in a > > > > > burst). I will describe it here for completeness using the > following > > > > > example: > > > > > > > > > >- > > > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > > >- > > > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding > requests), > > > and > > > > >each produce request is 5MB. 
If all requests arrive in a burst, > > the > > > > > last 4 > > > > >requests (20MB over 10MB allowed in a window) may get the same > > > > throttle > > > > >time if they are processed concurrently. We record the rate > under > > > the > > > > > lock, > > > > >but then calculate throttle time separately after that. So, for > > each > > > > >request, the observed rate could be 3MB/s, and each request gets > > > > > throttle > > > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > > > delay > > > > is > > > > >longer than the total rate window, which results in lower > > bandwidth > > > > than > > > > >the quota. Since all requests got the same delay, they will also > > > > arrive > > > > > in > > > > >a burst, which may also result in longer delay than necessary. > It > > > > looks > > > > >pretty easy to fix, so I will open a separate JIRA for it. This > > can > > > be > > > > >additionally mitigated by token bucket behavior. > > > > > > > > > > > > > > > For the algorithm "So instead of
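The client-side contract clarified in this exchange, namely that a QUOTA_VIOLATED error always comes with a populated ThrottleTimeMs, can be sketched as follows. The response shape and names are illustrative, not the real protocol or AdminClient classes.

```java
// Hypothetical client-side handling: on QUOTA_VIOLATED the broker also sets
// ThrottleTimeMs, telling the client how long to wait before retrying.
public class QuotaViolatedHandling {
    // Returns the mandatory wait (ms) before the next attempt.
    public static long backoffMs(String errorCode, int throttleTimeMs) {
        return "QUOTA_VIOLATED".equals(errorCode) ? throttleTimeMs : 0;
    }

    public static void main(String[] args) {
        System.out.println(backoffMs("QUOTA_VIOLATED", 1500)); // wait 1500 ms, then retry
        System.out.println(backoffMs("NONE", 0));              // success, no wait
    }
}
```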
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, 30. The rate is accumulated at the partition level. Let me clarify this in the KIP. Best, David On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner wrote: > Hi David, > > The KIP looks good to me. I am going to the voting thread... > > Hi Jun, > > Yes, exactly. That's a separate thing from this KIP, so working on the fix. > > Thanks, > Anna > > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > > > Hi, Anna, > > > > Thanks for the comment. For the problem that you described, perhaps we > need > > to make the quota checking and recording more atomic? > > > > Hi, David, > > > > Thanks for the updated KIP. Looks good to me now. Just one minor comment > > below. > > > > 30. controller_mutations_rate: For topic creation and deletion, is the > rate > > accumulated at the topic or partition level? It would be useful to make > it > > clear in the wiki. > > > > Jun > > > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > > > > > Hi Anna and Jun, > > > > > > You are right. We should allocate up to the quota for each old sample. > > > > > > I have revamped the Throttling Algorithm section to better explain our > > > thought process and the token bucket inspiration. > > > > > > I have also added a chapter with few guidelines about how to define > > > the quota. There is no magic formula for this but I give few insights. > > > I don't have specific numbers that can be used out of the box so I > > > think that it is better to not put any for the time being. We can > always > > > complement later on in the documentation. > > > > > > Please, take a look and let me know what you think. > > > > > > Cheers, > > > David > > > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > > > > > Hi David and Jun, > > > > > > > > I dug a bit deeper into the Rate implementation, and wanted to > confirm > > > that > > > > I do believe that the token bucket behavior is better for the reasons > > we > > > > already discussed but wanted to summarize. 
The main difference > between > > > Rate > > > > and token bucket is that the Rate implementation allows a burst by > > > > borrowing from the future, whereas a token bucket allows a burst by > > using > > > > accumulated tokens from the previous idle period. Using accumulated > > > tokens > > > > smoothes out the rate measurement in general. Configuring a large > burst > > > > requires configuring a large quota window, which causes long delays > for > > > > bursty workload, due to borrowing credits from the future. Perhaps it > > is > > > > useful to add a summary in the beginning of the Throttling Algorithm > > > > section? > > > > > > > > In my previous email, I mentioned the issue we observed with the > > > bandwidth > > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > > visibly > > > > below the quota. I thought it was strictly the issue with the Rate > > > > implementation as well, but I found a root cause to be different but > > > > amplified by the Rate implementation (long throttle delays of > requests > > > in a > > > > burst). I will describe it here for completeness using the following > > > > example: > > > > > > > >- > > > > > > > >Quota = 1MB/s, default window size and number of samples > > > >- > > > > > > > >Suppose there are 6 connections (maximum 6 outstanding requests), > > and > > > >each produce request is 5MB. If all requests arrive in a burst, > the > > > > last 4 > > > >requests (20MB over 10MB allowed in a window) may get the same > > > throttle > > > >time if they are processed concurrently. We record the rate under > > the > > > > lock, > > > >but then calculate throttle time separately after that. So, for > each > > > >request, the observed rate could be 3MB/s, and each request gets > > > > throttle > > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > > delay > > > is > > > >longer than the total rate window, which results in lower > bandwidth > > > than > > > >the quota. 
Since all requests got the same delay, they will also > > > arrive > > > > in > > > >a burst, which may also result in longer delay than necessary. It > > > looks > > > >pretty easy to fix, so I will open a separate JIRA for it. This > can > > be > > > >additionally mitigated by token bucket behavior. > > > > > > > > > > > > For the algorithm "So instead of having one sample equal to 560 in > the > > > last > > > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > > > would > > > > allocate 5 per each old sample that is still in the overall window. > It > > > > would be a bit larger granularity than the pure token bucket (we > lose 5 > > > > units / mutation once we move past the sample window), but it is > better > > > > than the long delay. > > > > > > > > Thanks, > > > > > > > > Anna > > > > > > > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > > > > > Hi, David, Anna, > > > > > > > > > > Thanks for the discussion and the updated
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the KIP. I thought about this for a while and I actually think this approach is not quite right. The problem that I see here is that using an explicitly set quota requires careful tuning by the cluster operator. Even worse, this tuning might be invalidated by changes in overall conditions or even more efficient controller software. For example, if we empirically find that the controller can do 1000 topics in a minute (or whatever), this tuning might actually be wrong if the next version of the software can do 2000 topics in a minute because of efficiency upgrades. Or, the broker that the controller is located on might be experiencing heavy load from its non-controller operations, and so it can only do 500 topics in a minute during this period. So the system administrator gets a very obscure tunable (it's not clear to a non-Kafka-developer what "controller mutations" are or why they should care). And even worse, they will have to significantly "sandbag" the value that they set it to, so that even under the heaviest load and oldest deployed version of the software, the controller can still function. Even worse, this new quota adds a lot of complexity to the controller. What we really want is backpressure when the controller is overloaded. I believe this is the alternative you discuss in "Rejected Alternatives" under "Throttle the Execution instead of the Admission". Your reason for rejecting it is that the client error handling does not work well in this case. But actually, this is an artifact of our current implementation, rather than a fundamental issue with backpressure. Consider the example of a CreateTopicsRequest. The controller could return a special error code if the load was too high, and take the create topics event off the controller queue. Let's call that error code BUSY. Additionally, the controller could immediately refuse new events if the queue had reached its maximum length, and simply return BUSY for that case as well. 
Basically, the way we handle RPC timeouts in the controller right now is not very good. As you know, we time out the RPC, so the client gets TimeoutException, but then we keep the event on the queue, so that it eventually gets executed! There's no reason why we have to do that. We could take the event off the queue if there is a timeout. This would reduce load and mostly avoid the paradoxical situations you describe (getting TopicExistsException for a CreateTopicsRequest retry, etc.). I say "mostly" because there are still cases where retries could go astray (for example if we execute the topic creation but a network problem prevents the response from being sent to the client). But this would still be a very big improvement over what we have now. Sorry for commenting so late on this, but I got distracted by some other things. I hope we can figure this one out-- I feel like there is a chance to significantly simplify this. best, Colin On Fri, May 29, 2020, at 07:57, David Jacot wrote: > Hi folks, > > I'd like to start the vote for KIP-599 which proposes a new quota to > throttle create topic, create partition, and delete topics operations to > protect the Kafka controller: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > Please, let me know what you think. > > Cheers, > David >
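[Editorial note] Colin's two suggestions -- reject new events with BUSY when the queue is full, and drop events whose RPC has already timed out instead of executing them anyway -- can be sketched roughly as below. The BUSY code, the queue class, and the method names are hypothetical illustrations, not Kafka's actual controller API:

```python
from collections import deque

BUSY = "BUSY"  # hypothetical error code returned to the client


class ControllerQueue:
    """Toy controller event queue with the backpressure behavior described above."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.events = deque()

    def submit(self, event, deadline):
        # Refuse new events immediately when the queue is full, instead of
        # accepting work that will only time out later.
        if len(self.events) >= self.max_len:
            return BUSY
        self.events.append((event, deadline))
        return "OK"

    def next_event(self, now):
        # Skip events whose RPC deadline has passed: the client already got
        # a timeout, so executing them would cause the paradoxical retries
        # (e.g. TopicExistsException on a CreateTopicsRequest retry).
        while self.events:
            event, deadline = self.events.popleft()
            if now <= deadline:
                return event
        return None
```

Under this sketch, an overloaded controller fails fast with BUSY (a retriable error), and a timed-out create-topics event is simply dropped rather than executed behind the client's back.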
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (not binding) Thanks for the KIP! -Anna On Thu, Jun 4, 2020 at 8:26 AM Mickael Maison wrote: > +1 (binding) > Thanks David for looking into this important issue > > On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote: > > > > +1 (non binding). > > > > Thanks! > > > > On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram > > wrote: > > > > > +1 (binding) > > > > > > Thanks for the KIP, David! > > > > > > Regards, > > > > > > Rajini > > > > > > > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira > wrote: > > > > > > > +1 (binding) > > > > > > > > Looks great. Thank you for the in-depth design and discussion. > > > > > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot > wrote: > > > > > > > > > Hi folks, > > > > > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota > to > > > > > throttle create topic, create partition, and delete topics > operations > > > to > > > > > protect the Kafka controller: > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > > > > > Please, let me know what you think. > > > > > > > > > > Cheers, > > > > > David > > > > > > > > > > > > > > > > > -- > > > > Gwen Shapira > > > > Engineering Manager | Confluent > > > > 650.450.2760 | @gwenshap > > > > Follow us: Twitter | blog > > > > > > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, The KIP looks good to me. I am going to the voting thread... Hi Jun, Yes, exactly. That's a separate thing from this KIP, so working on the fix. Thanks, Anna On Fri, Jun 5, 2020 at 4:36 PM Jun Rao wrote: > Hi, Anna, > > Thanks for the comment. For the problem that you described, perhaps we need > to make the quota checking and recording more atomic? > > Hi, David, > > Thanks for the updated KIP. Looks good to me now. Just one minor comment > below. > > 30. controller_mutations_rate: For topic creation and deletion, is the rate > accumulated at the topic or partition level? It would be useful to make it > clear in the wiki. > > Jun > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > > > Hi Anna and Jun, > > > > You are right. We should allocate up to the quota for each old sample. > > > > I have revamped the Throttling Algorithm section to better explain our > > thought process and the token bucket inspiration. > > > > I have also added a chapter with few guidelines about how to define > > the quota. There is no magic formula for this but I give few insights. > > I don't have specific numbers that can be used out of the box so I > > think that it is better to not put any for the time being. We can always > > complement later on in the documentation. > > > > Please, take a look and let me know what you think. > > > > Cheers, > > David > > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > > > Hi David and Jun, > > > > > > I dug a bit deeper into the Rate implementation, and wanted to confirm > > that > > > I do believe that the token bucket behavior is better for the reasons > we > > > already discussed but wanted to summarize. The main difference between > > Rate > > > and token bucket is that the Rate implementation allows a burst by > > > borrowing from the future, whereas a token bucket allows a burst by > using > > > accumulated tokens from the previous idle period. 
Using accumulated > > tokens > > > smoothes out the rate measurement in general. Configuring a large burst > > > requires configuring a large quota window, which causes long delays for > > > bursty workload, due to borrowing credits from the future. Perhaps it > is > > > useful to add a summary in the beginning of the Throttling Algorithm > > > section? > > > > > > In my previous email, I mentioned the issue we observed with the > > bandwidth > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > > visibly > > > below the quota. I thought it was strictly the issue with the Rate > > > implementation as well, but I found a root cause to be different but > > > amplified by the Rate implementation (long throttle delays of requests > > in a > > > burst). I will describe it here for completeness using the following > > > example: > > > > > >- > > > > > >Quota = 1MB/s, default window size and number of samples > > >- > > > > > >Suppose there are 6 connections (maximum 6 outstanding requests), > and > > >each produce request is 5MB. If all requests arrive in a burst, the > > > last 4 > > >requests (20MB over 10MB allowed in a window) may get the same > > throttle > > >time if they are processed concurrently. We record the rate under > the > > > lock, > > >but then calculate throttle time separately after that. So, for each > > >request, the observed rate could be 3MB/s, and each request gets > > > throttle > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The > delay > > is > > >longer than the total rate window, which results in lower bandwidth > > than > > >the quota. Since all requests got the same delay, they will also > > arrive > > > in > > >a burst, which may also result in longer delay than necessary. It > > looks > > >pretty easy to fix, so I will open a separate JIRA for it. This can > be > > >additionally mitigated by token bucket behavior. 
> > > > > > > > > For the algorithm "So instead of having one sample equal to 560 in the > > last > > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > > would > > > allocate 5 per each old sample that is still in the overall window. It > > > would be a bit larger granularity than the pure token bucket (we lose 5 > > > units / mutation once we move past the sample window), but it is better > > > than the long delay. > > > > > > Thanks, > > > > > > Anna > > > > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > > > Hi, David, Anna, > > > > > > > > Thanks for the discussion and the updated wiki. > > > > > > > > 11. If we believe the token bucket behavior is better in terms of > > > handling > > > > the burst behavior, we probably don't need a separate KIP since it's > > just > > > > an implementation detail. > > > > > > > > Regarding "So instead of having one sample equal to 560 in the last > > > window, > > > > we will have 100 samples equal to 5.6.", I was thinking that we will > > > > allocate 5 to each of the first 99 samples and 65
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, Anna, Thanks for the comment. For the problem that you described, perhaps we need to make the quota checking and recording more atomic? Hi, David, Thanks for the updated KIP. Looks good to me now. Just one minor comment below. 30. controller_mutations_rate: For topic creation and deletion, is the rate accumulated at the topic or partition level? It would be useful to make it clear in the wiki. Jun On Fri, Jun 5, 2020 at 7:23 AM David Jacot wrote: > Hi Anna and Jun, > > You are right. We should allocate up to the quota for each old sample. > > I have revamped the Throttling Algorithm section to better explain our > thought process and the token bucket inspiration. > > I have also added a chapter with few guidelines about how to define > the quota. There is no magic formula for this but I give few insights. > I don't have specific numbers that can be used out of the box so I > think that it is better to not put any for the time being. We can always > complement later on in the documentation. > > Please, take a look and let me know what you think. > > Cheers, > David > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > > > Hi David and Jun, > > > > I dug a bit deeper into the Rate implementation, and wanted to confirm > that > > I do believe that the token bucket behavior is better for the reasons we > > already discussed but wanted to summarize. The main difference between > Rate > > and token bucket is that the Rate implementation allows a burst by > > borrowing from the future, whereas a token bucket allows a burst by using > > accumulated tokens from the previous idle period. Using accumulated > tokens > > smoothes out the rate measurement in general. Configuring a large burst > > requires configuring a large quota window, which causes long delays for > > bursty workload, due to borrowing credits from the future. Perhaps it is > > useful to add a summary in the beginning of the Throttling Algorithm > > section? 
> > > > In my previous email, I mentioned the issue we observed with the > bandwidth > > quota, where a low quota (1MB/s per broker) was limiting bandwidth > visibly > > below the quota. I thought it was strictly the issue with the Rate > > implementation as well, but I found a root cause to be different but > > amplified by the Rate implementation (long throttle delays of requests > in a > > burst). I will describe it here for completeness using the following > > example: > > > >- > > > >Quota = 1MB/s, default window size and number of samples > >- > > > >Suppose there are 6 connections (maximum 6 outstanding requests), and > >each produce request is 5MB. If all requests arrive in a burst, the > > last 4 > >requests (20MB over 10MB allowed in a window) may get the same > throttle > >time if they are processed concurrently. We record the rate under the > > lock, > >but then calculate throttle time separately after that. So, for each > >request, the observed rate could be 3MB/s, and each request gets > > throttle > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay > is > >longer than the total rate window, which results in lower bandwidth > than > >the quota. Since all requests got the same delay, they will also > arrive > > in > >a burst, which may also result in longer delay than necessary. It > looks > >pretty easy to fix, so I will open a separate JIRA for it. This can be > >additionally mitigated by token bucket behavior. > > > > > > For the algorithm "So instead of having one sample equal to 560 in the > last > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I > would > > allocate 5 per each old sample that is still in the overall window. It > > would be a bit larger granularity than the pure token bucket (we lose 5 > > units / mutation once we move past the sample window), but it is better > > than the long delay. 
> > > > Thanks, > > > > Anna > > > > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > > > Hi, David, Anna, > > > > > > Thanks for the discussion and the updated wiki. > > > > > > 11. If we believe the token bucket behavior is better in terms of > > handling > > > the burst behavior, we probably don't need a separate KIP since it's > just > > > an implementation detail. > > > > > > Regarding "So instead of having one sample equal to 560 in the last > > window, > > > we will have 100 samples equal to 5.6.", I was thinking that we will > > > allocate 5 to each of the first 99 samples and 65 to the last sample. > > Then, > > > 6 new samples have to come before the balance becomes 0 again. > > Intuitively, > > > we are accumulating credits in each sample. If a usage comes in, we > first > > > use all existing credits to offset that. If we can't, the remaining > usage > > > will be recorded in the last sample, which will be offset by future > > > credits. That seems to match the token bucket behavior the closest. > > > > > > 20. Could you provide some guidelines
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Anna and Jun, You are right. We should allocate up to the quota for each old sample. I have revamped the Throttling Algorithm section to better explain our thought process and the token bucket inspiration. I have also added a chapter with a few guidelines about how to define the quota. There is no magic formula for this, but I give a few insights. I don't have specific numbers that can be used out of the box, so I think that it is better not to put any for the time being. We can always complement later on in the documentation. Please, take a look and let me know what you think. Cheers, David On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner wrote: > Hi David and Jun, > > I dug a bit deeper into the Rate implementation, and wanted to confirm that > I do believe that the token bucket behavior is better for the reasons we > already discussed but wanted to summarize. The main difference between Rate > and token bucket is that the Rate implementation allows a burst by > borrowing from the future, whereas a token bucket allows a burst by using > accumulated tokens from the previous idle period. Using accumulated tokens > smoothes out the rate measurement in general. Configuring a large burst > requires configuring a large quota window, which causes long delays for > bursty workload, due to borrowing credits from the future. Perhaps it is > useful to add a summary in the beginning of the Throttling Algorithm > section? > > In my previous email, I mentioned the issue we observed with the bandwidth > quota, where a low quota (1MB/s per broker) was limiting bandwidth visibly > below the quota. I thought it was strictly the issue with the Rate > implementation as well, but I found a root cause to be different but > amplified by the Rate implementation (long throttle delays of requests in a > burst). 
I will describe it here for completeness using the following > example: > >- > >Quota = 1MB/s, default window size and number of samples >- > >Suppose there are 6 connections (maximum 6 outstanding requests), and >each produce request is 5MB. If all requests arrive in a burst, the > last 4 >requests (20MB over 10MB allowed in a window) may get the same throttle >time if they are processed concurrently. We record the rate under the > lock, >but then calculate throttle time separately after that. So, for each >request, the observed rate could be 3MB/s, and each request gets > throttle >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay is >longer than the total rate window, which results in lower bandwidth than >the quota. Since all requests got the same delay, they will also arrive > in >a burst, which may also result in longer delay than necessary. It looks >pretty easy to fix, so I will open a separate JIRA for it. This can be >additionally mitigated by token bucket behavior. > > > For the algorithm "So instead of having one sample equal to 560 in the last > window, we will have 100 samples equal to 5.6.", I agree with Jun. I would > allocate 5 per each old sample that is still in the overall window. It > would be a bit larger granularity than the pure token bucket (we lose 5 > units / mutation once we move past the sample window), but it is better > than the long delay. > > Thanks, > > Anna > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the discussion and the updated wiki. > > > > 11. If we believe the token bucket behavior is better in terms of > handling > > the burst behavior, we probably don't need a separate KIP since it's just > > an implementation detail. > > > > Regarding "So instead of having one sample equal to 560 in the last > window, > > we will have 100 samples equal to 5.6.", I was thinking that we will > > allocate 5 to each of the first 99 samples and 65 to the last sample. 
> Then, > > 6 new samples have to come before the balance becomes 0 again. > Intuitively, > > we are accumulating credits in each sample. If a usage comes in, we first > > use all existing credits to offset that. If we can't, the remaining usage > > will be recorded in the last sample, which will be offset by future > > credits. That seems to match the token bucket behavior the closest. > > > > 20. Could you provide some guidelines on the typical rate that an admin > > should set? > > > > Jun > > > > On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > > > > > Hi all, > > > > > > I just published an updated version of the KIP which includes: > > > * Using a slightly modified version of our Rate. I have tried to > > formalize > > > it based on our discussion. As Anna suggested, we may find a better way > > to > > > implement it. > > > * Handling of ValidateOnly as pointed out by Tom. > > > > > > Please, check it out and let me know what you think. > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley > wrote: > > > > > > > Hi David, > > > > > > > > As a user I might expect the
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David and Jun, I dug a bit deeper into the Rate implementation, and want to confirm that I believe the token bucket behavior is better, for the reasons we already discussed, but wanted to summarize them. The main difference between Rate and token bucket is that the Rate implementation allows a burst by borrowing from the future, whereas a token bucket allows a burst by using accumulated tokens from the previous idle period. Using accumulated tokens smoothes out the rate measurement in general. Configuring a large burst requires configuring a large quota window, which causes long delays for bursty workloads, due to borrowing credits from the future. Perhaps it is useful to add a summary in the beginning of the Throttling Algorithm section?

In my previous email, I mentioned the issue we observed with the bandwidth quota, where a low quota (1MB/s per broker) was limiting bandwidth visibly below the quota. I thought it was strictly an issue with the Rate implementation as well, but I found the root cause to be different, though amplified by the Rate implementation (long throttle delays of requests in a burst). I will describe it here for completeness using the following example:

- Quota = 1MB/s, default window size and number of samples
- Suppose there are 6 connections (maximum 6 outstanding requests), and each produce request is 5MB. If all requests arrive in a burst, the last 4 requests (20MB over the 10MB allowed in a window) may get the same throttle time if they are processed concurrently. We record the rate under the lock, but then calculate the throttle time separately after that. So, for each request, the observed rate could be 3MB/s, and each request gets throttle delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The delay is longer than the total rate window, which results in lower bandwidth than the quota. Since all requests got the same delay, they will also arrive in a burst, which may also result in a longer delay than necessary. 
It looks pretty easy to fix, so I will open a separate JIRA for it. This can be additionally mitigated by token bucket behavior. For the algorithm "So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6.", I agree with Jun. I would allocate 5 to each old sample that is still in the overall window. It would be a bit coarser granularity than the pure token bucket (we lose 5 units / mutation once we move past the sample window), but it is better than the long delay. Thanks, Anna On Thu, Jun 4, 2020 at 6:33 PM Jun Rao wrote: > Hi, David, Anna, > > Thanks for the discussion and the updated wiki. > > 11. If we believe the token bucket behavior is better in terms of handling > the burst behavior, we probably don't need a separate KIP since it's just > an implementation detail. > > Regarding "So instead of having one sample equal to 560 in the last window, > we will have 100 samples equal to 5.6.", I was thinking that we will > allocate 5 to each of the first 99 samples and 65 to the last sample. Then, > 6 new samples have to come before the balance becomes 0 again. Intuitively, > we are accumulating credits in each sample. If a usage comes in, we first > use all existing credits to offset that. If we can't, the remaining usage > will be recorded in the last sample, which will be offset by future > credits. That seems to match the token bucket behavior the closest. > > 20. Could you provide some guidelines on the typical rate that an admin > should set? > > Jun > > On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > > > Hi all, > > > > I just published an updated version of the KIP which includes: > > * Using a slightly modified version of our Rate. I have tried to > formalize > > it based on our discussion. As Anna suggested, we may find a better way > to > > implement it. > > * Handling of ValidateOnly as pointed out by Tom. > > > > Please, check it out and let me know what you think. 
> > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > > > > > Hi David, > > > > > > As a user I might expect the validateOnly option to do everything > except > > > actually make the changes. That interpretation would imply the quota > > should > > > be checked, but the check should obviously be side-effect free. I think > > > this interpretation could be useful because it gives the caller either > > some > > > confidence that they're not going to hit the quota, or tell them, via > the > > > exception, when they can expect the call to work. But for this to be > > useful > > > it would require the retry logic to not retry the request when > > validateOnly > > > was set. > > > > > > On the other hand, if validateOnly is really about validating only some > > > aspects of the request (which maybe is what the name implies), then we > > > should clarify in the Javadoc that the quota is not included in the > > > validation. > > > > > > On balance, I agree with what
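[Editorial note] The token-bucket behavior Anna describes can be illustrated with a minimal sketch (hypothetical, not Kafka's Rate implementation) applied to her example: quota = 1MB/s with a 10MB burst allowance, and six 5MB requests arriving together after an idle period. Because each request is charged individually against the bucket, the debt grows stepwise, and the last four requests get delays of 5, 10, 15, and 20 seconds rather than all getting 20:

```python
class TokenBucket:
    """Toy token bucket: tokens accumulate at `rate` up to `burst`; a request
    may drive the balance negative, and the throttle delay is the time needed
    to pay back that debt at the refill rate."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens (MB) added per second
        self.burst = burst    # maximum accumulated tokens (MB)
        self.tokens = burst   # start full, i.e. after an idle period
        self.last = 0.0       # timestamp of the last update

    def throttle_seconds(self, now, amount):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= amount  # charge the request; balance may go negative
        return max(0.0, -self.tokens / self.rate)


# Anna's example: 1MB/s quota, 10MB window, six concurrent 5MB requests at t=0.
bucket = TokenBucket(rate=1.0, burst=10.0)
delays = [bucket.throttle_seconds(now=0.0, amount=5.0) for _ in range(6)]
print(delays)  # [0.0, 0.0, 5.0, 10.0, 15.0, 20.0]
```

The total delay never exceeds the time needed to pay back the burst, which avoids the "delay longer than the total rate window" problem described above.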
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Anna, Thanks for the discussion and the updated wiki. 11. If we believe the token bucket behavior is better in terms of handling the burst behavior, we probably don't need a separate KIP since it's just an implementation detail. Regarding "So instead of having one sample equal to 560 in the last window, we will have 100 samples equal to 5.6.", I was thinking that we will allocate 5 to each of the first 99 samples and 65 to the last sample. Then, 6 new samples have to come before the balance becomes 0 again. Intuitively, we are accumulating credits in each sample. If a usage comes in, we first use all existing credits to offset that. If we can't, the remaining usage will be recorded in the last sample, which will be offset by future credits. That seems to match the token bucket behavior the closest. 20. Could you provide some guidelines on the typical rate that an admin should set? Jun On Thu, Jun 4, 2020 at 8:22 AM David Jacot wrote: > Hi all, > > I just published an updated version of the KIP which includes: > * Using a slightly modified version of our Rate. I have tried to formalize > it based on our discussion. As Anna suggested, we may find a better way to > implement it. > * Handling of ValidateOnly as pointed out by Tom. > > Please, check it out and let me know what you think. > > Best, > David > > On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > > > Hi David, > > > > As a user I might expect the validateOnly option to do everything except > > actually make the changes. That interpretation would imply the quota > should > > be checked, but the check should obviously be side-effect free. I think > > this interpretation could be useful because it gives the caller either > some > > confidence that they're not going to hit the quota, or tell them, via the > > exception, when they can expect the call to work. But for this to be > useful > > it would require the retry logic to not retry the request when > validateOnly > > was set. 
> > > > On the other hand, if validateOnly is really about validating only some > > aspects of the request (which maybe is what the name implies), then we > > should clarify in the Javadoc that the quota is not included in the > > validation. > > > > On balance, I agree with what you're proposing, since the extra utility > of > > including the quota in the validation seems to be not worth the extra > > complication for the retry. > > > > Thanks, > > > > Tom > > > > > > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > > > > > Hi Tom, > > > > > > That's a good question. As the validation does not create any load on > the > > > controller, I was planning to do it without checking the quota at all. > > Does > > > that > > > sound reasonable? > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot > wrote: > > > > > > > Hi Jun and Anna, > > > > > > > > Thank you both for your replies. > > > > > > > > Based on our recent discussion, I agree that using a rate is better > to > > > > remain > > > > consistent with other quotas. As you both suggested, it seems that > > > changing > > > > the way we compute the rate to better handle spiky workloads and > > behave a > > > > bit more similarly to the token bucket algorithm makes sense for all > > > > quotas as > > > > well. > > > > > > > > I will update the KIP to reflect this. > > > > > > > > Anna, I think that we can explain this in this KIP. We can't just say > > > that > > > > the Rate > > > > will be updated in this KIP. I think that we need to give a bit more > > > info. > > > > > > > > Best, > > > > David > > > > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner > wrote: > > > > > > > >> Hi Jun and David, > > > >> > > > >> Regarding token bucket vs, Rate behavior. We recently observed a > > couple > > > of > > > >> cases where a bursty workload behavior would result in long-ish > pauses > > > in > > > >> between, resulting in lower overall bandwidth than the quota. 
I will > > > need > > > >> to debug this a bit more to be 100% sure, but it does look like the > > case > > > >> described by David earlier in this thread. So, I agree with Jun -- I > > > think > > > >> we should make all quota rate behavior consistent, and probably > > similar > > > to > > > >> the token bucket one. > > > >> > > > >> Looking at KIP-13, it doesn't describe rate calculation in enough > > > detail, > > > >> but does mention window size. So, we could keep "window size" and > > > "number > > > >> of samples" configs and change Rate implementation to be more > similar > > to > > > >> token bucket: > > > >> * number of samples define our burst size > > > >> * Change the behavior, which could be described as: If a burst > happens > > > >> after an idle period, the burst would effectively spread evenly over > > the > > > >> (now - window) time period, where window is ( - > 1)* > > > >> . Which basically describes a token bucket, while > keeping > > > the > > > >> current quota configs. I think we can
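[Editorial note] Jun's allocation scheme above (a burst of 560 mutations recorded against a 5-per-sample quota over 100 samples: 5 to each of the first 99 samples, and the remaining 65 in the last) can be sketched as below. The function name and shape are illustrative only, not the eventual Kafka implementation:

```python
def spread_usage(amount, num_samples, quota_per_sample):
    """Fill each earlier sample up to the per-sample quota; whatever remains
    is recorded in the last sample, to be offset by future credits."""
    samples = [0.0] * num_samples
    for i in range(num_samples - 1):
        take = min(quota_per_sample, amount)
        samples[i] = take
        amount -= take
    samples[-1] = amount  # remaining unpaid usage
    return samples


samples = spread_usage(560, num_samples=100, quota_per_sample=5)
# First 99 samples each hold 5 (99 * 5 = 495); the last holds 560 - 495 = 65.
print(samples[0], samples[98], samples[99])
```

As Jun notes, 6 new samples (6 * 5 = 30 > 65 remaining minus accrued credits) then have to pass before the balance returns to 0, which is the closest match to token-bucket behavior under the existing window/sample configs.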
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Thanks David for looking into this important issue On Thu, Jun 4, 2020 at 3:59 PM Tom Bentley wrote: > > +1 (non binding). > > Thanks! > > On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram > wrote: > > > +1 (binding) > > > > Thanks for the KIP, David! > > > > Regards, > > > > Rajini > > > > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > > > > > +1 (binding) > > > > > > Looks great. Thank you for the in-depth design and discussion. > > > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > > > > > Hi folks, > > > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > > > throttle create topic, create partition, and delete topics operations > > to > > > > protect the Kafka controller: > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > > > Please, let me know what you think. > > > > > > > > Cheers, > > > > David > > > > > > > > > > > > > -- > > > Gwen Shapira > > > Engineering Manager | Confluent > > > 650.450.2760 | @gwenshap > > > Follow us: Twitter | blog > > > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all, I just published an updated version of the KIP which includes: * Using a slightly modified version of our Rate. I have tried to formalize it based on our discussion. As Anna suggested, we may find a better way to implement it. * Handling of ValidateOnly as pointed out by Tom. Please, check it out and let me know what you think. Best, David On Thu, Jun 4, 2020 at 4:57 PM Tom Bentley wrote: > Hi David, > > As a user I might expect the validateOnly option to do everything except > actually make the changes. That interpretation would imply the quota should > be checked, but the check should obviously be side-effect free. I think > this interpretation could be useful because it gives the caller either some > confidence that they're not going to hit the quota, or tell them, via the > exception, when they can expect the call to work. But for this to be useful > it would require the retry logic to not retry the request when validateOnly > was set. > > On the other hand, if validateOnly is really about validating only some > aspects of the request (which maybe is what the name implies), then we > should clarify in the Javadoc that the quota is not included in the > validation. > > On balance, I agree with what you're proposing, since the extra utility of > including the quota in the validation seems to be not worth the extra > complication for the retry. > > Thanks, > > Tom > > > > On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > > > Hi Tom, > > > > That's a good question. As the validation does not create any load on the > > controller, I was planning to do it without checking the quota at all. > Does > > that > > sound reasonable? > > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > > > > > Hi Jun and Anna, > > > > > > Thank you both for your replies. > > > > > > Based on our recent discussion, I agree that using a rate is better to > > > remain > > > consistent with other quotas. 
As you both suggested, it seems that > > changing > > > the way we compute the rate to better handle spiky workloads and > behave a > > > bit more similarly to the token bucket algorithm makes sense for all > > > quotas as > > > well. > > > > > > I will update the KIP to reflect this. > > > > > > Anna, I think that we can explain this in this KIP. We can't just say > > that > > > the Rate > > > will be updated in this KIP. I think that we need to give a bit more > > info. > > > > > > Best, > > > David > > > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > > > > > >> Hi Jun and David, > > >> > > >> Regarding token bucket vs. Rate behavior. We recently observed a > couple > > of > > >> cases where a bursty workload behavior would result in long-ish pauses > > in > > >> between, resulting in lower overall bandwidth than the quota. I will > > need > > >> to debug this a bit more to be 100% sure, but it does look like the > case > > >> described by David earlier in this thread. So, I agree with Jun -- I > > think > > >> we should make all quota rate behavior consistent, and probably > similar > > to > > >> the token bucket one. > > >> > > >> Looking at KIP-13, it doesn't describe rate calculation in enough > > detail, > > >> but does mention window size. So, we could keep "window size" and > > "number > > >> of samples" configs and change Rate implementation to be more similar > to > > >> token bucket: > > >> * number of samples define our burst size > > >> * Change the behavior, which could be described as: If a burst happens > > >> after an idle period, the burst would effectively spread evenly over > the > > >> (now - window) time period, where window is (<number of samples> - 1)* > > >> <window size>. Which basically describes a token bucket, while keeping > > the > > >> current quota configs. I think we can even implement this by changing > > the > > >> way we record the last sample or lastWindowMs.
> > >> > > >> Jun, if we would be changing Rate calculation behavior in bandwidth > and > > >> request quotas, would we need a separate KIP? Shouldn't need to if we > > >> keep window size and number of samples configs, right? > > >> > > >> Thanks, > > >> Anna > > >> > > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > > >> > > >> > Hi, David, > > >> > > > >> > Thanks for the reply. > > >> > > > >> > 11. To match the behavior in the Token bucket approach, I was > thinking > > >> that > > >> > requests that don't fit in the previous time windows will be > > >> accumulated in > > >> > the current time window. So, the 60 extra requests will be > accumulated > > >> in > > >> > the latest window. Then, the client also has to wait for 12 more > secs > > >> > before throttling is removed. I agree that this is probably a better > > >> > behavior and it's reasonable to change the existing behavior to this > > >> one. > > >> > > > >> > To me, it seems that sample_size * num_windows is the same as max > > burst > > >> > balance. The latter seems a bit better to configure. The thing is > that > > >> the > > >> > existing
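The token bucket behavior Jun describes above (extra requests accumulate, and the client waits until the balance recovers) can be sketched like this (a toy Python illustration under the thread's own numbers, not the Kafka implementation):

```python
class TokenBucket:
    """Toy token bucket: credits accrue at `rate` per second up to `burst`;
    a negative balance translates into a throttle time."""

    def __init__(self, rate, burst, now_s=0.0):
        self.rate = rate        # sustained quota R (credits per second)
        self.burst = burst      # maximum burst B (bucket capacity)
        self.credits = burst    # the bucket starts full
        self.last_s = now_s

    def record(self, cost, now_s):
        # Refill credits for the time elapsed, capped at the burst size.
        self.credits = min(self.burst,
                           self.credits + (now_s - self.last_s) * self.rate)
        self.last_s = now_s
        # Deduct the request's cost; a negative balance means throttling.
        self.credits -= cost
        return max(0.0, -self.credits / self.rate)  # seconds to wait
```

Using the numbers from the quoted example (a quota of 5/sec, a burst capacity of 500, and 7 * 80 = 560 mutations arriving at once), the 60 excess mutations leave the balance at -60, so the client is throttled for 60 / 5 = 12 seconds, matching the "12 more secs" above.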
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (non binding). Thanks! On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote: > +1 (binding) > > Thanks for the KIP, David! > > Regards, > > Rajini > > > On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > > > +1 (binding) > > > > Looks great. Thank you for the in-depth design and discussion. > > > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > > > Hi folks, > > > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > > throttle create topic, create partition, and delete topics operations > to > > > protect the Kafka controller: > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > > > Please, let me know what you think. > > > > > > Cheers, > > > David > > > > > > > > > -- > > Gwen Shapira > > Engineering Manager | Confluent > > 650.450.2760 | @gwenshap > > Follow us: Twitter | blog > > >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, As a user I might expect the validateOnly option to do everything except actually make the changes. That interpretation would imply the quota should be checked, but the check should obviously be side-effect free. I think this interpretation could be useful because it gives the caller either some confidence that they're not going to hit the quota, or tell them, via the exception, when they can expect the call to work. But for this to be useful it would require the retry logic to not retry the request when validateOnly was set. On the other hand, if validateOnly is really about validating only some aspects of the request (which maybe is what the name implies), then we should clarify in the Javadoc that the quota is not included in the validation. On balance, I agree with what you're proposing, since the extra utility of including the quota in the validation seems to be not worth the extra complication for the retry. Thanks, Tom On Thu, Jun 4, 2020 at 3:32 PM David Jacot wrote: > Hi Tom, > > That's a good question. As the validation does not create any load on the > controller, I was planning to do it without checking the quota at all. Does > that > sound reasonable? > > Best, > David > > On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > > > Hi Jun and Anna, > > > > Thank you both for your replies. > > > > Based on our recent discussion, I agree that using a rate is better to > > remain > > consistent with other quotas. As you both suggested, it seems that > changing > > the way we compute the rate to better handle spiky workloads and behave a > > bit more similarly to the token bucket algorithm makes sense for all > > quotas as > > well. > > > > I will update the KIP to reflect this. > > > > Anna, I think that we can explain this in this KIP. We can't just say > that > > the Rate > > will be updated in this KIP. I think that we need to give a bit more > info. 
> > > > Best, > > David > > > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > > > >> Hi Jun and David, > >> > >> Regarding token bucket vs. Rate behavior. We recently observed a couple > of > >> cases where a bursty workload behavior would result in long-ish pauses > in > >> between, resulting in lower overall bandwidth than the quota. I will > need > >> to debug this a bit more to be 100% sure, but it does look like the case > >> described by David earlier in this thread. So, I agree with Jun -- I > think > >> we should make all quota rate behavior consistent, and probably similar > to > >> the token bucket one. > >> > >> Looking at KIP-13, it doesn't describe rate calculation in enough > detail, > >> but does mention window size. So, we could keep "window size" and > "number > >> of samples" configs and change Rate implementation to be more similar to > >> token bucket: > >> * number of samples define our burst size > >> * Change the behavior, which could be described as: If a burst happens > >> after an idle period, the burst would effectively spread evenly over the > >> (now - window) time period, where window is (<number of samples> - 1)* > >> <window size>. Which basically describes a token bucket, while keeping > the > >> current quota configs. I think we can even implement this by changing > the > >> way we record the last sample or lastWindowMs. > >> > >> Jun, if we would be changing Rate calculation behavior in bandwidth and > >> request quotas, would we need a separate KIP? Shouldn't need to if we > >> keep window size and number of samples configs, right? > >> > >> Thanks, > >> Anna > >> > >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > >> > >> > Hi, David, > >> > > >> > Thanks for the reply. > >> > > >> > 11. To match the behavior in the Token bucket approach, I was thinking > >> that > >> > requests that don't fit in the previous time windows will be > >> accumulated in > >> > the current time window.
So, the 60 extra requests will be accumulated > >> in > >> > the latest window. Then, the client also has to wait for 12 more secs > >> > before throttling is removed. I agree that this is probably a better > >> > behavior and it's reasonable to change the existing behavior to this > >> one. > >> > > >> > To me, it seems that sample_size * num_windows is the same as max > burst > >> > balance. The latter seems a bit better to configure. The thing is that > >> the > >> > existing quota system has already been used in quite a few places and > >> if we > >> > want to change the configuration in the future, there is the migration > >> > cost. Given that, do you feel it's better to adopt the new token > bucket > >> > terminology or just adopt the behavior somehow into our existing > >> system? If > >> > it's the former, it would be useful to document this in the rejected > >> > section and add a future plan on migrating existing quota > >> configurations. > >> > > >> > Jun > >> > > >> > > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot > wrote: > >> > > >> > > Hi Jun, > >> > > > >> > > Thanks for your reply. > >> > > > >> > > 10. I think
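One way to read Jun's equivalence above ("sample_size * num_windows is the same as max burst balance") is the following sketch, with hypothetical names; it only restates the observation, not any actual Kafka config:

```python
def implicit_burst(quota_rate, num_samples, sample_len_s):
    # The existing quota configs (a rate plus a number of samples of a given
    # length) implicitly define a burst: at most this many units can be
    # absorbed across the whole window before throttling kicks in, which is
    # the same role the token bucket's maximum burst balance B plays.
    return quota_rate * num_samples * sample_len_s
```

With a quota of 5/sec and 100 one-second samples, the implicit burst is 500, the same capacity as a token bucket with B = 500.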
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, That's a good question. As the validation does not create any load on the controller, I was planning to do it without checking the quota at all. Does that sound reasonable? Best, David On Thu, Jun 4, 2020 at 4:23 PM David Jacot wrote: > Hi Jun and Anna, > > Thank you both for your replies. > > Based on our recent discussion, I agree that using a rate is better to > remain > consistent with other quotas. As you both suggested, it seems that changing > the way we compute the rate to better handle spiky workloads and behave a > bit more similarly to the token bucket algorithm makes sense for all > quotas as > well. > > I will update the KIP to reflect this. > > Anna, I think that we can explain this in this KIP. We can't just say that > the Rate > will be updated in this KIP. I think that we need to give a bit more info. > > Best, > David > > On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > >> Hi Jun and David, >> >> Regarding token bucket vs. Rate behavior. We recently observed a couple of >> cases where a bursty workload behavior would result in long-ish pauses in >> between, resulting in lower overall bandwidth than the quota. I will need >> to debug this a bit more to be 100% sure, but it does look like the case >> described by David earlier in this thread. So, I agree with Jun -- I think >> we should make all quota rate behavior consistent, and probably similar to >> the token bucket one. >> >> Looking at KIP-13, it doesn't describe rate calculation in enough detail, >> but does mention window size. So, we could keep "window size" and "number >> of samples" configs and change Rate implementation to be more similar to >> token bucket: >> * number of samples define our burst size >> * Change the behavior, which could be described as: If a burst happens >> after an idle period, the burst would effectively spread evenly over the >> (now - window) time period, where window is (<number of samples> - 1)* >> <window size>.
Which basically describes a token bucket, while keeping the >> current quota configs. I think we can even implement this by changing the >> way we record the last sample or lastWindowMs. >> >> Jun, if we would be changing Rate calculation behavior in bandwidth and >> request quotas, would we need a separate KIP? Shouldn't need to if we >> keep window size and number of samples configs, right? >> >> Thanks, >> Anna >> >> On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: >> >> > Hi, David, >> > >> > Thanks for the reply. >> > >> > 11. To match the behavior in the Token bucket approach, I was thinking >> that >> > requests that don't fit in the previous time windows will be >> accumulated in >> > the current time window. So, the 60 extra requests will be accumulated >> in >> > the latest window. Then, the client also has to wait for 12 more secs >> > before throttling is removed. I agree that this is probably a better >> > behavior and it's reasonable to change the existing behavior to this >> one. >> > >> > To me, it seems that sample_size * num_windows is the same as max burst >> > balance. The latter seems a bit better to configure. The thing is that >> the >> > existing quota system has already been used in quite a few places and >> if we >> > want to change the configuration in the future, there is the migration >> > cost. Given that, do you feel it's better to adopt the new token bucket >> > terminology or just adopt the behavior somehow into our existing >> system? If >> > it's the former, it would be useful to document this in the rejected >> > section and add a future plan on migrating existing quota >> configurations. >> > >> > Jun >> > >> > >> > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: >> > >> > > Hi Jun, >> > > >> > > Thanks for your reply. >> > > >> > > 10. I think that both options are likely equivalent from an accuracy >> > point >> > > of >> > > view. 
If we put the implementation aside, conceptually, I am not >> > convinced >> > > by the usage-based approach because resources don't have a clear owner >> > > in AK at the moment. A topic can be created by (Principal A, no client >> > id), >> > > then can be extended by (no principal, Client B), and finally deleted >> by >> > > (Principal C, Client C). This does not sound right to me and I fear >> that >> > it >> > > is not going to be easy to grasp for our users. >> > > >> > > Regarding the naming, I do agree that we can make it more future >> proof. >> > > I propose `controller_mutations_rate`. I think that differentiating >> the >> > > mutations >> > > from the reads is still a good thing for the future. >> > > >> > > 11. I am not convinced by your proposal for the following reasons: >> > > >> > > First, in my toy example, I used 101 windows and 7 * 80 requests. We >> > could >> > > effectively allocate 5 * 100 requests to the previous windows assuming >> > that >> > > they are empty. What shall we do with the remaining 60 requests? >> Shall we >> > > allocate them to the current window or shall we divide it among all >> the >> >
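David's toy example above can be made concrete with a small sketch (Python, hypothetical helper name): filling the previous empty windows up to the per-window quota absorbs 5 * 100 = 500 requests, leaving 60 with no obvious home, which is exactly the open question he raises.

```python
def spill_over(requests, quota_per_window, prev_empty_windows):
    # Allocate as much of the burst as the previous (empty) windows can
    # absorb at the per-window quota; return the remainder that still has
    # to go somewhere (the current window? spread over all windows?).
    absorbed = min(requests, quota_per_window * prev_empty_windows)
    return requests - absorbed
```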
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun and Anna, Thank you both for your replies. Based on our recent discussion, I agree that using a rate is better to remain consistent with other quotas. As you both suggested, it seems that changing the way we compute the rate to better handle spiky workloads and behave a bit more similarly to the token bucket algorithm makes sense for all quotas as well. I will update the KIP to reflect this. Anna, I think that we can explain this in this KIP. We can't just say that the Rate will be updated in this KIP. I think that we need to give a bit more info. Best, David On Thu, Jun 4, 2020 at 6:31 AM Anna Povzner wrote: > Hi Jun and David, > > Regarding token bucket vs. Rate behavior. We recently observed a couple of > cases where a bursty workload behavior would result in long-ish pauses in > between, resulting in lower overall bandwidth than the quota. I will need > to debug this a bit more to be 100% sure, but it does look like the case > described by David earlier in this thread. So, I agree with Jun -- I think > we should make all quota rate behavior consistent, and probably similar to > the token bucket one. > > Looking at KIP-13, it doesn't describe rate calculation in enough detail, > but does mention window size. So, we could keep "window size" and "number > of samples" configs and change Rate implementation to be more similar to > token bucket: > * number of samples define our burst size > * Change the behavior, which could be described as: If a burst happens > after an idle period, the burst would effectively spread evenly over the > (now - window) time period, where window is (<number of samples> - 1)* > <window size>. Which basically describes a token bucket, while keeping the > current quota configs. I think we can even implement this by changing the > way we record the last sample or lastWindowMs. > > Jun, if we would be changing Rate calculation behavior in bandwidth and > request quotas, would we need a separate KIP?
Shouldn't need to if we > keep window size and number of samples configs, right? > > Thanks, > Anna > > On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > > > Hi, David, > > > > Thanks for the reply. > > > > 11. To match the behavior in the Token bucket approach, I was thinking > that > > requests that don't fit in the previous time windows will be accumulated > in > > the current time window. So, the 60 extra requests will be accumulated in > > the latest window. Then, the client also has to wait for 12 more secs > > before throttling is removed. I agree that this is probably a better > > behavior and it's reasonable to change the existing behavior to this one. > > > > To me, it seems that sample_size * num_windows is the same as max burst > > balance. The latter seems a bit better to configure. The thing is that > the > > existing quota system has already been used in quite a few places and if > we > > want to change the configuration in the future, there is the migration > > cost. Given that, do you feel it's better to adopt the new token bucket > > terminology or just adopt the behavior somehow into our existing system? > If > > it's the former, it would be useful to document this in the rejected > > section and add a future plan on migrating existing quota configurations. > > > > Jun > > > > > > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > > > > > Hi Jun, > > > > > > Thanks for your reply. > > > > > > 10. I think that both options are likely equivalent from an accuracy > > point > > > of > > > view. If we put the implementation aside, conceptually, I am not > > convinced > > > by the used based approach because resources don't have a clear owner > > > in AK at the moment. A topic can be created by (Principal A, no client > > id), > > > then can be extended by (no principal, Client B), and finally deleted > by > > > (Principal C, Client C). This does not sound right to me and I fear > that > > it > > > is not going to be easy to grasp for our users. 
> > > > > > Regarding the naming, I do agree that we can make it more future proof. > > > I propose `controller_mutations_rate`. I think that differentiating the > > > mutations > > > from the reads is still a good thing for the future. > > > > > > 11. I am not convinced by your proposal for the following reasons: > > > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > > could > > > effectively allocate 5 * 100 requests to the previous windows assuming > > that > > > they are empty. What shall we do with the remaining 60 requests? Shall > we > > > allocate them to the current window or shall we divide it among all the > > > windows? > > > > > > Second, I don't think that we can safely change the behavior of all the > > > existing > > > rates used because it actually changes the computation of the rate as > > > values > > > allocated to past windows would expire before they would today. > > > > > > Overall, while trying to fit in the current rate, we are going to > build a > > > slightly > > > different version of the
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun and David, Regarding token bucket vs. Rate behavior. We recently observed a couple of cases where a bursty workload behavior would result in long-ish pauses in between, resulting in lower overall bandwidth than the quota. I will need to debug this a bit more to be 100% sure, but it does look like the case described by David earlier in this thread. So, I agree with Jun -- I think we should make all quota rate behavior consistent, and probably similar to the token bucket one. Looking at KIP-13, it doesn't describe rate calculation in enough detail, but does mention window size. So, we could keep "window size" and "number of samples" configs and change Rate implementation to be more similar to token bucket: * number of samples define our burst size * Change the behavior, which could be described as: If a burst happens after an idle period, the burst would effectively spread evenly over the (now - window) time period, where window is (<number of samples> - 1)* <window size>. Which basically describes a token bucket, while keeping the current quota configs. I think we can even implement this by changing the way we record the last sample or lastWindowMs. Jun, if we would be changing Rate calculation behavior in bandwidth and request quotas, would we need a separate KIP? Shouldn't need to if we keep window size and number of samples configs, right? Thanks, Anna On Wed, Jun 3, 2020 at 3:24 PM Jun Rao wrote: > Hi, David, > > Thanks for the reply. > > 11. To match the behavior in the Token bucket approach, I was thinking that > requests that don't fit in the previous time windows will be accumulated in > the current time window. So, the 60 extra requests will be accumulated in > the latest window. Then, the client also has to wait for 12 more secs > before throttling is removed. I agree that this is probably a better > behavior and it's reasonable to change the existing behavior to this one. > > To me, it seems that sample_size * num_windows is the same as max burst > balance.
The latter seems a bit better to configure. The thing is that the > existing quota system has already been used in quite a few places and if we > want to change the configuration in the future, there is the migration > cost. Given that, do you feel it's better to adopt the new token bucket > terminology or just adopt the behavior somehow into our existing system? If > it's the former, it would be useful to document this in the rejected > section and add a future plan on migrating existing quota configurations. > > Jun > > > On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > > > Hi Jun, > > > > Thanks for your reply. > > > > 10. I think that both options are likely equivalent from an accuracy > point > > of > > view. If we put the implementation aside, conceptually, I am not > convinced > > by the used based approach because resources don't have a clear owner > > in AK at the moment. A topic can be created by (Principal A, no client > id), > > then can be extended by (no principal, Client B), and finally deleted by > > (Principal C, Client C). This does not sound right to me and I fear that > it > > is not going to be easy to grasp for our users. > > > > Regarding the naming, I do agree that we can make it more future proof. > > I propose `controller_mutations_rate`. I think that differentiating the > > mutations > > from the reads is still a good thing for the future. > > > > 11. I am not convinced by your proposal for the following reasons: > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > could > > effectively allocate 5 * 100 requests to the previous windows assuming > that > > they are empty. What shall we do with the remaining 60 requests? Shall we > > allocate them to the current window or shall we divide it among all the > > windows? 
> > > > Second, I don't think that we can safely change the behavior of all the > > existing > > rates used because it actually changes the computation of the rate as > > values > > allocated to past windows would expire before they would today. > > > > Overall, while trying to fit in the current rate, we are going to build a > > slightly > > different version of the rate which will be even more confusing for > users. > > > > Instead, I think that we should embrace the notion of burst as it could > > also > > be applied to other quotas in the future. Users don't have to know that > we > > use the Token Bucket or a special rate inside at the end of the day. It > is > > an > > implementation detail. > > > > Users would be able to define: > > - a rate R; and > > - a maximum burst B. > > > > If we change the metrics to be as follows: > > - the actual rate; > > - the burst balance in %, 0 meaning that the user is throttled; > > It remains detached from the algorithm. > > > > I personally prefer this over having to define a rate and a number of > > windows > > while having to understand that the number of windows implicitly defines > > the > > allowed burst. I think that it is clearer and easier to
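The burst-balance metric David proposes above could look like the following sketch (Python; the exact metric name and scale in the KIP may differ from this illustration):

```python
def burst_balance_pct(credits, burst):
    # Remaining burst balance as a percentage of the maximum burst B;
    # 0 means the principal/client-id is currently throttled.
    return max(0.0, credits) / burst * 100.0
```

Exposing the balance this way keeps the metric detached from the algorithm: users see a rate and a remaining-burst percentage, whether or not a token bucket sits underneath.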
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Thanks for the reply. 11. To match the behavior in the Token bucket approach, I was thinking that requests that don't fit in the previous time windows will be accumulated in the current time window. So, the 60 extra requests will be accumulated in the latest window. Then, the client also has to wait for 12 more secs before throttling is removed. I agree that this is probably a better behavior and it's reasonable to change the existing behavior to this one. To me, it seems that sample_size * num_windows is the same as max burst balance. The latter seems a bit better to configure. The thing is that the existing quota system has already been used in quite a few places and if we want to change the configuration in the future, there is the migration cost. Given that, do you feel it's better to adopt the new token bucket terminology or just adopt the behavior somehow into our existing system? If it's the former, it would be useful to document this in the rejected section and add a future plan on migrating existing quota configurations. Jun On Tue, Jun 2, 2020 at 6:55 AM David Jacot wrote: > Hi Jun, > > Thanks for your reply. > > 10. I think that both options are likely equivalent from an accuracy point > of > view. If we put the implementation aside, conceptually, I am not convinced > by the used based approach because resources don't have a clear owner > in AK at the moment. A topic can be created by (Principal A, no client id), > then can be extended by (no principal, Client B), and finally deleted by > (Principal C, Client C). This does not sound right to me and I fear that it > is not going to be easy to grasp for our users. > > Regarding the naming, I do agree that we can make it more future proof. > I propose `controller_mutations_rate`. I think that differentiating the > mutations > from the reads is still a good thing for the future. > > 11. 
I am not convinced by your proposal for the following reasons: > > First, in my toy example, I used 101 windows and 7 * 80 requests. We could > effectively allocate 5 * 100 requests to the previous windows assuming that > they are empty. What shall we do with the remaining 60 requests? Shall we > allocate them to the current window or shall we divide it among all the > windows? > > Second, I don't think that we can safely change the behavior of all the > existing > rates used because it actually changes the computation of the rate as > values > allocated to past windows would expire before they would today. > > Overall, while trying to fit in the current rate, we are going to build a > slightly > different version of the rate which will be even more confusing for users. > > Instead, I think that we should embrace the notion of burst as it could > also > be applied to other quotas in the future. Users don't have to know that we > use the Token Bucket or a special rate inside at the end of the day. It is > an > implementation detail. > > Users would be able to define: > - a rate R; and > - a maximum burst B. > > If we change the metrics to be as follows: > - the actual rate; > - the burst balance in %, 0 meaning that the user is throttled; > It remains detached from the algorithm. > > I personally prefer this over having to define a rate and a number of > windows > while having to understand that the number of windows implicitly defines > the > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > Best, > David > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the response. Sorry for the late reply. > > > > 10. Regarding exposing rate or usage as quota. Your argument is that > usage > > is not very accurate anyway and is harder to implement. So, let's just > be a > > bit loose and expose rate. I am sort of neutral on that.
(1) It seems to > me > > that overall usage will be relatively more accurate than rate. All the > > issues that Anna brought up seem to exist for rate too. Rate has the > > additional problem that the cost of each request may not be uniform. (2) > In > > terms of implementation, a usage based approach requires tracking the > user > > info through the life cycle of an operation. However, as you mentioned, > > things like topic creation can generate additional > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > > associate those cost to the user who initiated the operation. If we do > > that, we sort of need to track the user for the full life cycle of the > > processing of an operation anyway. (3) If you prefer rate strongly, I > don't > > have strong objections. However, I do feel that the new quota name should > > be able to cover all controller related cost longer term. This KIP > > currently only covers topic creation/deletion. It would not be ideal if > in > > the future, we have to add yet another type of quota for some other > > controller related operations. The quota name in the KIP has partition > > mutation. In the future, if we
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, One quick question about the implementation (I don't think it's spelled out in the KIP): Presumably if you make, for example, a create topics request with validate only it will check for quota violation, but not count towards quota violation, right? Many thanks, Tom On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram wrote: > Hi David, > > 2) sorry, that was my mistake. > > Regards, > > Rajini > > > On Wed, Jun 3, 2020 at 3:08 PM David Jacot wrote: > > > Hi Rajini, > > > > Thanks for your prompt response. > > 1) Good catch, fixed. > > 2) The retry mechanism will be in the client so a new field is not > > required in the requests. > > > > Regards, > > David > > > > On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram > > wrote: > > > > > Hi David, > > > > > > Thanks for the updates, looks good. Just a couple of minor comments: > > > 1) There is a typo in "*The channel will be mutated as well when > > > `throttle_time_ms > 0`." * Should be *muted*? > > > 2) Since the three requests will need a new field for ` > > > *retryQuotaViolatedException*`, we should perhaps add that change to > the > > > KIP. > > > > > > Regards, > > > > > > Rajini > > > > > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot > wrote: > > > > > > > Hi all, > > > > > > > > I have updated the KIP based on our recent discussions. I have mainly > > > > changed the > > > > following points: > > > > * I have renamed the quota as suggested by Jun. > > > > * I have changed the metrics to be "token bucket" agnostic. The idea > is > > > to > > > > report the > > > > burst and the rate per principal/clientid. > > > > * I have removed the `retry.quota.violation.exception` configuration > > and > > > > replaced it > > > > with options in the respective methods' options. > > > > > > > > Now, the public interfaces are not tight to the algorithm that we use > > > > internally to throttle > > > > the requests but keep the notion of burst. 
I hope that this will help > > to > > > > alleviate the > > > > tokens bucket vs rate discussions. > > > > > > > > Please, have a look and let me know your thoughts. > > > > > > > > Bests, > > > > David > > > > > > > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot > > wrote: > > > > > > > > > Hi Rajini, > > > > > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > > > > > 1) Our main goal is to protect the controller from the extreme > users > > > > > (DDoS). We want > > > > > to protect it from large requests or repetitive requests coming > from > > a > > > > > single user. > > > > > That user could be used by multiple apps as you pointed out which > > makes > > > > it > > > > > even > > > > > worst. For instance, a buggy application could continuously create > > and > > > > > delete > > > > > topics in a tight loop. > > > > > > > > > > The idea of the burst is to still allow creating or deleting topics > > in > > > > > batch because > > > > > this is how applications tend to do it. However, we want to keep > the > > > > batch > > > > > size > > > > > under control with the burst. The burst does not allow requests of > > any > > > > > size. Topics > > > > > are accepted until the burst is passed. All the others are > rejected. > > > > > Ideally, regular > > > > > and well behaving applications should never or rarely be throttled. > > > > > > > > > > 2) No, there is no explicit bound on the maximum throttle time. > > Having > > > a > > > > > maximum > > > > > is straightforward here because the throttling time depends on the > > > actual > > > > > size of > > > > > the request. > > > > > > > > > > 3) That's a very good question that I haven't thought of. I was > > > thinking > > > > > about doing > > > > > it for the new quota only. I think that your suggestion of having a > > per > > > > > method argument > > > > > makes sense. I will update the KIP. > > > > > > > > > > 4) Indeed, it is outdated text. Let me update that. 
> > > > > > > > > > Regards, > > > > > David > > > > > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > > > rajinisiva...@gmail.com> > > > > > wrote: > > > > > > > > > >> Hi David, > > > > >> > > > > >> Thanks for the KIP. A few questions below: > > > > >> > > > > >> 1) The KIP says: *`Typically, applications tend to send one > request > > to > > > > >> create all the topics that they need`*. What would the point of > > > > throttling > > > > >> be in this case? If there was a user quota for the principal used > by > > > > that > > > > >> application, wouldn't we just allow the request due to the burst > > > value? > > > > Is > > > > >> the KIP specifically for the case where multiple apps with the > same > > > user > > > > >> principal are started all at once? > > > > >> > > > > >> 2) Will there be a bound on the maximum throttle time? > > > > >> > > > > >> 3) Will *retry.quota.violation.exception* have any impact on > request > > > > quota > > > > >> throttling or is it limited to the three requests where the new > > quota > > > is > > > > >> applied?
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Thanks for the KIP, David! Regards, Rajini On Sun, May 31, 2020 at 3:29 AM Gwen Shapira wrote: > +1 (binding) > > Looks great. Thank you for the in-depth design and discussion. > > On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > > > Hi folks, > > > > I'd like to start the vote for KIP-599 which proposes a new quota to > > throttle create topic, create partition, and delete topics operations to > > protect the Kafka controller: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > > > Please, let me know what you think. > > > > Cheers, > > David > > > > > -- > Gwen Shapira > Engineering Manager | Confluent > 650.450.2760 | @gwenshap > Follow us: Twitter | blog >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, 2) sorry, that was my mistake. Regards, Rajini On Wed, Jun 3, 2020 at 3:08 PM David Jacot wrote: > Hi Rajini, > > Thanks for your prompt response. > 1) Good catch, fixed. > 2) The retry mechanism will be in the client so a new field is not > required in the requests. > > Regards, > David > > On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram > wrote: > > > Hi David, > > > > Thanks for the updates, looks good. Just a couple of minor comments: > > 1) There is a typo in "*The channel will be mutated as well when > > `throttle_time_ms > 0`." * Should be *muted*? > > 2) Since the three requests will need a new field for ` > > *retryQuotaViolatedException*`, we should perhaps add that change to the > > KIP. > > > > Regards, > > > > Rajini > > > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > > > > > Hi all, > > > > > > I have updated the KIP based on our recent discussions. I have mainly > > > changed the > > > following points: > > > * I have renamed the quota as suggested by Jun. > > > * I have changed the metrics to be "token bucket" agnostic. The idea is > > to > > > report the > > > burst and the rate per principal/clientid. > > > * I have removed the `retry.quota.violation.exception` configuration > and > > > replaced it > > > with options in the respective methods' options. > > > > > > Now, the public interfaces are not tight to the algorithm that we use > > > internally to throttle > > > the requests but keep the notion of burst. I hope that this will help > to > > > alleviate the > > > tokens bucket vs rate discussions. > > > > > > Please, have a look and let me know your thoughts. > > > > > > Bests, > > > David > > > > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot > wrote: > > > > > > > Hi Rajini, > > > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > > > 1) Our main goal is to protect the controller from the extreme users > > > > (DDoS). 
We want > > > > to protect it from large requests or repetitive requests coming from > a > > > > single user. > > > > That user could be used by multiple apps as you pointed out which > makes > > > it > > > > even > > > > worst. For instance, a buggy application could continuously create > and > > > > delete > > > > topics in a tight loop. > > > > > > > > The idea of the burst is to still allow creating or deleting topics > in > > > > batch because > > > > this is how applications tend to do it. However, we want to keep the > > > batch > > > > size > > > > under control with the burst. The burst does not allow requests of > any > > > > size. Topics > > > > are accepted until the burst is passed. All the others are rejected. > > > > Ideally, regular > > > > and well behaving applications should never or rarely be throttled. > > > > > > > > 2) No, there is no explicit bound on the maximum throttle time. > Having > > a > > > > maximum > > > > is straightforward here because the throttling time depends on the > > actual > > > > size of > > > > the request. > > > > > > > > 3) That's a very good question that I haven't thought of. I was > > thinking > > > > about doing > > > > it for the new quota only. I think that your suggestion of having a > per > > > > method argument > > > > makes sense. I will update the KIP. > > > > > > > > 4) Indeed, it is outdated text. Let me update that. > > > > > > > > Regards, > > > > David > > > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > > rajinisiva...@gmail.com> > > > > wrote: > > > > > > > >> Hi David, > > > >> > > > >> Thanks for the KIP. A few questions below: > > > >> > > > >> 1) The KIP says: *`Typically, applications tend to send one request > to > > > >> create all the topics that they need`*. What would the point of > > > throttling > > > >> be in this case? If there was a user quota for the principal used by > > > that > > > >> application, wouldn't we just allow the request due to the burst > > value? 
> > > Is > > > >> the KIP specifically for the case where multiple apps with the same > > user > > > >> principal are started all at once? > > > >> > > > >> 2) Will there be a bound on the maximum throttle time? > > > >> > > > >> 3) Will *retry.quota.violation.exception* have any impact on request > > > quota > > > >> throttling or is it limited to the three requests where the new > quota > > is > > > >> applied? If it is only for the new quota, perhaps it would be better > > to > > > >> add > > > >> an option to the relevant requests rather than use an admin client > > > config. > > > >> > > > >> 4) Is this outdated text, wasn't sure what this refers to : *While > the > > > >> configuration could be set by broker, we envision to have it set > only > > > has > > > >> a > > > >> cluster wide default for two reasons.* > > > >> > > > >> Regards, > > > >> > > > >> Rajini > > > >> > > > >> > > > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot > > wrote: > > > >> > > > >> > Hi Jun, > > > >> > > > > >> > Thanks for your reply. > > > >> > > > > >> > 10. I think that
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Rajini, Thanks for your prompt response. 1) Good catch, fixed. 2) The retry mechanism will be in the client so a new field is not required in the requests. Regards, David On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram wrote: > Hi David, > > Thanks for the updates, looks good. Just a couple of minor comments: > 1) There is a typo in "*The channel will be mutated as well when > `throttle_time_ms > 0`." * Should be *muted*? > 2) Since the three requests will need a new field for ` > *retryQuotaViolatedException*`, we should perhaps add that change to the > KIP. > > Regards, > > Rajini > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > > > Hi all, > > > > I have updated the KIP based on our recent discussions. I have mainly > > changed the > > following points: > > * I have renamed the quota as suggested by Jun. > > * I have changed the metrics to be "token bucket" agnostic. The idea is > to > > report the > > burst and the rate per principal/clientid. > > * I have removed the `retry.quota.violation.exception` configuration and > > replaced it > > with options in the respective methods' options. > > > > Now, the public interfaces are not tight to the algorithm that we use > > internally to throttle > > the requests but keep the notion of burst. I hope that this will help to > > alleviate the > > tokens bucket vs rate discussions. > > > > Please, have a look and let me know your thoughts. > > > > Bests, > > David > > > > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > > > > > Hi Rajini, > > > > > > Thanks for your feedback. Please find my answers below: > > > > > > 1) Our main goal is to protect the controller from the extreme users > > > (DDoS). We want > > > to protect it from large requests or repetitive requests coming from a > > > single user. > > > That user could be used by multiple apps as you pointed out which makes > > it > > > even > > > worst. 
For instance, a buggy application could continuously create and > > > delete > > > topics in a tight loop. > > > > > > The idea of the burst is to still allow creating or deleting topics in > > > batch because > > > this is how applications tend to do it. However, we want to keep the > > batch > > > size > > > under control with the burst. The burst does not allow requests of any > > > size. Topics > > > are accepted until the burst is passed. All the others are rejected. > > > Ideally, regular > > > and well behaving applications should never or rarely be throttled. > > > > > > 2) No, there is no explicit bound on the maximum throttle time. Having > a > > > maximum > > > is straightforward here because the throttling time depends on the > actual > > > size of > > > the request. > > > > > > 3) That's a very good question that I haven't thought of. I was > thinking > > > about doing > > > it for the new quota only. I think that your suggestion of having a per > > > method argument > > > makes sense. I will update the KIP. > > > > > > 4) Indeed, it is outdated text. Let me update that. > > > > > > Regards, > > > David > > > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram < > rajinisiva...@gmail.com> > > > wrote: > > > > > >> Hi David, > > >> > > >> Thanks for the KIP. A few questions below: > > >> > > >> 1) The KIP says: *`Typically, applications tend to send one request to > > >> create all the topics that they need`*. What would the point of > > throttling > > >> be in this case? If there was a user quota for the principal used by > > that > > >> application, wouldn't we just allow the request due to the burst > value? > > Is > > >> the KIP specifically for the case where multiple apps with the same > user > > >> principal are started all at once? > > >> > > >> 2) Will there be a bound on the maximum throttle time? 
> > >> > > >> 3) Will *retry.quota.violation.exception* have any impact on request > > quota > > >> throttling or is it limited to the three requests where the new quota > is > > >> applied? If it is only for the new quota, perhaps it would be better > to > > >> add > > >> an option to the relevant requests rather than use an admin client > > config. > > >> > > >> 4) Is this outdated text, wasn't sure what this refers to : *While the > > >> configuration could be set by broker, we envision to have it set only > > has > > >> a > > >> cluster wide default for two reasons.* > > >> > > >> Regards, > > >> > > >> Rajini > > >> > > >> > > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot > wrote: > > >> > > >> > Hi Jun, > > >> > > > >> > Thanks for your reply. > > >> > > > >> > 10. I think that both options are likely equivalent from an accuracy > > >> point > > >> > of > > >> > view. If we put the implementation aside, conceptually, I am not > > >> convinced > > >> > by the used based approach because resources don't have a clear > owner > > >> > in AK at the moment. A topic can be created by (Principal A, no > client > > >> id), > > >> > then can be extended by (no principal, Client B), and finally > deleted > > by > > >> >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the updates, looks good. Just a couple of minor comments: 1) There is a typo in "*The channel will be mutated as well when `throttle_time_ms > 0`." * Should be *muted*? 2) Since the three requests will need a new field for ` *retryQuotaViolatedException*`, we should perhaps add that change to the KIP. Regards, Rajini On Wed, Jun 3, 2020 at 1:02 PM David Jacot wrote: > Hi all, > > I have updated the KIP based on our recent discussions. I have mainly > changed the > following points: > * I have renamed the quota as suggested by Jun. > * I have changed the metrics to be "token bucket" agnostic. The idea is to > report the > burst and the rate per principal/clientid. > * I have removed the `retry.quota.violation.exception` configuration and > replaced it > with options in the respective methods' options. > > Now, the public interfaces are not tight to the algorithm that we use > internally to throttle > the requests but keep the notion of burst. I hope that this will help to > alleviate the > tokens bucket vs rate discussions. > > Please, have a look and let me know your thoughts. > > Bests, > David > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > > > Hi Rajini, > > > > Thanks for your feedback. Please find my answers below: > > > > 1) Our main goal is to protect the controller from the extreme users > > (DDoS). We want > > to protect it from large requests or repetitive requests coming from a > > single user. > > That user could be used by multiple apps as you pointed out which makes > it > > even > > worst. For instance, a buggy application could continuously create and > > delete > > topics in a tight loop. > > > > The idea of the burst is to still allow creating or deleting topics in > > batch because > > this is how applications tend to do it. However, we want to keep the > batch > > size > > under control with the burst. The burst does not allow requests of any > > size. Topics > > are accepted until the burst is passed. 
All the others are rejected. > > Ideally, regular > > and well behaving applications should never or rarely be throttled. > > > > 2) No, there is no explicit bound on the maximum throttle time. Having a > > maximum > > is straightforward here because the throttling time depends on the actual > > size of > > the request. > > > > 3) That's a very good question that I haven't thought of. I was thinking > > about doing > > it for the new quota only. I think that your suggestion of having a per > > method argument > > makes sense. I will update the KIP. > > > > 4) Indeed, it is outdated text. Let me update that. > > > > Regards, > > David > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram > > wrote: > > > >> Hi David, > >> > >> Thanks for the KIP. A few questions below: > >> > >> 1) The KIP says: *`Typically, applications tend to send one request to > >> create all the topics that they need`*. What would the point of > throttling > >> be in this case? If there was a user quota for the principal used by > that > >> application, wouldn't we just allow the request due to the burst value? > Is > >> the KIP specifically for the case where multiple apps with the same user > >> principal are started all at once? > >> > >> 2) Will there be a bound on the maximum throttle time? > >> > >> 3) Will *retry.quota.violation.exception* have any impact on request > quota > >> throttling or is it limited to the three requests where the new quota is > >> applied? If it is only for the new quota, perhaps it would be better to > >> add > >> an option to the relevant requests rather than use an admin client > config. > >> > >> 4) Is this outdated text, wasn't sure what this refers to : *While the > >> configuration could be set by broker, we envision to have it set only > has > >> a > >> cluster wide default for two reasons.* > >> > >> Regards, > >> > >> Rajini > >> > >> > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > >> > >> > Hi Jun, > >> > > >> > Thanks for your reply. 
> >> > > >> > 10. I think that both options are likely equivalent from an accuracy > >> point > >> > of > >> > view. If we put the implementation aside, conceptually, I am not > >> convinced > >> > by the used based approach because resources don't have a clear owner > >> > in AK at the moment. A topic can be created by (Principal A, no client > >> id), > >> > then can be extended by (no principal, Client B), and finally deleted > by > >> > (Principal C, Client C). This does not sound right to me and I fear > >> that it > >> > is not going to be easy to grasp for our users. > >> > > >> > Regarding the naming, I do agree that we can make it more future > proof. > >> > I propose `controller_mutations_rate`. I think that differentiating > the > >> > mutations > >> > from the reads is still a good thing for the future. > >> > > >> > 11. I am not convinced by your proposal for the following reasons: > >> > > >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We > >> could > >> >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi all, I have updated the KIP based on our recent discussions. I have mainly changed the following points: * I have renamed the quota as suggested by Jun. * I have changed the metrics to be "token bucket" agnostic. The idea is to report the burst and the rate per principal/clientid. * I have removed the `retry.quota.violation.exception` configuration and replaced it with options in the respective methods' options. Now, the public interfaces are not tied to the algorithm that we use internally to throttle the requests but keep the notion of burst. I hope that this will help to alleviate the token bucket vs rate discussions. Please, have a look and let me know your thoughts. Bests, David On Wed, Jun 3, 2020 at 10:17 AM David Jacot wrote: > Hi Rajini, > > Thanks for your feedback. Please find my answers below: > > 1) Our main goal is to protect the controller from the extreme users > (DDoS). We want > to protect it from large requests or repetitive requests coming from a > single user. > That user could be used by multiple apps as you pointed out which makes it > even > worst. For instance, a buggy application could continuously create and > delete > topics in a tight loop. > > The idea of the burst is to still allow creating or deleting topics in > batch because > this is how applications tend to do it. However, we want to keep the batch > size > under control with the burst. The burst does not allow requests of any > size. Topics > are accepted until the burst is passed. All the others are rejected. > Ideally, regular > and well behaving applications should never or rarely be throttled. > > 2) No, there is no explicit bound on the maximum throttle time. Having a > maximum > is straightforward here because the throttling time depends on the actual > size of > the request. > > 3) That's a very good question that I haven't thought of. I was thinking > about doing > it for the new quota only.
I think that your suggestion of having a per > method argument > makes sense. I will update the KIP. > > 4) Indeed, it is outdated text. Let me update that. > > Regards, > David > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram > wrote: > >> Hi David, >> >> Thanks for the KIP. A few questions below: >> >> 1) The KIP says: *`Typically, applications tend to send one request to >> create all the topics that they need`*. What would the point of throttling >> be in this case? If there was a user quota for the principal used by that >> application, wouldn't we just allow the request due to the burst value? Is >> the KIP specifically for the case where multiple apps with the same user >> principal are started all at once? >> >> 2) Will there be a bound on the maximum throttle time? >> >> 3) Will *retry.quota.violation.exception* have any impact on request quota >> throttling or is it limited to the three requests where the new quota is >> applied? If it is only for the new quota, perhaps it would be better to >> add >> an option to the relevant requests rather than use an admin client config. >> >> 4) Is this outdated text, wasn't sure what this refers to : *While the >> configuration could be set by broker, we envision to have it set only has >> a >> cluster wide default for two reasons.* >> >> Regards, >> >> Rajini >> >> >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: >> >> > Hi Jun, >> > >> > Thanks for your reply. >> > >> > 10. I think that both options are likely equivalent from an accuracy >> point >> > of >> > view. If we put the implementation aside, conceptually, I am not >> convinced >> > by the used based approach because resources don't have a clear owner >> > in AK at the moment. A topic can be created by (Principal A, no client >> id), >> > then can be extended by (no principal, Client B), and finally deleted by >> > (Principal C, Client C). This does not sound right to me and I fear >> that it >> > is not going to be easy to grasp for our users. 
>> > >> > Regarding the naming, I do agree that we can make it more future proof. >> > I propose `controller_mutations_rate`. I think that differentiating the >> > mutations >> > from the reads is still a good thing for the future. >> > >> > 11. I am not convinced by your proposal for the following reasons: >> > >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We >> could >> > effectively allocate 5 * 100 requests to the previous windows assuming >> that >> > they are empty. What shall we do with the remaining 60 requests? Shall >> we >> > allocate them to the current window or shall we divide it among all the >> > windows? >> > >> > Second, I don't think that we can safely change the behavior of all the >> > existing >> > rates used because it actually changes the computation of the rate as >> > values >> > allocated to past windows would expire before they would today. >> > >> > Overall, while trying to fit in the current rate, we are going to build >> a >> > slightly >> > different version of the rate which will be even more
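To pin down the arithmetic in the toy example quoted above (a quota of 5 requests/sec, 101 one-second windows, and a burst of 7 * 80 = 560 requests): back-filling the 100 elapsed windows can absorb at most 5 * 100 = 500 requests, which is where the 60 leftover requests come from. A minimal check of that calculation, with hypothetical names invented for illustration:

```java
// Hypothetical helper illustrating the toy example's arithmetic only;
// this is not how Kafka's sampled Rate is implemented.
final class BackfillExample {
    /** Requests left over after back-filling `elapsedWindows` empty
     *  windows at `rate` requests per window. */
    static long leftover(long rate, long elapsedWindows, long burst) {
        long capacity = rate * elapsedWindows; // most the past windows can hold
        return Math.max(0, burst - capacity);
    }
}
```

It is exactly these leftover requests, with no principled window to land in, that motivate abandoning the back-fill approach.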
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Rajini, Thanks for your feedback. Please find my answers below: 1) Our main goal is to protect the controller from the extreme users (DDoS). We want to protect it from large requests or repetitive requests coming from a single user. That user could be used by multiple apps as you pointed out, which makes it even worse. For instance, a buggy application could continuously create and delete topics in a tight loop. The idea of the burst is to still allow creating or deleting topics in batch because this is how applications tend to do it. However, we want to keep the batch size under control with the burst. The burst does not allow requests of any size. Topics are accepted until the burst is passed. All the others are rejected. Ideally, regular and well-behaved applications should never or rarely be throttled. 2) No, there is no explicit bound on the maximum throttle time. Having a maximum is not straightforward here because the throttling time depends on the actual size of the request. 3) That's a very good question that I haven't thought of. I was thinking about doing it for the new quota only. I think that your suggestion of having a per-method argument makes sense. I will update the KIP. 4) Indeed, it is outdated text. Let me update that. Regards, David On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram wrote: > Hi David, > > Thanks for the KIP. A few questions below: > > 1) The KIP says: *`Typically, applications tend to send one request to > create all the topics that they need`*. What would the point of throttling > be in this case? If there was a user quota for the principal used by that > application, wouldn't we just allow the request due to the burst value? Is > the KIP specifically for the case where multiple apps with the same user > principal are started all at once? > > 2) Will there be a bound on the maximum throttle time?
> > 3) Will *retry.quota.violation.exception* have any impact on request quota > throttling or is it limited to the three requests where the new quota is > applied? If it is only for the new quota, perhaps it would be better to add > an option to the relevant requests rather than use an admin client config. > > 4) Is this outdated text, wasn't sure what this refers to : *While the > configuration could be set by broker, we envision to have it set only has a > cluster wide default for two reasons.* > > Regards, > > Rajini > > > On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > > > Hi Jun, > > > > Thanks for your reply. > > > > 10. I think that both options are likely equivalent from an accuracy > point > > of > > view. If we put the implementation aside, conceptually, I am not > convinced > > by the used based approach because resources don't have a clear owner > > in AK at the moment. A topic can be created by (Principal A, no client > id), > > then can be extended by (no principal, Client B), and finally deleted by > > (Principal C, Client C). This does not sound right to me and I fear that > it > > is not going to be easy to grasp for our users. > > > > Regarding the naming, I do agree that we can make it more future proof. > > I propose `controller_mutations_rate`. I think that differentiating the > > mutations > > from the reads is still a good thing for the future. > > > > 11. I am not convinced by your proposal for the following reasons: > > > > First, in my toy example, I used 101 windows and 7 * 80 requests. We > could > > effectively allocate 5 * 100 requests to the previous windows assuming > that > > they are empty. What shall we do with the remaining 60 requests? Shall we > > allocate them to the current window or shall we divide it among all the > > windows? 
> > > > Second, I don't think that we can safely change the behavior of all the > > existing > > rates used because it actually changes the computation of the rate as > > values > > allocated to past windows would expire before they would today. > > > > Overall, while trying to fit in the current rate, we are going to build a > > slightly > > different version of the rate which will be even more confusing for > users. > > > > Instead, I think that we should embrace the notion of burst as it could > > also > > be applied to other quotas in the future. Users don't have to know that > we > > use the Token Bucket or a special rate inside at the end of the day. It > is > > an > > implementation detail. > > > > Users would be able to define: > > - a rate R; and > > - a maximum burst B. > > > > If we change the metrics to be as follow: > > - the actual rate; > > - the burst balance in %, 0 meaning that the user is throttled; > > It remains disattach from the algorithm. > > > > I personally prefer this over having to define a rate and a number of > > windows > > while having to understand that the number of windows implicitly defines > > the > > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > > > Best, > > David > > > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > > >
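David's answer above explains why the throttle time has no fixed bound: it depends on the size of the request. Once a batch drives the credit balance into deficit, the implied wait is simply the deficit divided by the replenishment rate. A hypothetical sketch of that relationship (names invented for illustration, not Kafka's actual code):

```java
// Hypothetical helper: given a credit deficit after accepting a burst,
// how long until the balance recovers at `rate` credits per second?
final class ThrottleTimeExample {
    static long throttleTimeMs(double deficit, double rate) {
        if (deficit <= 0) return 0L; // no deficit, no throttling
        return (long) Math.ceil(deficit / rate * 1000.0);
    }
}
```

For instance, at R = 5 mutations/sec, a batch that leaves the bucket 10 credits short implies a 2000 ms throttle; a larger batch implies a proportionally longer wait, which is why no a priori maximum can be stated.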
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the KIP. A few questions below: 1) The KIP says: *`Typically, applications tend to send one request to create all the topics that they need`*. What would the point of throttling be in this case? If there was a user quota for the principal used by that application, wouldn't we just allow the request due to the burst value? Is the KIP specifically for the case where multiple apps with the same user principal are started all at once? 2) Will there be a bound on the maximum throttle time? 3) Will *retry.quota.violation.exception* have any impact on request quota throttling or is it limited to the three requests where the new quota is applied? If it is only for the new quota, perhaps it would be better to add an option to the relevant requests rather than use an admin client config. 4) Is this outdated text, wasn't sure what this refers to : *While the configuration could be set by broker, we envision to have it set only has a cluster wide default for two reasons.* Regards, Rajini On Tue, Jun 2, 2020 at 2:55 PM David Jacot wrote: > Hi Jun, > > Thanks for your reply. > > 10. I think that both options are likely equivalent from an accuracy point > of > view. If we put the implementation aside, conceptually, I am not convinced > by the used based approach because resources don't have a clear owner > in AK at the moment. A topic can be created by (Principal A, no client id), > then can be extended by (no principal, Client B), and finally deleted by > (Principal C, Client C). This does not sound right to me and I fear that it > is not going to be easy to grasp for our users. > > Regarding the naming, I do agree that we can make it more future proof. > I propose `controller_mutations_rate`. I think that differentiating the > mutations > from the reads is still a good thing for the future. > > 11. I am not convinced by your proposal for the following reasons: > > First, in my toy example, I used 101 windows and 7 * 80 requests. 
We could > effectively allocate 5 * 100 requests to the previous windows assuming that > they are empty. What shall we do with the remaining 60 requests? Shall we > allocate them to the current window or shall we divide it among all the > windows? > > Second, I don't think that we can safely change the behavior of all the > existing > rates used because it actually changes the computation of the rate as > values > allocated to past windows would expire before they would today. > > Overall, while trying to fit in the current rate, we are going to build a > slightly > different version of the rate which will be even more confusing for users. > > Instead, I think that we should embrace the notion of burst as it could > also > be applied to other quotas in the future. Users don't have to know that we > use the Token Bucket or a special rate inside at the end of the day. It is > an > implementation detail. > > Users would be able to define: > - a rate R; and > - a maximum burst B. > > If we change the metrics to be as follow: > - the actual rate; > - the burst balance in %, 0 meaning that the user is throttled; > It remains disattach from the algorithm. > > I personally prefer this over having to define a rate and a number of > windows > while having to understand that the number of windows implicitly defines > the > allowed burst. I think that it is clearer and easier to grasp. Don't you? > > Best, > David > > On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > > > Hi, David, Anna, > > > > Thanks for the response. Sorry for the late reply. > > > > 10. Regarding exposing rate or usage as quota. Your argument is that > usage > > is not very accurate anyway and is harder to implement. So, let's just > be a > > bit loose and expose rate. I am sort of neutral on that. (1) It seems to > me > > that overall usage will be relatively more accurate than rate. All the > > issues that Anna brought up seem to exist for rate too. 
Rate has the > > additional problem that the cost of each request may not be uniform. (2) > In > > terms of implementation, a usage based approach requires tracking the > user > > info through the life cycle of an operation. However, as you mentioned, > > things like topic creation can generate additional > > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > > associate those cost to the user who initiated the operation. If we do > > that, we sort of need to track the user for the full life cycle of the > > processing of an operation anyway. (3) If you prefer rate strongly, I > don't > > have strong objections. However, I do feel that the new quota name should > > be able to cover all controller related cost longer term. This KIP > > currently only covers topic creation/deletion. It would not be ideal if > in > > the future, we have to add yet another type of quota for some other > > controller related operations. The quota name in the KIP has partition > > mutation. In the future, if we allow things like topic renaming, it may > not
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Thanks for your reply. 10. I think that both options are likely equivalent from an accuracy point of view. If we put the implementation aside, conceptually, I am not convinced by the usage based approach because resources don't have a clear owner in AK at the moment. A topic can be created by (Principal A, no client id), then can be extended by (no principal, Client B), and finally deleted by (Principal C, Client C). This does not sound right to me and I fear that it is not going to be easy to grasp for our users. Regarding the naming, I do agree that we can make it more future proof. I propose `controller_mutations_rate`. I think that differentiating the mutations from the reads is still a good thing for the future. 11. I am not convinced by your proposal for the following reasons: First, in my toy example, I used 101 windows and 7 * 80 requests. We could effectively allocate 5 * 100 requests to the previous windows assuming that they are empty. What shall we do with the remaining 60 requests? Shall we allocate them to the current window or shall we divide them among all the windows? Second, I don't think that we can safely change the behavior of all the existing rates used because it actually changes the computation of the rate, as values allocated to past windows would expire sooner than they do today. Overall, while trying to fit in the current rate, we are going to build a slightly different version of the rate which will be even more confusing for users. Instead, I think that we should embrace the notion of burst as it could also be applied to other quotas in the future. Users don't have to know that we use the Token Bucket or a special rate inside at the end of the day. It is an implementation detail. Users would be able to define:
- a rate R; and
- a maximum burst B.
If we change the metrics to be as follows:
- the actual rate;
- the burst balance in %, 0 meaning that the user is throttled;
it remains detached from the algorithm.
I personally prefer this over having to define a rate and a number of windows while having to understand that the number of windows implicitly defines the allowed burst. I think that it is clearer and easier to grasp. Don't you? Best, David On Fri, May 29, 2020 at 6:38 PM Jun Rao wrote: > Hi, David, Anna, > > Thanks for the response. Sorry for the late reply. > > 10. Regarding exposing rate or usage as quota. Your argument is that usage > is not very accurate anyway and is harder to implement. So, let's just be a > bit loose and expose rate. I am sort of neutral on that. (1) It seems to me > that overall usage will be relatively more accurate than rate. All the > issues that Anna brought up seem to exist for rate too. Rate has the > additional problem that the cost of each request may not be uniform. (2) In > terms of implementation, a usage based approach requires tracking the user > info through the life cycle of an operation. However, as you mentioned, > things like topic creation can generate additional > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to > associate those cost to the user who initiated the operation. If we do > that, we sort of need to track the user for the full life cycle of the > processing of an operation anyway. (3) If you prefer rate strongly, I don't > have strong objections. However, I do feel that the new quota name should > be able to cover all controller related cost longer term. This KIP > currently only covers topic creation/deletion. It would not be ideal if in > the future, we have to add yet another type of quota for some other > controller related operations. The quota name in the KIP has partition > mutation. In the future, if we allow things like topic renaming, it may not > be related to partition mutation directly and it would be trickier to fit > those operations in the current quota. So, maybe sth more general like > controller_operations_quota will be more future proof. > > 11. 
Regarding the difference between the token bucket algorithm and our > current quota mechanism. That's a good observation. It seems that we can > make a slight change to our current quota mechanism to achieve a similar > thing. As you said, the issue was that we allocate all 7 * 80 requests in > the last 1 sec window in our current mechanism. This is a bit unintuitive > since each sec only has a quota capacity of 5. An alternative way is to > allocate the 7 * 80 requests to all previous windows, each up to the > remaining quota capacity left. So, you will fill the current 1 sec window > and all previous ones, each with 5. Then, it seems this will give the same > behavior as token bucket? The reason that I keep asking this is that from > an operational perspective, it's simpler if all types of quotas work in the > same way. > > Jun > > On Tue, May 26, 2020 at 9:52 AM David Jacot wrote: > > > Hi folks, > > > > I have updated the KIP. As mentioned by Jun, I have made the > > quota per principal/clientid similarly to the
Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
+1 (binding) Looks great. Thank you for the in-depth design and discussion. On Fri, May 29, 2020 at 7:58 AM David Jacot wrote: > Hi folks, > > I'd like to start the vote for KIP-599 which proposes a new quota to > throttle create topic, create partition, and delete topics operations to > protect the Kafka controller: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > Please, let me know what you think. > > Cheers, > David > -- Gwen Shapira Engineering Manager | Confluent 650.450.2760 | @gwenshap Follow us: Twitter | blog
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David, Anna, Thanks for the response. Sorry for the late reply. 10. Regarding exposing rate or usage as quota. Your argument is that usage is not very accurate anyway and is harder to implement. So, let's just be a bit loose and expose rate. I am sort of neutral on that. (1) It seems to me that overall usage will be relatively more accurate than rate. All the issues that Anna brought up seem to exist for rate too. Rate has the additional problem that the cost of each request may not be uniform. (2) In terms of implementation, a usage based approach requires tracking the user info through the life cycle of an operation. However, as you mentioned, things like topic creation can generate additional LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to associate those costs to the user who initiated the operation. If we do that, we sort of need to track the user for the full life cycle of the processing of an operation anyway. (3) If you prefer rate strongly, I don't have strong objections. However, I do feel that the new quota name should be able to cover all controller related costs longer term. This KIP currently only covers topic creation/deletion. It would not be ideal if in the future, we have to add yet another type of quota for some other controller related operations. The quota name in the KIP has partition mutation. In the future, if we allow things like topic renaming, it may not be related to partition mutation directly and it would be trickier to fit those operations in the current quota. So, maybe something more general like controller_operations_quota will be more future proof. 11. Regarding the difference between the token bucket algorithm and our current quota mechanism. That's a good observation. It seems that we can make a slight change to our current quota mechanism to achieve a similar thing. As you said, the issue was that we allocate all 7 * 80 requests in the last 1 sec window in our current mechanism.
This is a bit unintuitive since each sec only has a quota capacity of 5. An alternative way is to allocate the 7 * 80 requests to all previous windows, each up to the remaining quota capacity left. So, you will fill the current 1 sec window and all previous ones, each with 5. Then, it seems this will give the same behavior as token bucket? The reason that I keep asking this is that from an operational perspective, it's simpler if all types of quotas work in the same way. Jun On Tue, May 26, 2020 at 9:52 AM David Jacot wrote: > Hi folks, > > I have updated the KIP. As mentioned by Jun, I have made the > quota per principal/clientid similarly to the other quotas. I have > also explained how this will work in conjunction with the auto > topics creation. > > Please, take a look at it. I plan to call a vote for it in the next few > days if there are no comments in this thread. > > Best, > David > > On Wed, May 13, 2020 at 10:57 AM Tom Bentley wrote: > > > Hi David, > > > > Thanks for the explanation and confirmation that evolving the APIs is not > > off the table in the longer term. > > > > Kind regards, > > > > Tom > > >
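The two allocation strategies being compared above can be sketched numerically. This is an illustrative Python sketch, not Kafka's actual metrics code; the constants come from the toy example discussed in the thread (101 one-second windows, a quota of 5 mutations/sec, and a burst of 7 * 80 mutations), and the exact handling of the leftover mutations is the open question David raises.

```python
QUOTA_PER_SEC = 5
NUM_WINDOWS = 101          # samples kept; the window set spans ~100 seconds

burst = 7 * 80             # 560 partition mutations arriving at once

# Current mechanism: the whole burst is recorded in the newest sample, so the
# observed rate jumps to 560 / 100 = 5.6/sec and stays there until that single
# large sample expires, roughly 101 seconds later.
samples = [0] * NUM_WINDOWS
samples[-1] = burst
observed_rate = sum(samples) / (NUM_WINDOWS - 1)
print(observed_rate)       # 5.6

# Jun's variant: spread the burst across the windows, filling each one only up
# to its per-second capacity.
spread = [0] * NUM_WINDOWS
remaining = burst
for i in range(NUM_WINDOWS):
    take = min(QUOTA_PER_SEC, remaining)
    spread[i] = take
    remaining -= take
print(remaining)           # mutations left over once every window is full --
                           # the remainder the thread leaves unresolved
```

With all 101 windows filled to capacity, 505 mutations fit and 55 are left over, which is essentially the "remaining requests" question David asks in his reply.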
[VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks, I'd like to start the vote for KIP-599 which proposes a new quota to throttle create topic, create partition, and delete topics operations to protect the Kafka controller: https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations Please, let me know what you think. Cheers, David
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks, I have updated the KIP. As mentioned by Jun, I have made the quota per principal/clientid similarly to the other quotas. I have also explained how this will work in conjunction with the auto topics creation. Please, take a look at it. I plan to call a vote for it in the next few days if there are no comments in this thread. Best, David On Wed, May 13, 2020 at 10:57 AM Tom Bentley wrote: > Hi David, > > Thanks for the explanation and confirmation that evolving the APIs is not > off the table in the longer term. > > Kind regards, > > Tom >
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the explanation and confirmation that evolving the APIs is not off the table in the longer term. Kind regards, Tom
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, >> What exactly is the problem with having a huge backlog of pending >> operations? I can see that the backlog would need persisting so that the >> controller could change without losing track of the topics to be mutated, >> and the mutations would need to be submitted in batches to the controller >> event queue (thus allowing other controller requests to be interleaved). I >> realise this is not feasible right now, I'm just trying to understand if >> it's feasible at all and if there's any appetite for making the requisite >> API changes in the future in order to prevent these problems even for large >> single requests. It is definitely feasible. My concern with the approach is about the way our current API works. Let me try to illustrate it with an example. When the admin client sends a CreateTopicsRequest to the controller, the request goes to the purgatory, waiting until all the topics are created or the timeout specified in the request is reached. If the timeout is reached, a RequestTimeoutException is returned to the client. This is used to fail the future of the caller. In conjunction, the admin client fails any pending request with a TimeoutException after the request timeout is reached (30s by default). In the former case, the caller will likely retry. In the latter case, the admin client will automatically retry. In both cases, the broker will respond with a TopicExistsException. Having a huge backlog of pending operations will amplify this weird behavior. Clients will tend to get TopicExistsException errors when they create topics for the first time, which is really weird. I think that our current API is not well suited for this. An asynchronous API workflow with one API to create/delete and another one to query the status of completion would be better suited. We can definitely evolve our API towards this but we need to figure out a compatibility story for existing clients. Another aspect is the fairness among the clients.
Imagine a case where one client continuously creates and deletes topics in a tight loop. This would flood the queue and delay the creations and the deletions of the other clients. Throttling at admission time mitigates this directly. Throttling at execution would need to take this into account to ensure fairness among the clients. It is a little harder to do this in the controller as the controller is completely agnostic from the principals and the client ids. These reasons made me lean towards the current proposal. Does that make sense? Best, David On Wed, May 13, 2020 at 10:05 AM David Jacot wrote: > Hi Jun, > > Coming back to your question regarding the differences between the token > bucket algorithm and our current quota mechanism. I did some tests and > they confirmed my first intuition that our current mechanism does not work > well with a bursty workload. Let me try to illustrate the difference with > an > example. One important aspect to keep in mind is that we don't want to > reject requests when the quota is exhausted. > > Let's say that we want to guarantee an average rate R=5 partitions/sec > while > allowing a burst B=500 partitions. > > With our current mechanism, this translates to following parameters: > - Quota = 5 > - Samples = B / R + 1 = 101 (to allow the burst) > - Time Window = 1s (the default) > > Now, let's say that a client wants to create 7 topics with 80 partitions > each at > the time T. It brings the rate to 5.6 (7 * 80 / 100) which is above the > quota so > any new request is rejected until the quota gets back above R. In theory, > the > client must wait 12 secs ((5.6 - 5) / 5 * 100) to get it back to R. In > practice, due > to the sparse samples (one sample worth 560), the rate won't decrease until > that sample is dropped and it will be after 101 secs. It gets worse if the > burst > is increased. 
> > With the token bucket algorithm, this translate to the following > parameters: > - Rate = 5 > - Tokens = 500 > > The same request decreases the number of available tokens to -60 which is > below 0 so any new request is rejected until the number of available tokens > gets back above 0. This takes 12 secs ((60+1) / 5). > > The token bucket algorithm is more suited for bursty workloads which is our > case here. I hope that this example helps to clarify the choice. > > Best, > David > > On Tue, May 12, 2020 at 3:19 PM Tom Bentley wrote: > >> Hi David, >> >> Thanks for the reply. >> >> >> If I understand the proposed throttling algorithm, an initial request >> > would >> > >> be allowed (possibly making K negative) and only subsequent requests >> > >> (before K became positive) would receive the QUOTA_VIOLATED. That >> would >> > >> mean it was still possible to block the controller from handling >> other >> > >> events – you just need to do so via making one big request. >> > >> > That is correct. One could still create one big topic (not request) and >> > that would >> > create some load on the controller. All options suffer from
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun, Coming back to your question regarding the differences between the token bucket algorithm and our current quota mechanism. I did some tests and they confirmed my first intuition that our current mechanism does not work well with a bursty workload. Let me try to illustrate the difference with an example. One important aspect to keep in mind is that we don't want to reject requests when the quota is exhausted. Let's say that we want to guarantee an average rate R=5 partitions/sec while allowing a burst B=500 partitions. With our current mechanism, this translates to the following parameters:
- Quota = 5
- Samples = B / R + 1 = 101 (to allow the burst)
- Time Window = 1s (the default)
Now, let's say that a client wants to create 7 topics with 80 partitions each at time T. It brings the rate to 5.6 (7 * 80 / 100), which is above the quota, so any new request is rejected until the rate gets back to R. In theory, the client must wait 12 secs ((5.6 - 5) / 5 * 100) to get it back to R. In practice, due to the sparse samples (one sample worth 560), the rate won't decrease until that sample is dropped, which will only happen after 101 secs. It gets worse if the burst is increased. With the token bucket algorithm, this translates to the following parameters:
- Rate = 5
- Tokens = 500
The same request decreases the number of available tokens to -60, which is below 0, so any new request is rejected until the number of available tokens gets back above 0. This takes 12 secs ((60+1) / 5). The token bucket algorithm is more suited for bursty workloads, which is our case here. I hope that this example helps to clarify the choice. Best, David On Tue, May 12, 2020 at 3:19 PM Tom Bentley wrote: > Hi David, > > Thanks for the reply. > > >> If I understand the proposed throttling algorithm, an initial request > > would > > >> be allowed (possibly making K negative) and only subsequent requests > > >> (before K became positive) would receive the QUOTA_VIOLATED.
That > would > > >> mean it was still possible to block the controller from handling other > > >> events – you just need to do so via making one big request. > > > > That is correct. One could still create one big topic (not request) and > > that would > > create some load on the controller. All options suffer from this issue as > > we can > > stop clients from creating a very large topic. At least, when it happens, > > the client > > will have to wait to pay back its credits which guarantee that we control > > the average > > load on the controller. > > > > I can see that the admission throttling is better than nothing. It's just > that it doesn't fully solve the problem described in the KIP's motivation. > That doesn't mean it's not worth doing. I certainly prefer this approach > over that taken by KIP-578 (which cites the effects of deleting a single > topic with many partitions). > > > > >> While the reasons for rejecting execution throttling make sense given > > the > > >> RPCs we have today that seems to be at the cost of still allowing harm > > to > > >> the cluster, or did I misunderstand? > > > > Execution throttling would also suffer from large topics being created. > We > > have > > rejected it due to the current RPCs and also because it does not limit > the > > amount > > of work queued up in the controller. Imagine a low quota, that would > result > > in a huge > > backlog of pending operations. > > > > What exactly is the problem with having a huge backlog of pending > operations? I can see that the backlog would need persisting so that the > controller could change without losing track of the topics to be mutated, > and the mutations would need to be submitted in batches to the controller > event queue (thus allowing other controller requests to be interleaved). 
I > realise this is not feasible right now, I'm just trying to understand if > it's feasible at all and if there's any appetite for making the requisite > API changes in the future in order to prevent these problems even for large > single requests. > > Kind regards, > > Tom >
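The worked example in David's message can be sketched as follows. This is an illustrative Python sketch of a token bucket with the class and method names invented for the illustration; it is not Kafka's actual implementation, but it reproduces the R=5, B=500 numbers from the thread.

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate        # R: tokens credited per second
        self.burst = burst      # B: bucket capacity
        self.tokens = burst     # the bucket starts full
        self.last = 0.0

    def record(self, amount, now):
        # Credit tokens accrued since the last observation, capped at B,
        # then deduct unconditionally: the request is admitted even if the
        # balance goes negative, which is what allows the burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= amount

    def throttle_seconds(self):
        # Seconds until the balance climbs back above zero; only then are
        # new requests accepted again.
        return max(0.0, -self.tokens / self.rate)

bucket = TokenBucket(rate=5.0, burst=500.0)
bucket.record(7 * 80, now=0.0)    # the 7 topics x 80 partitions example
print(bucket.tokens)              # -60.0
print(bucket.throttle_seconds())  # 12.0
```

Note how the throttle time falls directly out of the negative balance (60 / 5 = 12 secs), whereas with the sampled rate the oversized sample keeps the rate above the quota until it expires (~101 secs in the example).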
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David, Thanks for the reply. >> If I understand the proposed throttling algorithm, an initial request > would > >> be allowed (possibly making K negative) and only subsequent requests > >> (before K became positive) would receive the QUOTA_VIOLATED. That would > >> mean it was still possible to block the controller from handling other > >> events – you just need to do so via making one big request. > > That is correct. One could still create one big topic (not request) and > that would > create some load on the controller. All options suffer from this issue as > we can > stop clients from creating a very large topic. At least, when it happens, > the client > will have to wait to pay back its credits which guarantee that we control > the average > load on the controller. > I can see that the admission throttling is better than nothing. It's just that it doesn't fully solve the problem described in the KIP's motivation. That doesn't mean it's not worth doing. I certainly prefer this approach over that taken by KIP-578 (which cites the effects of deleting a single topic with many partitions). > >> While the reasons for rejecting execution throttling make sense given > the > >> RPCs we have today that seems to be at the cost of still allowing harm > to > >> the cluster, or did I misunderstand? > > Execution throttling would also suffer from large topics being created. We > have > rejected it due to the current RPCs and also because it does not limit the > amount > of work queued up in the controller. Imagine a low quota, that would result > in a huge > backlog of pending operations. > What exactly is the problem with having a huge backlog of pending operations? I can see that the backlog would need persisting so that the controller could change without losing track of the topics to be mutated, and the mutations would need to be submitted in batches to the controller event queue (thus allowing other controller requests to be interleaved). 
I realise this is not feasible right now, I'm just trying to understand if it's feasible at all and if there's any appetite for making the requisite API changes in the future in order to prevent these problems even for large single requests. Kind regards, Tom
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Tom, Thanks for the feedback. >> If I understand the proposed throttling algorithm, an initial request would >> be allowed (possibly making K negative) and only subsequent requests >> (before K became positive) would receive the QUOTA_VIOLATED. That would >> mean it was still possible to block the controller from handling other >> events – you just need to do so via making one big request. That is correct. One could still create one big topic (not request) and that would create some load on the controller. All options suffer from this issue as we cannot stop clients from creating a very large topic. At least, when it happens, the client will have to wait to pay back its credits, which guarantees that we control the average load on the controller. >> While the reasons for rejecting execution throttling make sense given the >> RPCs we have today that seems to be at the cost of still allowing harm to >> the cluster, or did I misunderstand? Execution throttling would also suffer from large topics being created. We have rejected it due to the current RPCs and also because it does not limit the amount of work queued up in the controller. Imagine a low quota; that would result in a huge backlog of pending operations. Best, David On Mon, May 11, 2020 at 4:57 PM David Jacot wrote: > Hi Anna and Jun, > > Anna, thanks for your thoughtful feedback. Overall, I agree with what you > said. If I summarize, you said that using time on server threads is not > easier > to tune than a rate based approach and it does not really capture all the > load neither as the control requests are not taken into account. > > I have been thinking about using an approach similar to the request quota > and > here are my thoughts: > > 1. With the current architecture of the controller, the ZK writes > resulting from a > topic being created, expanded or deleted are spread among the api handler > threads and the controller thread. This is problematic to measure the real > thread > usage.
I suppose that this will change with KIP-500 though. > > 2. I do agree with Anna that we don't only care about requests tying up > the controller > thread but also care about the control requests. A rate based approach > would allow > us to define a value which copes with both dimensions. > > 3. Doing the accounting in the controller requires to attach a principal > and a client id > to each topic as the quotas are expressed per principal and/or client id. > I find this a > little odd. These information would have to be updated whenever a topic is > expanded > or deleted and any subsequent operations for that topic in the controller > would be > accounted for the last principal and client id. As an example, imagine a > client that > deletes topics which have partitions on a dead broker. In this case, the > deletion would > be retried until it succeeds. If that client has a low quota, that may > prevent it from > doing any new operations until the delete succeeds. This is a strange > behavior. > > 4. One important aspect of the proposal is that we want to be able to > reject requests > when the quota is exhausted. With a usage based approach, it is difficult > to compute > the time to wait before being able to create the topics. The only that we > could do is > to ask the client to wait until the measure time drops to the quota to > ensure that a new > operations can be accepted. With a rate based approach, we can > precisely compute > the time to wait. > > 5. I find the experience for users slightly better with a rate based > approach. When I say > users here, I mean the users of the admin client or the cli. When > throttled, they would > get an error saying: "can't create because you have reached X partition > mutations/sec". > With the other approach, we could only say: "quota exceeded". > > 6. I do agree that the rate based approach is less generic in the long > term, especially if > other resource types are added in the controller. 
> > Altogether, I am not convinced by a usage based approach and I would > rather lean > towards keeping the current proposal. > > Best, > David > > On Thu, May 7, 2020 at 2:11 AM Anna Povzner wrote: > >> Hi David and Jun, >> >> I wanted to add to the discussion about using requests/sec vs. time on >> server threads (similar to request quota) for expressing quota for topic >> ops. >> >> I think request quota does not protect the brokers from overload by itself >> -- it still requires tuning and sometimes re-tuning, because it depends on >> the workload behavior of all users (like a relative share of requests >> exempt from throttling). This makes it not that easy to set. Let me give >> you more details: >> >>1. >> >>The amount of work that the user can get from the request quota depends >>on the load from other users. We measure and enforce user's clock time >> on >>threads---the time between 2 timestamps, one when the operation starts >> and >>one when the operation ends. If the user is the only load
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Anna and Jun, Anna, thanks for your thoughtful feedback. Overall, I agree with what you said. If I summarize, you said that using time on server threads is not easier to tune than a rate based approach and it does not really capture all the load either, as the control requests are not taken into account. I have been thinking about using an approach similar to the request quota and here are my thoughts:
1. With the current architecture of the controller, the ZK writes resulting from a topic being created, expanded or deleted are spread among the API handler threads and the controller thread. This makes it problematic to measure the real thread usage. I suppose that this will change with KIP-500 though.
2. I do agree with Anna that we don't only care about requests tying up the controller thread but also care about the control requests. A rate based approach would allow us to define a value which copes with both dimensions.
3. Doing the accounting in the controller requires attaching a principal and a client id to each topic as the quotas are expressed per principal and/or client id. I find this a little odd. This information would have to be updated whenever a topic is expanded or deleted, and any subsequent operations for that topic in the controller would be accounted to the last principal and client id. As an example, imagine a client that deletes topics which have partitions on a dead broker. In this case, the deletion would be retried until it succeeds. If that client has a low quota, that may prevent it from doing any new operations until the delete succeeds. This is a strange behavior.
4. One important aspect of the proposal is that we want to be able to reject requests when the quota is exhausted. With a usage based approach, it is difficult to compute the time to wait before being able to create the topics. The only thing that we could do is to ask the client to wait until the measured time drops below the quota to ensure that new operations can be accepted.
With a rate based approach, we can precisely compute the time to wait.

5. I find the experience for users slightly better with a rate based approach. When I say users here, I mean the users of the admin client or the cli. When throttled, they would get an error saying: "can't create because you have reached X partition mutations/sec". With the other approach, we could only say: "quota exceeded".

6. I do agree that the rate based approach is less generic in the long term, especially if other resource types are added in the controller.

Altogether, I am not convinced by a usage based approach and I would rather lean towards keeping the current proposal.

Best,
David

On Thu, May 7, 2020 at 2:11 AM Anna Povzner wrote:

> Hi David and Jun,
>
> I wanted to add to the discussion about using requests/sec vs. time on
> server threads (similar to request quota) for expressing quota for topic
> ops.
>
> I think request quota does not protect the brokers from overload by itself
> -- it still requires tuning and sometimes re-tuning, because it depends on
> the workload behavior of all users (like a relative share of requests
> exempt from throttling). This makes it not that easy to set. Let me give
> you more details:
>
> 1. The amount of work that the user can get from the request quota depends
>    on the load from other users. We measure and enforce the user's clock
>    time on threads---the time between 2 timestamps, one when the operation
>    starts and one when the operation ends. If the user is the only load on
>    the broker, it is less likely that their operation will be interrupted
>    by the kernel to switch to another thread, and time away from the
>    thread still counts.
>    1. Pros: this makes it more work-conserving; the user is less limited
>       when there are more resources available.
>    2. Cons: harder to capacity plan for the user, and it could be
>       confusing when the broker suddenly stops supporting the load which
>       it was supporting before.
> 2. For the above reason, it makes most sense to maximize the user's quota
>    and set it as a percent of the maximum thread capacity (1100 with
>    default broker config).
> 3. However, the actual maximum thread capacity is not really 1100:
>    1. Some of it will be taken by requests exempt from throttling, and the
>       amount depends on the workload. We have seen cases (somewhat rare)
>       where requests exempt from throttling take like ⅔ of the time on
>       threads.
>    2. We have also seen cases of an overloaded cluster (full queues,
>       timeouts, etc.) due to high request rate while the time used on
>       threads was way below the max (1100), like 600 or 700 (total exempt
>       + non-exempt usage). Basically, when a broker is close to 100% CPU,
>       it takes more and more time for the "unaccounted" work like a thread
>       getting a chance to pick up a request from the queue and get a
>       timestamp.
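David's point 4 above argues that a rate based quota makes the wait time exactly computable, unlike a usage based quota. A minimal sketch of that computation (function and parameter names are illustrative, not Kafka's actual API):

```python
# Hypothetical sketch of point 4: under a rate based (token bucket style)
# quota, the credit deficit refills linearly, so the exact wait time is
# simply deficit / rate. Names are illustrative, not Kafka's actual API.

def throttle_time_seconds(requested_mutations, available_credits, rate_per_sec):
    """Time a client must wait before the requested mutations fit the quota."""
    deficit = requested_mutations - available_credits
    if deficit <= 0:
        return 0.0  # enough accumulated credits; no throttling needed
    # Credits accumulate at rate_per_sec, so the deficit clears after:
    return deficit / rate_per_sec
```

For example, a request for 30 partition mutations against 10 remaining credits, refilled at 5 mutations/sec, would wait (30 - 10) / 5 = 4 seconds; a usage based quota offers no comparable closed form.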
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David and Jun,

I wanted to add to the discussion about using requests/sec vs. time on server threads (similar to request quota) for expressing quota for topic ops.

I think request quota does not protect the brokers from overload by itself -- it still requires tuning and sometimes re-tuning, because it depends on the workload behavior of all users (like a relative share of requests exempt from throttling). This makes it not that easy to set. Let me give you more details:

1. The amount of work that the user can get from the request quota depends on the load from other users. We measure and enforce the user's clock time on threads---the time between 2 timestamps, one when the operation starts and one when the operation ends. If the user is the only load on the broker, it is less likely that their operation will be interrupted by the kernel to switch to another thread, and time away from the thread still counts.
   1. Pros: this makes it more work-conserving; the user is less limited when there are more resources available.
   2. Cons: harder to capacity plan for the user, and it could be confusing when the broker suddenly stops supporting the load which it was supporting before.
2. For the above reason, it makes most sense to maximize the user's quota and set it as a percent of the maximum thread capacity (1100 with default broker config).
3. However, the actual maximum thread capacity is not really 1100:
   1. Some of it will be taken by requests exempt from throttling, and the amount depends on the workload. We have seen cases (somewhat rare) where requests exempt from throttling take like ⅔ of the time on threads.
   2. We have also seen cases of an overloaded cluster (full queues, timeouts, etc.) due to high request rate while the time used on threads was way below the max (1100), like 600 or 700 (total exempt + non-exempt usage). Basically, when a broker is close to 100% CPU, it takes more and more time for the "unaccounted" work like a thread getting a chance to pick up a request from the queue and get a timestamp.
4. As a result, there will be some tuning to decide on a safe value for total thread capacity, from which users can carve out their quotas. Some changes in users' workloads may require re-tuning if, for example, a change dramatically shifts the relative share of non-exempt load.

I think request quota works well for client request load in the sense that it ensures that different users get a fair/proportional share of resources during high broker load. If the user cannot get enough resources from their quota to support their request rate anymore, they can monitor their load and expand the cluster if needed (or rebalance).

However, I think using time on threads for topic ops could be even more difficult than simple request rate (as proposed):

1. I understand that we don't only care about topic requests tying up the controller thread; we also care that they do not create a large extra load on the cluster due to LeaderAndIsr and other related requests (this is more important for small clusters).
2. For that reason, tuning the quota in terms of time on threads can be harder, because there is no easy way to say how this quota would translate to a number of operations (because that would depend on other broker load).

Since tuning would be required anyway, I see the following workflow if we express the controller quota in terms of partition mutations per second:

1. Run the topic workload in isolation (the most expensive one, like create topic vs. add partitions) and see how much load it adds based on the incoming rate. Choose the quota depending on how much extra load your cluster can sustain in addition to its normal load.
2. It could be useful to publish some experimental results to give some ballpark numbers to make this sizing easier.

I am interested to see if you agree with the listed assumptions here.
I may have missed something, especially if there is an easier workflow for setting quota based on time on threads.

Thanks,
Anna

On Thu, Apr 30, 2020 at 8:13 AM Tom Bentley wrote:

> Hi David,
>
> Thanks for the KIP.
>
> If I understand the proposed throttling algorithm, an initial request
> would be allowed (possibly making K negative) and only subsequent requests
> (before K became positive) would receive the QUOTA_VIOLATED. That would
> mean it was still possible to block the controller from handling other
> events – you just need to do so via making one big request.
>
> While the reasons for rejecting execution throttling make sense given the
> RPCs we have today, that seems to be at the cost of still allowing harm to
> the cluster, or did I misunderstand?
>
> Kind regards,
>
> Tom
>
> On Tue, Apr 28, 2020 at 1:49 AM Jun Rao wrote:
>
> > Hi, David,
> >
> > Thanks for the reply. A few more
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi David,

Thanks for the KIP.

If I understand the proposed throttling algorithm, an initial request would be allowed (possibly making K negative) and only subsequent requests (before K became positive) would receive the QUOTA_VIOLATED. That would mean it was still possible to block the controller from handling other events – you just need to do so via making one big request.

While the reasons for rejecting execution throttling make sense given the RPCs we have today, that seems to be at the cost of still allowing harm to the cluster, or did I misunderstand?

Kind regards,

Tom

On Tue, Apr 28, 2020 at 1:49 AM Jun Rao wrote:

> Hi, David,
>
> Thanks for the reply. A few more comments.
>
> 1. I am actually not sure if a quota based on request rate is easier for
> the users. For context, in KIP-124, we started with a request rate quota,
> but ended up not choosing it. The main issues are (a) requests are not
> equal; some are more expensive than others; (b) the users typically don't
> know how expensive each type of request is. For example, a big part of the
> controller cost is ZK write. To create a new topic with 1 partition, the
> number of ZK writes is 4 (1 for each segment in
> /brokers/topics/[topic]/partitions/[partitionId]/state). The cost of
> adding one partition to an existing topic requires 2 ZK writes. The cost
> of deleting a topic with 1 partition requires 6 to 7 ZK writes. It's
> unlikely for a user to know the exact cost associated with those
> implementation details. If users don't know the cost, it's not clear if
> they can set the rate properly.
>
> 2. I think that depends on the goal. To me, the common problem is that you
> have many applications running on a shared Kafka cluster and one of the
> applications abuses the broker by issuing too many requests. In this case,
> a global quota will end up throttling every application. However, what we
> really want in this case is to only throttle the application that causes
> the problem. A user level quota solves this problem more effectively. We
> may still need some sort of global quota when the total usage from all
> applications exceeds the broker resource. But that seems to be secondary
> since it's uncommon for all applications' usage to go up at the same time.
> We can also solve this problem by reducing the per user quota for every
> application if there is a user level quota.
>
> 3. Not sure that I fully understand the difference in burst balance. The
> current throttling logic works as follows. Each quota is measured over a
> number of time windows. Suppose the quota is X/sec. If time passes and the
> quota is not being used, we are accumulating credit at the rate of X/sec.
> If a quota is being used, we are reducing that credit based on the usage.
> The credit expires when the corresponding window rolls out. The max credit
> that can be accumulated is X * number of windows * window size. So, in
> some sense, the current logic also supports burst and a way to cap the
> burst. Could you explain the difference with Token Bucket a bit more?
> Also, the current quota system always executes the current request even if
> it's being throttled. It just informs the client to back off a throttled
> amount of time before sending another request.
>
> Jun
>
> On Mon, Apr 27, 2020 at 5:15 AM David Jacot wrote:
>
> > Hi Jun,
> >
> > Thank you for the feedback.
> >
> > 1. You are right. In the end, we do care about the percentage of time
> > that an operation ties up the controller thread. I thought about this
> > but I was not entirely convinced by it for the following reasons:
> >
> > 1.1. While I do agree that setting up a rate and a burst is a bit harder
> > than allocating a percentage for the administrator of the cluster, I
> > believe that a rate and a burst are way easier to understand for the
> > users of the cluster.
> >
> > 1.2. Measuring the time that a request ties up the controller thread is
> > not as straightforward as it sounds because the controller reacts to ZK
> > TopicChange and TopicDeletion events in lieu of handling requests
> > directly. These events carry neither the client id nor the user
> > information, so the best would be to refactor the controller to accept
> > requests instead of reacting to the events. This will be possible with
> > KIP-590. It obviously has other side effects in the controller (e.g.
> > batching).
> >
> > I leaned towards the current proposal mainly due to 1.1. as 1.2. can be
> > (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?
> >
> > 2. It is not in the current proposal. I thought that a global quota
> > would be enough to start with. We can definitely make it work like the
> > other quotas.
> >
> > 3. The main difference is that the Token Bucket algorithm defines an
> > explicit burst B while guaranteeing an average rate R whereas our
> > existing quota guarantees an average rate R as well but starts to
> > throttle as soon as the rate
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for the reply. A few more comments.

1. I am actually not sure if a quota based on request rate is easier for the users. For context, in KIP-124, we started with a request rate quota, but ended up not choosing it. The main issues are (a) requests are not equal; some are more expensive than others; (b) the users typically don't know how expensive each type of request is. For example, a big part of the controller cost is ZK write. To create a new topic with 1 partition, the number of ZK writes is 4 (1 for each segment in /brokers/topics/[topic]/partitions/[partitionId]/state). The cost of adding one partition to an existing topic requires 2 ZK writes. The cost of deleting a topic with 1 partition requires 6 to 7 ZK writes. It's unlikely for a user to know the exact cost associated with those implementation details. If users don't know the cost, it's not clear if they can set the rate properly.

2. I think that depends on the goal. To me, the common problem is that you have many applications running on a shared Kafka cluster and one of the applications abuses the broker by issuing too many requests. In this case, a global quota will end up throttling every application. However, what we really want in this case is to only throttle the application that causes the problem. A user level quota solves this problem more effectively. We may still need some sort of global quota when the total usage from all applications exceeds the broker resource. But that seems to be secondary since it's uncommon for all applications' usage to go up at the same time. We can also solve this problem by reducing the per user quota for every application if there is a user level quota.

3. Not sure that I fully understand the difference in burst balance. The current throttling logic works as follows. Each quota is measured over a number of time windows. Suppose the quota is X/sec. If time passes and the quota is not being used, we are accumulating credit at the rate of X/sec. If a quota is being used, we are reducing that credit based on the usage. The credit expires when the corresponding window rolls out. The max credit that can be accumulated is X * number of windows * window size. So, in some sense, the current logic also supports burst and a way to cap the burst. Could you explain the difference with Token Bucket a bit more? Also, the current quota system always executes the current request even if it's being throttled. It just informs the client to back off a throttled amount of time before sending another request.

Jun

On Mon, Apr 27, 2020 at 5:15 AM David Jacot wrote:

> Hi Jun,
>
> Thank you for the feedback.
>
> 1. You are right. In the end, we do care about the percentage of time that
> an operation ties up the controller thread. I thought about this but I was
> not entirely convinced by it for the following reasons:
>
> 1.1. While I do agree that setting up a rate and a burst is a bit harder
> than allocating a percentage for the administrator of the cluster, I
> believe that a rate and a burst are way easier to understand for the users
> of the cluster.
>
> 1.2. Measuring the time that a request ties up the controller thread is
> not as straightforward as it sounds because the controller reacts to ZK
> TopicChange and TopicDeletion events in lieu of handling requests
> directly. These events carry neither the client id nor the user
> information, so the best would be to refactor the controller to accept
> requests instead of reacting to the events. This will be possible with
> KIP-590. It obviously has other side effects in the controller (e.g.
> batching).
>
> I leaned towards the current proposal mainly due to 1.1. as 1.2. can be
> (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?
>
> 2. It is not in the current proposal. I thought that a global quota would
> be enough to start with. We can definitely make it work like the other
> quotas.
>
> 3. The main difference is that the Token Bucket algorithm defines an
> explicit burst B while guaranteeing an average rate R whereas our existing
> quota guarantees an average rate R as well but starts to throttle as soon
> as the rate goes above the defined quota.
>
> Creating and deleting topics is bursty by nature. Applications create or
> delete topics occasionally, usually by sending one request with multiple
> topics. The reasoning behind allowing a burst is to allow such requests
> with a reasonable size to pass without being throttled whereas our current
> quota mechanism would reject topics as soon as the rate is above the
> quota, requiring the applications to send subsequent requests to create or
> to delete all the topics.
>
> Best,
> David
>
> On Fri, Apr 24, 2020 at 9:03 PM Jun Rao wrote:
>
> > Hi, David,
> >
> > Thanks for the KIP. A few quick comments.
> >
> > 1. About quota.partition.mutations.rate. I am not sure if it's very easy
> > for the user to set the quota as a rate. For example, each partition
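Jun's point 3 above describes the existing sampled quota: usage is recorded into rolling time windows, and the implicit burst is the quota times the number of windows times the window size. A rough model of that accounting (illustrative names, not Kafka's actual Rate/Sensor implementation):

```python
class WindowedRateQuota:
    """Rough model of the sampled-rate quota described above (illustrative,
    not Kafka's actual implementation)."""

    def __init__(self, quota_per_sec, num_windows=11, window_size_sec=1.0):
        self.quota = quota_per_sec
        self.num_windows = num_windows
        self.window_size = window_size_sec
        self.samples = [0.0] * num_windows  # usage recorded per window

    def record(self, window_index, amount):
        # A sample's credit "expires" when its window rolls out of range.
        self.samples[window_index % self.num_windows] += amount

    def measured_rate(self):
        # Total usage averaged over the whole sampled span.
        return sum(self.samples) / (self.num_windows * self.window_size)

    def max_burst(self):
        # Max accumulable credit: quota * number of windows * window size.
        return self.quota * self.num_windows * self.window_size

    def violated(self):
        return self.measured_rate() > self.quota
```

With a quota of 5/sec over 11 one-second windows, up to 55 mutations can land in a single burst before the measured rate exceeds the quota, which is exactly the implicit burst Jun derives; the Token Bucket instead makes the burst an explicit, independently tunable parameter.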
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi Jun,

Thank you for the feedback.

1. You are right. In the end, we do care about the percentage of time that an operation ties up the controller thread. I thought about this but I was not entirely convinced by it for the following reasons:

1.1. While I do agree that setting up a rate and a burst is a bit harder than allocating a percentage for the administrator of the cluster, I believe that a rate and a burst are way easier to understand for the users of the cluster.

1.2. Measuring the time that a request ties up the controller thread is not as straightforward as it sounds because the controller reacts to ZK TopicChange and TopicDeletion events in lieu of handling requests directly. These events carry neither the client id nor the user information, so the best would be to refactor the controller to accept requests instead of reacting to the events. This will be possible with KIP-590. It obviously has other side effects in the controller (e.g. batching).

I leaned towards the current proposal mainly due to 1.1. as 1.2. can be (or will be) fixed. Does 1.1. sound like a reasonable trade off to you?

2. It is not in the current proposal. I thought that a global quota would be enough to start with. We can definitely make it work like the other quotas.

3. The main difference is that the Token Bucket algorithm defines an explicit burst B while guaranteeing an average rate R, whereas our existing quota guarantees an average rate R as well but starts to throttle as soon as the rate goes above the defined quota.

Creating and deleting topics is bursty by nature. Applications create or delete topics occasionally, usually by sending one request with multiple topics. The reasoning behind allowing a burst is to let such requests with a reasonable size pass without being throttled, whereas our current quota mechanism would reject topics as soon as the rate is above the quota, requiring the applications to send subsequent requests to create or to delete all the topics.

Best,
David

On Fri, Apr 24, 2020 at 9:03 PM Jun Rao wrote:

> Hi, David,
>
> Thanks for the KIP. A few quick comments.
>
> 1. About quota.partition.mutations.rate. I am not sure if it's very easy
> for the user to set the quota as a rate. For example, each partition
> mutation could take a different number of ZK operations (depending on
> things like retry). The time to process each ZK operation may also vary
> from cluster to cluster. An alternative way to model this is to do
> something similar to the request (CPU) quota, which exposes quota as a
> percentage of the server threads that can be used. The current request
> quota doesn't include the controller thread. We could add something that
> measures/exposes the percentage of time that a request ties up the
> controller thread, which seems to be what we really care about.
>
> 2. Is the new quota per user? Intuitively, we want to only penalize
> applications that overuse the broker resources, but not others. Also, in
> existing types of quotas (request, bandwidth), there is a hierarchy among
> clientId vs user and default vs customized (see
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
> ). Does the new quota fit into the existing hierarchy?
>
> 3. It seems that you are proposing a new quota mechanism based on the
> Token Bucket algorithm. Could you describe its tradeoff with the existing
> quota mechanism? Ideally, it would be better if we have a single quota
> mechanism within Kafka.
>
> Jun
>
> On Fri, Apr 24, 2020 at 9:52 AM David Jacot wrote:
>
> > Hi folks,
> >
> > I'd like to start the discussion for KIP-599:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> >
> > It proposes to introduce quotas for the create topics, create partitions
> > and delete topics operations. Let me know what you think, thanks.
> >
> > Best,
> > David
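David's point 3 above contrasts the Token Bucket with the existing mechanism: an explicit burst B, an average rate R, and (as discussed with Tom earlier in the thread) a balance that may go negative so that one large batch is admitted and only subsequent requests are rejected. A minimal, deterministic sketch under those assumptions (illustrative, not the actual Kafka code; time is passed in explicitly to keep the example testable):

```python
class TokenBucket:
    """Token Bucket with average rate R and explicit burst B (a sketch of
    the mechanism described in the KIP discussion, not Kafka's code)."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate      # R: credits refilled per second
        self.burst = burst    # B: cap on accumulated credits
        self.credits = burst  # bucket starts full
        self.last = now

    def try_acquire(self, cost, now):
        # Refill linearly for the elapsed time, capped at the burst B.
        self.credits = min(self.burst, self.credits + (now - self.last) * self.rate)
        self.last = now
        if self.credits <= 0:
            return False  # bucket exhausted: reject (QUOTA_VIOLATED)
        # Admit while any credit remains, letting the balance go negative,
        # so a single large CreateTopics batch is not rejected outright.
        self.credits -= cost
        return True
```

With R=5 and B=10, a batch of 30 mutations is admitted (driving the balance to -20) and further requests are rejected until the balance refills above zero, i.e. for the next 4 seconds; the windowed quota described earlier would instead throttle as soon as the measured rate exceeds R, with the burst only implicit in the window configuration.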
Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi, David,

Thanks for the KIP. A few quick comments.

1. About quota.partition.mutations.rate. I am not sure if it's very easy for the user to set the quota as a rate. For example, each partition mutation could take a different number of ZK operations (depending on things like retry). The time to process each ZK operation may also vary from cluster to cluster. An alternative way to model this is to do something similar to the request (CPU) quota, which exposes quota as a percentage of the server threads that can be used. The current request quota doesn't include the controller thread. We could add something that measures/exposes the percentage of time that a request ties up the controller thread, which seems to be what we really care about.

2. Is the new quota per user? Intuitively, we want to only penalize applications that overuse the broker resources, but not others. Also, in existing types of quotas (request, bandwidth), there is a hierarchy among clientId vs user and default vs customized (see https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users ). Does the new quota fit into the existing hierarchy?

3. It seems that you are proposing a new quota mechanism based on the Token Bucket algorithm. Could you describe its tradeoff with the existing quota mechanism? Ideally, it would be better if we have a single quota mechanism within Kafka.

Jun

On Fri, Apr 24, 2020 at 9:52 AM David Jacot wrote:

> Hi folks,
>
> I'd like to start the discussion for KIP-599:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
>
> It proposes to introduce quotas for the create topics, create partitions
> and delete topics operations. Let me know what you think, thanks.
>
> Best,
> David
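Jun's point 2 refers to the KIP-55 quota hierarchy. If the new controller quota follows it, resolution would pick the most specific (user, client-id) override and fall back through the defaults. A hedged sketch of that lookup order (the precedence follows the user/client-id order documented for existing Kafka quotas; the data structure here is an illustrative stand-in for the entity configs in ZooKeeper):

```python
# Most specific match first, per the documented user/client-id hierarchy.
PRECEDENCE = [
    ("user", "client"),          # /config/users/<user>/clients/<client-id>
    ("user", "<default>"),       # /config/users/<user>/clients/<default>
    ("user", None),              # /config/users/<user>
    ("<default>", "client"),     # /config/users/<default>/clients/<client-id>
    ("<default>", "<default>"),  # /config/users/<default>/clients/<default>
    ("<default>", None),         # /config/users/<default>
    (None, "client"),            # /config/clients/<client-id>
    (None, "<default>"),         # /config/clients/<default>
]

def resolve_quota(configs, user, client_id):
    """Return the first configured quota walking down the hierarchy.

    `configs` maps (user_key, client_key) tuples to quota values; it is an
    illustrative stand-in for the configured quota entities.
    """
    for user_key, client_key in PRECEDENCE:
        key = (
            user if user_key == "user" else user_key,
            client_id if client_key == "client" else client_key,
        )
        if key in configs:
            return configs[key]
    return None  # no quota configured at any level
```

For example, with {("alice", None): 30.0, ("<default>", None): 10.0}, user "alice" resolves to 30.0 while any other user falls back to the default of 10.0.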
KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations
Hi folks,

I'd like to start the discussion for KIP-599:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations

It proposes to introduce quotas for the create topics, create partitions and delete topics operations. Let me know what you think, thanks.

Best,
David