Hi David, Thanks for the KIP.
I thought about this for a while and I actually think this approach is not quite right. The problem that I see here is that using an explicitly set quota here requires careful tuning by the cluster operator. Even worse, this tuning might be invalidated by changes in overall conditions or even more efficient controller software. For example, if we empirically find that the controller can do 1000 topics in a minute (or whatever), this tuning might actually be wrong if the next version of the software can do 2000 topics in a minute because of efficiency upgrades. Or, the broker that the controller is located on might be experiencing heavy load from its non-controller operations, and so it can only do 500 topics in a minute during this period. So the system administrator gets a very obscure tunable (it's not clear to a non-Kafka-developer what "controller mutations" are or why they should care). And even worse, they will have to significantly "sandbag" the value that they set it to, so that even under the heaviest load and oldest deployed version of the software, the controller can still function. Even worse, this new quota adds a lot of complexity to the controller. What we really want is backpressure when the controller is overloaded. I believe this is the alternative you discuss in "Rejected Alternatives" under "Throttle the Execution instead of the Admission" Your reason for rejecting it is that the client error handling does not work well in this case. But actually, this is an artifact of our current implementation, rather than a fundamental issue with backpressure. Consider the example of a CreateTopicsRequest. The controller could return a special error code if the load was too high, and take the create topics event off the controller queue. Let's call that error code BUSY. Additionally, the controller could immediately refuse new events if the queue had reached its maximum length, and simply return BUSY for that case as well. Basically, the way we handle RPC timeouts in the controller right now is not very good. As you know, we time out the RPC, so the client gets TimeoutException, but then keep the event on the queue, so that it eventually gets executed! There's no reason why we have to do that. We could take the event off the queue if there is a timeout. This would reduce load and mostly avoid the paradoxical situations you describe (getting TopicExistsException for a CreateTopicsRequest retry, etc.) I say "mostly" because there are still cases where retries could go astray (for example if we execute the topic creation but a network problem prevents the response from being sent to the client). But this would still be a very big improvement over what we have now. Sorry for commenting so late on this but I got distracted by some other things. I hope we can figure this one out-- I feel like there is a chance to significantly simplify this. best, Colin On Fri, May 29, 2020, at 07:57, David Jacot wrote: > Hi folks, > > I'd like to start the vote for KIP-599 which proposes a new quota to > throttle create topic, create partition, and delete topics operations to > protect the Kafka controller: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations > > Please, let me know what you think. > > Cheers, > David >
