On Tue, Jun 9, 2020, at 05:06, David Jacot wrote:
> Hi Colin,
>
> Thank you for your feedback.
>
> Jun has summarized the situation pretty well. Thanks Jun! I would like to complement it with the following points:
>
> 1. Indeed, when the quota is exceeded, the broker will reject the topic creations, partition creations and topic deletions that exceed it with the new QUOTA_VIOLATED error. The ThrottleTimeMs field will be populated accordingly to let the client know how long it must wait.
>
> 2. I do agree that we actually want a mechanism to apply back pressure to the clients. The KIP basically proposes a mechanism to control and to limit the rate of operations before entering the controller. I think that it is similar to your thinking but is enforced based on a defined rate instead of relying on the number of pending items in the controller.
>
> 3. You mentioned an alternative idea in your comments that, if I understood correctly, would bound the queue to limit the overload and reject work if the queue is full. I have been thinking about this as well but I don't think that it works well in our case.
> - The first reason is the one mentioned by Jun. We actually want to be able to limit specific clients (the misbehaving ones) in a multi-tenant environment.
> - The second reason is that, at least in our current implementation, the length of the queue is not really a good characteristic to estimate the load. Coming back to your example of the CreateTopicsRequest: it creates a path in ZK for each newly created topic, which triggers a ChangeTopic event in the controller. That single event could be for a single topic in some cases or for a thousand topics in others.
> These two reasons aside, bounding the queue also introduces a knob which requires some tuning and thus suffers from all the points you mentioned as well, doesn't it? The value may be right for one version but not for another.
>
> 4. Regarding the integration with KIP-500. The implementation of this KIP will span across the ApiLayer and the AdminManager. To be honest, we can influence the implementation to work best with what you have in mind for the future controller. If you could shed some more light on the future internal architecture, I can take this into account during the implementation.
>
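For concreteness, here is a rough sketch of the kind of rate-based admission described in points 1 and 2 above. The class and method names are made up for illustration, and the KIP builds on Kafka's existing quota machinery rather than this exact code; the point is only that admission is decided against a defined rate and, on rejection, a throttle time is computed for the response:

    import java.util.concurrent.TimeUnit;

    // Hypothetical per-principal rate quota for controller mutations
    // (topics/partitions created or deleted). Not the KIP's implementation.
    public final class MutationRateQuota {
        private final double ratePerSec;  // permitted mutations per second
        private final double burst;       // credit that may accumulate while idle
        private double credit;            // currently available credit
        private long lastCheckNanos;

        public MutationRateQuota(double ratePerSec, double burstSeconds) {
            this.ratePerSec = ratePerSec;
            this.burst = ratePerSec * burstSeconds;
            this.credit = this.burst;
            this.lastCheckNanos = System.nanoTime();
        }

        /**
         * Tries to admit `mutations` (e.g. the number of partitions a
         * CreateTopicsRequest would create). Returns 0 if admitted, otherwise
         * the throttle time in ms to return to the client along with the
         * QUOTA_VIOLATED error (the ThrottleTimeMs field mentioned above).
         */
        public synchronized long tryAdmit(int mutations) {
            long now = System.nanoTime();
            // Replenish credit for the time elapsed since the last check.
            credit = Math.min(burst, credit + (now - lastCheckNanos) / 1e9 * ratePerSec);
            lastCheckNanos = now;

            if (credit < mutations) {
                // Over quota: reject and tell the client how long to wait until
                // enough credit has accumulated to retry.
                double deficitSec = (mutations - credit) / ratePerSec;
                return (long) Math.ceil(deficitSec * TimeUnit.SECONDS.toMillis(1));
            }
            credit -= mutations;
            return 0L;
        }
    }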
Hi David,

Good question. The approach we are thinking of is that in the future, topic creation will be a controller RPC. In other words, rather than changing ZK and waiting for the controller code to notice, we'll go through the controller code (which may change ZK, or may do something else in the ZK-free environment). I don't think there is a good way to write this in the short term that avoids having to rewrite in the long term. That's probably OK though, as long as we keep in mind that we'll need to.

> 5. Regarding KIP-590. For the create topics request, create partitions request, and delete topics request, we are lucky enough to have directed all of them to the controller already, so we are fine with doing the admission in the broker which hosts the controller. If we were to throttle more operations in the future, I believe that we can continue to do it centrally while leveraging the principal that is forwarded to account for the right tenant. The reason why I would like to keep it central is that we are talking about sparse (or bursty) workloads here, so distributing the quota among all the brokers makes little sense.

Right. The main requirement here is that the quota must operate based on principal names rather than KafkaPrincipal objects. We had a long discussion about KIP-590 and eventually concluded that we didn't want to make KafkaPrincipal serializable (at least not yet), so what would be forwarded is just a string, not the principal object.

> 6. Regarding the naming of the new error code. BUSY sounds too generic to me so I would rather prefer a specific error code. The main reason being that we may be able to reuse it in the future for other quotas. That being said, we can find another name. QUOTA_EXHAUSTED? I don't feel too strongly about the naming. I wonder what the others think about this.

Think about it from the point of view of someone writing software that uses the Kafka client. Let's say you try to create some topics and get back QUOTA_VIOLATED. What does it mean? Maybe it means that you tried to create too many topics in too short a time (violating the controller throttle). Or maybe it means that you exceeded a quota on the number of partitions that you are allowed to create (let's assume that someone did a follow-on KIP that reuses this error code for that).

But now you have a dilemma. If the controller was just busy, then trying again is the right thing to do. If there is some other quota you violated (number of partitions, number of topics, etc.), then retrying is wasteful and will not accomplish anything. Of course, the Kafka client software itself can tell the two cases apart since it has access to throttleTimeMs. But the end user can't (and in certain modes, the exception is exposed directly to the user). That's why "you violated some quota, not sure which" is not a good error code.

So I think we should distinguish the two cases by having two separate error codes. Maybe something like RATE_QUOTA_VIOLATED and RESOURCE_QUOTA_VIOLATED. This KIP would only need the former one.

Another possibility is that the user could distinguish the two cases by the error string. But typically we want error strings to be used to give extra debugging information, not to make big decisions about what the client should do. So I think that the error code should actually be a little bit more specific, or at least tell the end user what to do with it (that's why I suggested "busy").
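To make the dilemma concrete, here is a small client-side sketch assuming the two hypothetical error codes suggested above actually existed. With a single QUOTA_VIOLATED code, a caller that only sees the error code cannot choose between the two branches:

    // Hypothetical client-side handling, assuming the two error codes
    // suggested above existed. Not part of any current Kafka API.
    public class CreateTopicsRetryExample {

        enum ApiError { NONE, RATE_QUOTA_VIOLATED, RESOURCE_QUOTA_VIOLATED }

        static void handleCreateTopicsError(ApiError error, int throttleTimeMs)
                throws InterruptedException {
            switch (error) {
                case NONE:
                    return;
                case RATE_QUOTA_VIOLATED:
                    // Only the rate was exceeded: waiting and retrying will succeed.
                    Thread.sleep(throttleTimeMs);
                    // ... resend the CreateTopicsRequest here ...
                    break;
                case RESOURCE_QUOTA_VIOLATED:
                    // A hard limit (e.g. max partitions) was hit: retrying is
                    // wasteful and will never succeed. Surface the error instead.
                    throw new IllegalStateException(
                        "Creation rejected: a resource limit was exceeded");
            }
        }
    }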
> Voilà. I hope that I have addressed all your questions/points and I am sorry for the lengthy email.
>
Thanks, David. It looks good to me overall. And I thought your email was very clear-- not too long at all. Let's just figure out the error code thing.

best,
Colin

> Regards,
> David
>
> On Tue, Jun 9, 2020 at 2:13 AM Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the comment. You brought up several points.
> > >
> > > 1. Should we set up a per-user quota? To me, it does seem we need some sort of a quota. When the controller runs out of resources, ideally, we only want to penalize the badly behaving applications, instead of every application. To do that, we will need to know what each application is entitled to, and the per-user quota is intended to capture that.
> > >
> > > 2. How easy is it to configure a quota? The following is how an admin typically sets up a quota in our existing systems. Pick a generous default per-user quota that works for most applications. For the few resource-intensive applications, customize a higher quota for them. Reserve enough resources in anticipation that a single application (or a few) will exceed the quota at a given time.
> >
> > Hi Jun,
> >
> > Thanks for the response.
> >
> > Maybe I was too pessimistic about the ability of admins to configure a useful quota here. I do agree that it would be nice to have the ability to set different quotas for different users, as you mentioned.
> >
> > > 3. How should the quota be defined? In the discussion thread, we debated between a usage based model vs a rate based model. Dave and Anna argued for the rate based model mostly because it's simpler to implement.
> >
> > I'm trying to think more about how this integrates with our plans for KIP-500. When we get rid of ZK, we will have to handle this in the controller itself, rather than in the AdminManager. That implies we'll have to rewrite the code. Maybe this is worth it if we want this feature now, though.
> >
> > Another wrinkle here is that, as we discussed in KIP-590, controller operations will land on a random broker first, and only then be forwarded to the active controller. This implies that either admission control should happen on all brokers (needing some kind of distributed quota scheme), or be done on the controller after we've already done the work of forwarding the message. The second approach might not be that bad, but it would be nice to figure this out.
> >
> > > 4. If a quota is exceeded, how is that enforced? My understanding of the KIP is that, if a quota is exceeded, the broker immediately sends a QUOTA_VIOLATED error and a throttle time back to the client, and the client will wait for the throttle time before issuing the next request. This seems to be the same as the BUSY error code you mentioned.
> >
> > Yes, I agree, it sounds like we're thinking along the same lines. However, rather than QUOTA_VIOLATED, how about naming the error code BUSY? Then the error text could indicate the quota that we violated. This would be more generally useful as an error code and also avoid being confusingly similar to POLICY_VIOLATION.
> >
> > best,
> > Colin
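As an illustration of the per-user override Jun describes in point 2 above: assuming the controller_mutation_rate quota name proposed in the KIP (the final name could differ) and the KIP-546 client-quota Admin API, giving one user a higher custom quota might look roughly like this:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    import java.util.Collections;
    import java.util.Properties;

    public class SetControllerMutationQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // Target the quota at a specific user (principal name).
                ClientQuotaEntity entity = new ClientQuotaEntity(
                    Collections.singletonMap(ClientQuotaEntity.USER, "app-1"));

                // "controller_mutation_rate" is the quota key proposed in the KIP
                // discussion; the final name could differ.
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    entity,
                    Collections.singletonList(
                        new ClientQuotaAlteration.Op("controller_mutation_rate", 10.0)));

                admin.alterClientQuotas(Collections.singletonList(alteration)).all().get();
            }
        }
    }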
> > > I will let David chime in more on that.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi David,
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > I thought about this for a while and I actually think this approach is not quite right. The problem that I see is that using an explicitly set quota here requires careful tuning by the cluster operator. Even worse, this tuning might be invalidated by changes in overall conditions or even more efficient controller software.
> > > >
> > > > For example, if we empirically find that the controller can do 1000 topics in a minute (or whatever), this tuning might actually be wrong if the next version of the software can do 2000 topics in a minute because of efficiency upgrades. Or, the broker that the controller is located on might be experiencing heavy load from its non-controller operations, and so it can only do 500 topics in a minute during this period.
> > > >
> > > > So the system administrator gets a very obscure tunable (it's not clear to a non-Kafka-developer what "controller mutations" are or why they should care). And even worse, they will have to significantly "sandbag" the value that they set it to, so that even under the heaviest load and oldest deployed version of the software, the controller can still function. Even worse, this new quota adds a lot of complexity to the controller.
> > > >
> > > > What we really want is backpressure when the controller is overloaded. I believe this is the alternative you discuss in "Rejected Alternatives" under "Throttle the Execution instead of the Admission". Your reason for rejecting it is that the client error handling does not work well in this case. But actually, this is an artifact of our current implementation, rather than a fundamental issue with backpressure.
> > > >
> > > > Consider the example of a CreateTopicsRequest. The controller could return a special error code if the load was too high, and take the create topics event off the controller queue. Let's call that error code BUSY. Additionally, the controller could immediately refuse new events if the queue had reached its maximum length, and simply return BUSY for that case as well.
> > > >
> > > > Basically, the way we handle RPC timeouts in the controller right now is not very good. As you know, we time out the RPC, so the client gets TimeoutException, but then we keep the event on the queue, so that it eventually gets executed! There's no reason why we have to do that. We could take the event off the queue if there is a timeout. This would reduce load and mostly avoid the paradoxical situations you describe (getting TopicExistsException for a CreateTopicsRequest retry, etc.).
> > > >
> > > > I say "mostly" because there are still cases where retries could go astray (for example if we execute the topic creation but a network problem prevents the response from being sent to the client). But this would still be a very big improvement over what we have now.
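A rough sketch of the bounded-queue idea above, with hypothetical names (this is not the controller's actual event queue): the queue refuses new events up front when it is full, so the caller can answer the RPC immediately with BUSY, and a timed-out event can be removed instead of being left to execute later:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Hypothetical sketch of the bounded-queue idea. Admission is refused up
    // front when the controller event queue is full, rather than accepting the
    // event and letting the RPC time out while the event stays queued.
    public final class BoundedControllerQueue {

        public interface ControllerEvent {
            void process();
        }

        private final BlockingQueue<ControllerEvent> queue;

        public BoundedControllerQueue(int maxPendingEvents) {
            this.queue = new ArrayBlockingQueue<>(maxPendingEvents);
        }

        /**
         * Returns true if the event was accepted. Returns false if the queue is
         * full, in which case the caller would answer the RPC immediately with
         * the (hypothetical) BUSY error instead of queueing the event.
         */
        public boolean tryEnqueue(ControllerEvent event) {
            return queue.offer(event);
        }

        /** Called when an RPC times out: drop the event so it is not executed later. */
        public boolean cancel(ControllerEvent event) {
            return queue.remove(event);
        }
    }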
> > > > Sorry for commenting so late on this but I got distracted by some other things. I hope we can figure this one out-- I feel like there is a chance to significantly simplify this.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Fri, May 29, 2020, at 07:57, David Jacot wrote:
> > > > > Hi folks,
> > > > >
> > > > > I'd like to start the vote for KIP-599 which proposes a new quota to throttle create topic, create partition, and delete topic operations to protect the Kafka controller:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> > > > >
> > > > > Please, let me know what you think.
> > > > >
> > > > > Cheers,
> > > > > David