In Jay's approach, a client will simply experience a delay in receiving a response. The primary benefit is that there is no concern about data loss, because the data has already been appended. Retries are also a non-issue since there is no need for them. However, the drawback of append-and-delay is that if the socket timeout is reached (30 seconds by default, I believe), the client can disconnect and resend the batch to the server. This causes data duplication, since the server cannot distinguish duplicate batches. That said, the maximum quota delay is very likely to be lower than the socket timeout unless someone explicitly overrides it, and we can make duplication even less likely by enforcing a fixed lower bound on the socket timeout (10 seconds?). In this approach we must also ignore the request timeout, since a small request timeout would completely bypass quotas.
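To make this concrete, here is a rough sketch (Java, with purely illustrative names and a simplistic fixed window; not actual broker code) of how the broker could compute a bounded throttle delay for a quota-violating produce request, capping the delay below the socket timeout so the client never disconnects and resends:

// Hypothetical sketch of "append and delay": the broker appends the batch,
// then holds the response for a throttle time capped well below the client
// socket timeout, so the client does not disconnect and retry (which would
// duplicate data). Names and the windowing scheme are illustrative only.
public final class ThrottleCalculator {

    // Assumed configs: a quota in bytes/sec and a ceiling on how long any
    // single response may be delayed. The ceiling should sit safely below
    // the client socket timeout (30s default per the discussion above).
    private final long quotaBytesPerSec;
    private final long maxDelayMs;           // e.g. 10_000, well under 30_000

    private long windowStartMs;
    private long bytesInWindow;

    public ThrottleCalculator(long quotaBytesPerSec, long maxDelayMs) {
        this.quotaBytesPerSec = quotaBytesPerSec;
        this.maxDelayMs = maxDelayMs;
    }

    // Record the appended bytes and return how long the broker should hold
    // the response. Returns 0 when the client is within its quota.
    public synchronized long throttleTimeMs(long bytesAppended, long nowMs) {
        if (nowMs - windowStartMs >= 1000) {   // simple 1-second window
            windowStartMs = nowMs;
            bytesInWindow = 0;
        }
        bytesInWindow += bytesAppended;
        long overage = bytesInWindow - quotaBytesPerSec;
        if (overage <= 0) {
            return 0;                          // within quota: respond now
        }
        // Delay long enough for the overage to "drain" at the quota rate,
        // but never beyond the configured ceiling.
        long delay = (overage * 1000) / quotaBytesPerSec;
        return Math.min(delay, maxDelayMs);
    }
}

The key point is the cap: as long as maxDelayMs stays well under the socket timeout, a throttled client just sees a slow response rather than a disconnect-and-retry.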
In the other approach, assuming the client retries only a fixed number of times, it will eventually experience data loss because the producer will drop the batch at some point. IMO, we are more likely to see this issue in production than the other issues identified above. I agree with Jay that we can delay the request longer than the request timeout, since it isn't possible to enforce that perfectly on the server anyway. I think we should have a maximum delay config on the server that provides a ceiling on how long we can delay a request, and it should be lower than the socket timeout.

Initially, I preferred delay-and-error because it seems like the most natural way to handle quota violations, but I'm starting to see the merit in Jay's approach. Practically speaking, it reduces the number of moving parts in delivering quotas for Kafka. All changes are localized to the broker and are compatible with existing clients. Client changes will be required only if we return quota metadata in the responses or add a quota metadata API. If we discover in production that this isn't working for some reason, we can always revisit the approach of returning errors and having the clients handle them.

Note that both of these data loss/duplication issues affect only the producer. Consumers should be fine regardless of the approach we choose.

Aditya

________________________________________
From: Jun Rao [j...@confluent.io]
Sent: Monday, March 16, 2015 4:27 PM
To: dev@kafka.apache.org
Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas

It's probably useful for a client to know whether its requests are throttled or not (e.g., for monitoring and alerting). From that perspective, option B (delay the requests and return an error) seems better.

Thanks,

Jun

On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

> Posted a KIP for quotas in kafka.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
>
> Appreciate any feedback.
>
> Aditya
>