In Jay's approach, a client will simply experience a delay in receiving a 
response. The primary benefit is that there is no concern about data loss, 
because the data has already been appended. Retries are also a non-issue since 
there is no need for them. However, the drawback of append-and-delay is that if 
the socket timeout is reached (30 seconds by default, I believe), the client can 
disconnect and try to resend the batch to the server. This will cause data 
duplication, since the server cannot distinguish duplicate batches. However, it 
is very likely that the maximum quota delay will be lower than the socket 
timeout unless someone explicitly overrides it. We can make this even more 
unlikely by enforcing a fixed lower bound on the socket timeout (10 seconds?). In 
this approach we must also ignore the request timeout, since a small timeout 
would completely bypass quotas.
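To make the interaction concrete, here is a minimal sketch (hypothetical class and method names, not actual Kafka broker code) of the clamping described above: cap the computed quota delay well below the socket timeout so the client does not disconnect and resend the batch, which would duplicate data.

```java
// Hypothetical sketch: cap the quota-imposed delay below the socket timeout.
public final class ThrottleDelay {
    private final long socketTimeoutMs;

    public ThrottleDelay(long socketTimeoutMs, long minSocketTimeoutMs) {
        // Enforce a fixed lower bound on the socket timeout (e.g. 10 seconds),
        // as suggested above, so a tiny client timeout cannot defeat the clamp.
        this.socketTimeoutMs = Math.max(socketTimeoutMs, minSocketTimeoutMs);
    }

    /** Returns the delay to actually apply, leaving headroom below the socket timeout. */
    public long delayFor(long computedQuotaDelayMs) {
        long ceilingMs = socketTimeoutMs / 2; // headroom: never delay past half the timeout
        return Math.min(computedQuotaDelayMs, ceilingMs);
    }
}
```

With the 30-second default and a 10-second floor, even a very large computed delay is capped at 15 seconds, so the client never hits its socket timeout.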

In the other approach, assuming the client only retries a fixed number of 
times, it will eventually experience data loss, since the producer will drop the 
batch at some point. IMO, we are more likely to see this issue in production 
than the other issues identified above.
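The failure mode described above can be sketched as follows (a hypothetical client-side illustration, not the actual producer code): with a fixed retry budget, a persistently rejected batch is eventually dropped, which is the data loss in question.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a producer that retries a fixed number of times
// and then drops the batch (data loss) if the broker keeps rejecting it.
public final class BoundedRetrySender {
    private final int maxRetries;

    public BoundedRetrySender(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Returns true if the batch was delivered, false if it was dropped. */
    public boolean send(BooleanSupplier attempt) {
        // One initial attempt plus maxRetries retries.
        for (int i = 0; i <= maxRetries; i++) {
            if (attempt.getAsBoolean()) return true; // broker accepted the batch
        }
        return false; // retry budget exhausted: the batch is dropped
    }
}
```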

I agree with Jay that we can delay the request longer than the request timeout, 
since it isn't possible to enforce it perfectly on the server anyway. I think 
we should have a maximum-delay config on the server that provides a ceiling on 
how long we can delay a request, and it should be lower than the socket 
timeout. 
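For illustration, such a ceiling might look like the following broker config fragment (these property names are invented for the sketch, not actual Kafka properties):

```properties
# Hypothetical property names illustrating the proposed ceiling:
quota.max.delay.ms=25000       # longest we will ever delay a throttled request
socket.timeout.min.ms=10000    # fixed lower bound on the client socket timeout
```

The key invariant is simply quota.max.delay.ms < socket timeout, so a throttled client never disconnects and resends.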

Initially, I preferred delay-and-error because it seems like the most natural 
way to handle quota violations, but I'm starting to see the merit in Jay's 
approach. Practically speaking, it reduces the number of moving parts in 
delivering quotas for Kafka. All changes are localized to the broker, and the 
approach is compatible with existing clients. Client changes will be required 
only if we return quota metadata in the responses or add a quota metadata API. 
If we discover in production that this isn't working for some reason, we can 
always revisit the approach of returning errors and having the clients handle 
them.

Note that both of these data-loss/duplication issues affect only the producer. 
Consumers should be fine regardless of the approach we choose.

Aditya
________________________________________
From: Jun Rao [j...@confluent.io]
Sent: Monday, March 16, 2015 4:27 PM
To: dev@kafka.apache.org
Subject: Re: [KIP-DISCUSSION] KIP-13 Quotas

It's probably useful for a client to know whether its requests are
throttled or not (e.g., for monitoring and alerting). From that
perspective, option B (delay the requests and return an error) seems better.

Thanks,

Jun

On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar <
aaurad...@linkedin.com.invalid> wrote:

> Posted a KIP for quotas in kafka.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
>
> Appreciate any feedback.
>
> Aditya
>
