[ https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100946#comment-16100946 ]

Apurva Mehta commented on KAFKA-5621:
-------------------------------------

I think there are a few points here: 

1. What does 'accumulator time' really mean to an application developer? The 
accumulator is an internal detail, and we should not have application 
developers thinking about this. They only need to think about batching, and 
thus about the {{batch.size}} and {{linger.ms}} parameters. From this point of 
view, having a single {{request.timeout.ms}} which applies to the accumulator
queue time as well as the actual request time makes sense: from the developer's
point of view, the request begins as soon as {{producer.send}} is called and
completes once a response is received.
2. I think the main reason to introduce batch expiry was to free up memory on 
the client. But we already have a configurable bound on memory, so this reason 
is less strong: you will never explode the memory utilization on the client 
just because you lose the brokers. Instead, the {{producer.send}} call will 
just block when there is no available memory. So the benefit of expiring
batches proactively is dubious.
3. Finally, why is it desirable to 'send a failure notification' as soon as 
possible? The simplest programming model from the developer's point of view
would be for the Kafka client to retry in as many failure scenarios as possible 
and only return a hard error when there is no way to proceed. This places less 
of a burden on the application developer. From this point of view, retrying 
when we hit a timeout in the queue makes sense: it is a retriable error. 
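The configuration surface discussed in the three points above can be sketched
as follows. This is a minimal sketch using plain {{java.util.Properties}}: the
key names are real producer configs, but the values are illustrative rather
than recommendations, and the broker address is an assumption.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker

        // Point 1: batching is all a developer should reason about.
        props.put("batch.size", "16384"); // max bytes per partition batch
        props.put("linger.ms", "5");      // wait up to 5 ms to fill a batch
        // Under the proposal, this one timeout covers both the time a batch
        // spends in the accumulator queue and the in-flight request time.
        props.put("request.timeout.ms", "30000");

        // Point 2: memory is already bounded, so expiring batches to free
        // memory buys little; send() just blocks when the buffer is full.
        props.put("buffer.memory", "33554432"); // 32 MB accumulator bound
        props.put("max.block.ms", "60000");     // how long send() may block

        // Point 3: a timeout in the queue is treated as retriable.
        props.put("retries", "5");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("request.timeout.ms"));
    }
}
```

These properties would be passed unchanged to a {{KafkaProducer}} constructor;
the sketch stops short of that so it runs without a broker.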

Stepping back a bit further, the reason we had a {{retries}} configuration in
the first place was that producer retries could introduce duplicates, and so we
needed to provide developers with an option to control that on their end. With 
the recent idempotent producer and ongoing work to improve the performance of 
that feature, we aim to have the idempotent producer turned on by default with 
infinite retries. This would mean that the application developer would only 
have to worry about a very small class of errors (basically only authorization 
or configuration exceptions) and yet enjoy strong semantics without any loss of 
performance.
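
The envisioned default can be expressed in the same sketch form (again with
plain {{java.util.Properties}}; {{enable.idempotence}} and {{acks}} are real
producer configs, and "infinite" retries is approximated with
{{Integer.MAX_VALUE}} for illustration):

```java
import java.util.Properties;

public class IdempotentDefaultSketch {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        // Broker-side de-duplication makes retries safe, so they can be
        // effectively infinite without introducing duplicates.
        props.put("enable.idempotence", "true");
        props.put("acks", "all"); // required for idempotence
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }
}
```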

So in the ideal future default situation, we would never expire a batch, but 
let it sit in the queue until the cluster recovers to the point when the batch 
can be sent. With this in mind, it makes sense to retry the batch on expiry 
right now. Hence the current proposal. 

> The producer should retry expired batches when retries are enabled
> ------------------------------------------------------------------
>
>                 Key: KAFKA-5621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5621
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>             Fix For: 1.0.0
>
>
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is 
> raised to the user.
> It might be better for the producer to retry the expired batch, up to the
> configured number of retries. This is more intuitive from the user's point of
> view.
> Further, the proposed behavior makes it easier for applications like mirror
> maker to provide ordering guarantees even when batches expire. Today, they 
> would resend the expired batch and it would get added to the back of the 
> queue, causing the output ordering to be different from the input ordering.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
