[ 
https://issues.apache.org/jira/browse/KAFKA-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054989#comment-16054989
 ] 

ASF GitHub Bot commented on KAFKA-5477:
---------------------------------------

GitHub user apurvam opened a pull request:

    https://github.com/apache/kafka/pull/3377

    KAFKA-5477: Lower retryBackoff for AddPartitionsRequest

    This patch lowers the retry backoff when receiving a
    CONCURRENT_TRANSACTIONS error from an AddPartitions request. With the
    default of 100ms, back to back transactions would take at least 100ms
    each, which is too slow.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apurvam/kafka HOTFIX-lower-retry-for-add-partitions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3377
    
----
commit 0d676688e7ed9a8d63189eb704143e62752707cc
Author: Apurva Mehta <apu...@confluent.io>
Date:   2017-06-20T00:36:28Z

    Lower retryBackoff when receiving a CONCURRENT_TRANSACTIONS error from an
    AddPartitions request

----


> TransactionalProducer sleeps unnecessarily long during back to back 
> transactions
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-5477
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5477
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>
> I am running some perf tests for EOS and there is a severe perf impact with 
> our default configs. 
> Here is the issue.
> # When we do a commit transaction, the producer sends an `EndTxn` request to 
> the coordinator. The coordinator writes the `PrepareCommit` message to the 
> transaction log and then returns the response to the client. It writes the 
> transaction markers and the final `CompleteCommit` message asynchronously. 
> # In the meantime, if the client starts another transaction, it will send an 
> `AddPartitions` request on the next `Sender.run` loop. If the markers haven't 
> been written yet, then the coordinator will return a retriable 
> `CONCURRENT_TRANSACTIONS` error to the client.
> # The current behavior in the producer is to sleep for `retryBackoffMs` 
> before retrying the request. The current default for this is 100ms. So the 
> producer will sleep for 100ms before sending the `AddPartitions` again. This 
> puts a floor on the latency for back to back transactions.
> The impact: Back to back transactions (the typical use case for streams) would 
> have a latency floor of 100ms.
> Ideally, we don't want to sleep the full 100ms in this particular case, 
> because the retry is 'expected'.
> The options are: 
> # Do nothing, and let streams override `retry.backoff.ms` in their producer to 
> 10ms when EOS is enabled (since they have a HOTFIX patch out anyway).
> # Introduce a special 'transactionRetryBackoffMs' non-configurable variable 
> and hard code that to a low value which applies to all transactional requests.
> # Do nothing and fix it properly in 0.11.0.1.
> Option 2 as stated is a 1 line fix. If we want to lower the retry just for 
> this particular error, it would be a slightly bigger change (10-15 lines).
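
For illustration, the two lighter-weight approaches discussed above can be sketched as follows. This is a minimal sketch, not the actual producer code: the class `RetryBackoffSketch`, the `TxnError` enum, the method names, and the 20ms constant are all hypothetical and chosen only for the example. The only names taken from the issue are the `CONCURRENT_TRANSACTIONS` error, the `retry.backoff.ms` config, and the values 100ms and 10ms.

```java
import java.util.Properties;

// Hypothetical sketch; this is not the actual Kafka producer implementation.
class RetryBackoffSketch {

    // Hypothetical stand-in for the error codes a transactional response can carry.
    enum TxnError { NONE, CONCURRENT_TRANSACTIONS, COORDINATOR_NOT_AVAILABLE }

    // Assumed value for illustration only; the issue just asks for "a low value".
    static final long ADD_PARTITIONS_RETRY_BACKOFF_MS = 20L;

    // Error-specific variant: shorten the backoff only when the retry is
    // 'expected' (CONCURRENT_TRANSACTIONS); otherwise keep the configured backoff.
    static long backoffMs(TxnError error, long retryBackoffMs) {
        if (error == TxnError.CONCURRENT_TRANSACTIONS) {
            // Never wait longer than what the user configured.
            return Math.min(ADD_PARTITIONS_RETRY_BACKOFF_MS, retryBackoffMs);
        }
        return retryBackoffMs;
    }

    // Option 1: the application overrides retry.backoff.ms itself when EOS is enabled.
    static Properties eosProducerOverrides() {
        Properties props = new Properties();
        props.setProperty("retry.backoff.ms", "10");
        return props;
    }

    public static void main(String[] args) {
        long defaultBackoffMs = 100L; // current default cited in the issue
        System.out.println(backoffMs(TxnError.CONCURRENT_TRANSACTIONS, defaultBackoffMs));
        System.out.println(backoffMs(TxnError.COORDINATOR_NOT_AVAILABLE, defaultBackoffMs));
        System.out.println(eosProducerOverrides().getProperty("retry.backoff.ms"));
    }
}
```

The error-specific variant leaves the backoff for every other retriable error untouched, which is why it would be the slightly larger change compared with a single hard-coded value for all transactional requests.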



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
