[ 
https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Shekhar Narula updated KAFKA-16226:
------------------------------------------
    Description: 
Background
https://issues.apache.org/jira/browse/KAFKA-15415 implemented an optimisation 
in the Java client to skip the backoff period for a produce batch being 
retried, if the client knows of a newer leader.
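
The retry rule described above can be sketched roughly as follows. This is an 
illustration only, not the code from KAFKA-15415: the class name, method name 
and parameters are hypothetical, and it assumes that "a newer leader" is 
detected by comparing leader epochs.

```java
// Hypothetical sketch; names and signature are not taken from the Kafka code base.
final class BackoffSkipSketch {

    /**
     * Decides whether a produce batch that failed earlier may be retried now.
     *
     * @param nowMs               current time in milliseconds
     * @param lastAttemptMs       time of the previous send attempt
     * @param retryBackoffMs      configured retry backoff
     * @param epochAtLastAttempt  leader epoch used for the previous attempt (assumed)
     * @param currentLeaderEpoch  leader epoch the client currently knows (assumed)
     */
    static boolean canRetryNow(long nowMs,
                               long lastAttemptMs,
                               long retryBackoffMs,
                               int epochAtLastAttempt,
                               int currentLeaderEpoch) {
        // Normal path: wait until the backoff period has elapsed.
        boolean backoffElapsed = nowMs - lastAttemptMs >= retryBackoffMs;
        // Optimisation: a newer leader is known, so waiting is skipped.
        boolean newerLeaderKnown = currentLeaderEpoch > epochAtLastAttempt;
        return backoffElapsed || newerLeaderKnown;
    }
}
```

With a 100ms backoff and only 50ms elapsed, the sketch allows the retry when 
the known epoch has advanced and holds the batch back when it has not.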

What changed
The implementation introduced a regression, noticed on a Trogdor benchmark 
running with a high partition count (36000!).
With the regression, the following metrics changed on the produce side.
 # record-queue-time-avg: increased from 20ms to 30ms.
 # request-latency-avg: increased from 50ms to 100ms.

How it happened

As can be seen from the original 
[PR|https://github.com/apache/kafka/pull/14384] 

Fix

  was:
Background
https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in 
java-client to skip backoff period if client knows of a newer leader, for 
produce-batch being retried.

What changed
The implementation introduced a regression noticed on a trogdor-benchmark 
running with high partition counts(36000!).
With regression, following metrics changed on the produce side.

1. record_queue_time_avg

Regression Details


Fix


> Java client: Performance regression in Trogdor benchmark with high partition 
> counts
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-16226
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16226
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.7.0, 3.6.1
>            Reporter: Mayank Shekhar Narula
>            Assignee: Mayank Shekhar Narula
>            Priority: Major
>              Labels: kip-951
>             Fix For: 3.6.2, 3.8.0, 3.7.1
>
>
> Background
> https://issues.apache.org/jira/browse/KAFKA-15415 implemented an optimisation 
> in the Java client to skip the backoff period for a produce batch being 
> retried, if the client knows of a newer leader.
> What changed
> The implementation introduced a regression, noticed on a Trogdor benchmark 
> running with a high partition count (36000!).
> With the regression, the following metrics changed on the produce side.
>  # record-queue-time-avg: increased from 20ms to 30ms.
>  # request-latency-avg: increased from 50ms to 100ms.
> How it happened
> As can be seen from the original 
> [PR|https://github.com/apache/kafka/pull/14384] 
> Fix



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
