[ https://issues.apache.org/jira/browse/KAFKA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mayank Shekhar Narula updated KAFKA-16226:
------------------------------------------
    Description:
Background
https://issues.apache.org/jira/browse/KAFKA-15415 implemented an optimisation in the Java client to skip the backoff period if the client knows of a newer leader, for a produce batch being retried.

What changed
The implementation introduced a regression, noticed on a Trogdor benchmark running with high partition counts (36000!).
With the regression, the following metrics changed on the produce side:
# record-queue-time-avg: increased from 20ms to 30ms.
# request-latency-avg: increased from 50ms to 100ms.

How it happened
As can be seen from the original [PR|https://github.com/apache/kafka/pull/14384]

Fix

  was:
Background
https://issues.apache.org/jira/browse/KAFKA-15415 implemented optimisation in java-client to skip backoff period if client knows of a newer leader, for produce-batch being retried.

What changed
The implementation introduced a regression noticed on a trogdor-benchmark running with high partition counts(36000!).
With regression, following metrics changed on the produce side.
1. record_queue_time_avg

Regression Details

Fix


> Java client: Performance regression in Trogdor benchmark with high partition counts
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-16226
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16226
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.7.0, 3.6.1
>            Reporter: Mayank Shekhar Narula
>            Assignee: Mayank Shekhar Narula
>            Priority: Major
>              Labels: kip-951
>             Fix For: 3.6.2, 3.8.0, 3.7.1
>
>
> Background
> https://issues.apache.org/jira/browse/KAFKA-15415 implemented an optimisation in
> the Java client to skip the backoff period if the client knows of a newer leader,
> for a produce batch being retried.
> What changed
> The implementation introduced a regression, noticed on a Trogdor benchmark
> running with high partition counts (36000!).
> With the regression, the following metrics changed on the produce side:
> # record-queue-time-avg: increased from 20ms to 30ms.
> # request-latency-avg: increased from 50ms to 100ms.
> How it happened
> As can be seen from the original
> [PR|https://github.com/apache/kafka/pull/14384]
> Fix



--
This message was sent by Atlassian Jira
(v8.20.10#820010)