Sean Glover created KAFKA-8814:
----------------------------------

             Summary: Consumer benchmark test for paused partitions
                 Key: KAFKA-8814
                 URL: https://issues.apache.org/jira/browse/KAFKA-8814
             Project: Kafka
          Issue Type: New Feature
          Components: consumer, system tests, tools
            Reporter: Sean Glover
            Assignee: Sean Glover

A new performance benchmark, and a corresponding addition to the {{ConsumerPerformance}} tool, to support the paused partition performance improvement implemented in KAFKA-7548.

Before that fix, when the user polled while completed fetched records existed for paused partitions, the consumer threw the data away because those partitions were no longer fetchable. If a partition was later resumed, the data had to be fetched all over again. The fix caches completed fetched records for paused partitions indefinitely so that they can potentially be returned once the partition is resumed.

KAFKA-7548 contains several informal test results based on a number of different paused partition scenarios, but it was suggested that a test in the benchmarks testsuite would be ideal to demonstrate the performance improvement.

In order to implement this benchmark we must add a new feature that pauses partitions to {{ConsumerPerformance}}, which is used by the benchmark testsuite and the {{kafka-consumer-perf-test.sh}} bin script. I added the following parameter:

{code:scala}
val pausedPartitionsOpt = parser.accepts("paused-partitions-percent", "The percentage [0-1] of subscribed " +
  "partitions to pause each poll.")
  .withOptionalArg()
  .describedAs("percent")
  .withValuesConvertedBy(regex("^0(\\.\\d+)?|1\\.0$")) // matches [0-1] with decimals
  .ofType(classOf[Float])
  .defaultsTo(0F)
{code}

This allows the user to specify a percentage (represented as a floating point value from {{0..1}}) of partitions to pause each poll interval. When the value is greater than {{0}}, we take the next _n_ partitions to pause.

I ran the test on {{trunk}} and with the changes rebased onto the {{2.3.0}} tag to produce the following test summaries of {{kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput}}. The test rotates through pausing {{80%}} of assigned partitions (5/6) each poll interval. I ran this on my laptop.

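To illustrate what "take the next _n_ partitions to pause" could look like, here is a minimal sketch of the rotation. It is not the code from the pull request: {{PauseRotation}}, {{pauseCount}} and {{nextToPause}} are hypothetical names, and rounding to the nearest whole partition is an assumption chosen only because it is consistent with 5 of 6 partitions at {{0.8}}.

```scala
// Hypothetical helper for illustration only; not the ConsumerPerformance implementation.
object PauseRotation {

  /** Number of partitions to pause for a given fraction.
    * Rounding to the nearest integer is an assumption (0.8 of 6 gives 5). */
  def pauseCount(assigned: Int, percent: Double): Int = {
    require(percent >= 0.0 && percent <= 1.0, "percent must be within [0, 1]")
    math.round(assigned * percent).toInt
  }

  /** Picks the next `pauseCount` partitions starting at `offset`, wrapping
    * around the end of the assignment. Returns the selection and the offset
    * to use on the following poll. */
  def nextToPause[T](assigned: IndexedSeq[T], percent: Double, offset: Int): (Seq[T], Int) = {
    val n = pauseCount(assigned.size, percent)
    if (n == 0 || assigned.isEmpty) (Seq.empty, offset)
    else {
      val start = offset % assigned.size
      val selected = (0 until n).map(i => assigned((start + i) % assigned.size))
      (selected, (start + n) % assigned.size)
    }
  }
}
```

In a poll loop, one plausible use would be to resume everything returned by {{consumer.paused()}}, call {{nextToPause}} on the current {{consumer.assignment()}}, and pass the selection to {{consumer.pause(...)}} before the next {{poll}}. With six partitions and {{0.8}}, the first poll would pause partitions 0 through 4, and the next would pause 5, 0, 1, 2 and 3.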
{{trunk}} ({{aa4ba8eee8e6f52a9d80a98fb2530b5bcc1b9a11}})

{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:       2019-08-18--010
run time:         2 minutes 29.104 seconds
tests run:        1
passed:           1
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status:     PASS
run time:   2 minutes 29.048 seconds
{"records_per_sec": 450207.0953, "mb_per_sec": 42.9351}
--------------------------------------------------------------------------------
{code}

{{2.3.0}}

{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:       2019-08-18--011
run time:         2 minutes 41.228 seconds
tests run:        1
passed:           1
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status:     PASS
run time:   2 minutes 41.168 seconds
{"records_per_sec": 246574.6024, "mb_per_sec": 23.5152}
--------------------------------------------------------------------------------
{code}

The increase in record and data throughput is significant: roughly 1.8x in this run (450,207 vs. 246,575 records/sec and 42.9 vs. 23.5 MB/sec). Based on other consumer fetch metrics there are also improvements to fetch rate. Depending on how often partitions are paused and resumed, it's possible to save a lot of data transfer between the consumer and broker as well.

Please see the pull request for the associated changes.

I was unsure if I needed to create a KIP because, while technically I added a new public API to the {{ConsumerPerformance}} tool, it was only to enable this benchmark to run. If you feel that a KIP is necessary I'll create one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)