Sean Glover created KAFKA-8814:
----------------------------------

             Summary: Consumer benchmark test for paused partitions
                 Key: KAFKA-8814
                 URL: https://issues.apache.org/jira/browse/KAFKA-8814
             Project: Kafka
          Issue Type: New Feature
          Components: consumer, system tests, tools
            Reporter: Sean Glover
            Assignee: Sean Glover


A new performance benchmark and a corresponding addition to the 
{{ConsumerPerformance}} tool to support the paused partition performance 
improvement implemented in KAFKA-7548.  Before the fix, when the user polled 
for completed fetched records for partitions that were paused, the consumer 
would throw away the data because it was no longer fetchable.  If the partition 
was later resumed, the data had to be fetched all over again.  The fix caches 
completed fetched records for paused partitions indefinitely so they can be 
returned once the partition is resumed.
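
The difference between the two behaviours can be illustrated with a toy model.  This is only a sketch under stated assumptions, not the consumer's actual fetcher code: {{CompletedFetch}}, {{poll}} and the {{retainPaused}} flag are hypothetical names invented for the illustration.

{code:scala}
object PausedFetchCacheSketch {
  // Hypothetical stand-in for a batch of records already fetched for one partition.
  final case class CompletedFetch(partition: String, records: List[String])

  // Returns (records handed to the caller, fetches still buffered after the poll).
  def poll(buffered: List[CompletedFetch],
           paused: Set[String],
           retainPaused: Boolean): (List[String], List[CompletedFetch]) = {
    val (pausedFetches, fetchable) = buffered.partition(f => paused.contains(f.partition))
    val returned = fetchable.flatMap(_.records)
    // retainPaused = false models the old behaviour: paused data is dropped and
    // has to be fetched again.  retainPaused = true models the fix: it stays buffered.
    val remaining = if (retainPaused) pausedFetches else Nil
    (returned, remaining)
  }

  def main(args: Array[String]): Unit = {
    val buffered = List(
      CompletedFetch("topic-0", List("a", "b")),
      CompletedFetch("topic-1", List("c")))
    val paused = Set("topic-1")

    val (_, keptBefore) = poll(buffered, paused, retainPaused = false)
    val (_, keptAfter) = poll(buffered, paused, retainPaused = true)
    println(s"buffered after poll without caching: ${keptBefore.size}")
    println(s"buffered after poll with caching: ${keptAfter.size}")

    // Once topic-1 is resumed, the cached records are returned without another fetch.
    val (resumedRecords, _) = poll(keptAfter, Set.empty, retainPaused = true)
    println(s"records returned after resume: $resumedRecords")
  }
}
{code}

In the model, the saved work is whatever sits in the buffer for paused partitions at poll time, which is why the benefit depends on how often partitions are paused and resumed.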

The Jira issue KAFKA-7548 shows several informal test results based on a 
number of different paused partition scenarios, but it was suggested that a 
test in the benchmarks testsuite would be ideal to demonstrate the performance 
improvement.  In order to implement this benchmark we must implement a new 
feature that pauses partitions in {{ConsumerPerformance}}, which is used by 
the benchmark testsuite and the {{kafka-consumer-perf-test.sh}} bin script.  
I added the following parameter:

{code:scala}
    val pausedPartitionsOpt = parser.accepts("paused-partitions-percent", "The percentage [0-1] of subscribed " +
      "partitions to pause each poll.")
        .withOptionalArg()
        .describedAs("percent")
        .withValuesConvertedBy(regex("^0(\\.\\d+)?|1\\.0$")) // matches [0-1] with decimals
        .ofType(classOf[Float])
        .defaultsTo(0F)
{code}
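
To make the accepted value range concrete, here is a small sketch that applies the same pattern to a few inputs.  It assumes the value must match the pattern as a whole string; the object and method names are hypothetical and only serve the illustration.

{code:scala}
import java.util.regex.Pattern

object PausedPercentRegexSketch {
  // Same pattern as in the option definition.
  private val pattern = Pattern.compile("^0(\\.\\d+)?|1\\.0$")

  // Assumption: the argument is accepted only if the whole string matches.
  def accepts(value: String): Boolean = pattern.matcher(value).matches()

  def main(args: Array[String]): Unit = {
    Seq("0", "0.8", "1.0", "1", "1.5", "-0.1").foreach { v =>
      println(s"$v -> ${accepts(v)}")
    }
  }
}
{code}

Under that assumption {{0}}, {{0.8}} and {{1.0}} are accepted, while {{1}} (without a decimal part), {{1.5}} and {{-0.1}} are rejected.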

This allows the user to specify a percentage (represented as a floating-point 
value from {{0..1}}) of partitions to pause each poll interval.  When the value 
is greater than {{0}}, the tool takes the next _n_ partitions to pause.  I ran 
{{kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput}} 
on {{trunk}} and rebased onto the {{2.3.0}} tag; the test summaries follow.  
The test rotates through pausing {{80%}} of assigned partitions (5/6) each poll 
interval.  I ran this on my laptop.
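
The rotation can be sketched as follows.  This is an illustration under assumptions, not the code from the pull request: rounding the partition count up (which gives 5 of 6 partitions at {{0.8}}) and advancing the start position by _n_ after each poll are both guesses, and {{pauseCount}} and {{nextPaused}} are hypothetical helper names.

{code:scala}
object PauseRotationSketch {
  // Number of partitions to pause; rounding up is an assumption.
  def pauseCount(assigned: Int, percent: Double): Int =
    math.min(assigned, math.ceil(assigned * percent).toInt)

  // Takes the next n partitions starting at `start`, wrapping around the end of
  // the list, and returns them together with the start position for the next poll.
  def nextPaused[A](assigned: Vector[A], percent: Double, start: Int): (Vector[A], Int) = {
    val n = pauseCount(assigned.size, percent)
    if (n == 0) (Vector.empty, start)
    else {
      val paused = Vector.tabulate(n)(i => assigned((start + i) % assigned.size))
      (paused, (start + n) % assigned.size)
    }
  }

  def main(args: Array[String]): Unit = {
    val assigned = Vector("t-0", "t-1", "t-2", "t-3", "t-4", "t-5")
    var start = 0
    for (pollNo <- 1 to 3) {
      val (paused, next) = nextPaused(assigned, 0.8, start)
      println(s"poll $pollNo pauses ${paused.mkString(", ")}")
      start = next
    }
  }
}
{code}

With six assigned partitions and {{0.8}}, each poll in this sketch leaves exactly one partition unpaused, and that partition changes from poll to poll.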

{{trunk}} ({{aa4ba8eee8e6f52a9d80a98fb2530b5bcc1b9a11}})

{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:       2019-08-18--010
run time:         2 minutes 29.104 seconds
tests run:        1
passed:           1
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status:     PASS
run time:   2 minutes 29.048 seconds
{"records_per_sec": 450207.0953, "mb_per_sec": 42.9351}
--------------------------------------------------------------------------------
{code}

{{2.3.0}}

{code}
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:       2019-08-18--011
run time:         2 minutes 41.228 seconds
tests run:        1
passed:           1
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status:     PASS
run time:   2 minutes 41.168 seconds
{"records_per_sec": 246574.6024, "mb_per_sec": 23.5152}
--------------------------------------------------------------------------------
{code}

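The increase in record and data throughput is significant: {{trunk}} reaches 
about 450,000 records/sec (42.9 MB/sec) versus about 247,000 records/sec 
(23.5 MB/sec) on {{2.3.0}}, roughly 1.8 times the throughput.  Based on other 
consumer fetch metrics there are also improvements to fetch rate.  Depending on 
how often partitions are paused and resumed, it's possible to save a lot of 
data transfer between the consumer and broker as well.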

Please see the pull request for the associated changes.  I was unsure whether I 
needed to create a KIP because, while I technically added a new public API to 
the {{ConsumerPerformance}} tool, it was only to enable this benchmark to run.  
If you feel that a KIP is necessary I'll create one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)