[ https://issues.apache.org/jira/browse/KAFKA-15264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jianbin.chen updated KAFKA-15264:
---------------------------------
    Attachment: image-2023-07-31-18-05-21-772.png

> Compared with 1.1.0 on ZooKeeper, the peak throughput of 3.5.1 in KRaft mode is very jittery
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15264
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15264
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: jianbin.chen
>            Priority: Major
>         Attachments: image-2023-07-28-09-51-01-662.png, image-2023-07-28-09-52-38-941.png, image-2023-07-31-18-04-54-112.png, image-2023-07-31-18-05-21-772.png
>
>
> I was preparing to upgrade from 1.1.0 to 3.5.1 in KRaft mode (a new cluster deployment). In recent comparison tests I found that, with the following stress-test command, the throughput gap is obvious:
>
> {code:java}
> ./kafka-producer-perf-test.sh --topic test321 --num-records 30000000 --record-size 1024 --throughput -1 --producer-props bootstrap.servers=xxx:xxxx acks=1
> 419813 records sent, 83962.6 records/sec (81.99 MB/sec), 241.1 ms avg latency, 588.0 ms max latency.
> 555300 records sent, 111015.6 records/sec (108.41 MB/sec), 275.1 ms avg latency, 460.0 ms max latency.
> 552795 records sent, 110536.9 records/sec (107.95 MB/sec), 265.9 ms avg latency, 1120.0 ms max latency.
> 552600 records sent, 110520.0 records/sec (107.93 MB/sec), 284.5 ms avg latency, 1097.0 ms max latency.
> 538500 records sent, 107656.9 records/sec (105.13 MB/sec), 277.5 ms avg latency, 610.0 ms max latency.
> 511545 records sent, 102309.0 records/sec (99.91 MB/sec), 304.1 ms avg latency, 1892.0 ms max latency.
> 511890 records sent, 102337.1 records/sec (99.94 MB/sec), 288.4 ms avg latency, 3000.0 ms max latency.
> 519165 records sent, 103812.2 records/sec (101.38 MB/sec), 262.1 ms avg latency, 1781.0 ms max latency.
> 513555 records sent, 102669.9 records/sec (100.26 MB/sec), 338.2 ms avg latency, 2590.0 ms max latency.
> 463329 records sent, 92665.8 records/sec (90.49 MB/sec), 276.8 ms avg latency, 1463.0 ms max latency.
> 494248 records sent, 98849.6 records/sec (96.53 MB/sec), 327.2 ms avg latency, 2362.0 ms max latency.
> 506272 records sent, 101254.4 records/sec (98.88 MB/sec), 322.1 ms avg latency, 2986.0 ms max latency.
> 393758 records sent, 78735.9 records/sec (76.89 MB/sec), 387.0 ms avg latency, 2958.0 ms max latency.
> 426435 records sent, 85252.9 records/sec (83.25 MB/sec), 363.3 ms avg latency, 1959.0 ms max latency.
> 412560 records sent, 82298.0 records/sec (80.37 MB/sec), 374.1 ms avg latency, 1995.0 ms max latency.
> 370137 records sent, 73997.8 records/sec (72.26 MB/sec), 396.8 ms avg latency, 1496.0 ms max latency.
> 391781 records sent, 78340.5 records/sec (76.50 MB/sec), 410.7 ms avg latency, 2446.0 ms max latency.
> 355901 records sent, 71166.0 records/sec (69.50 MB/sec), 397.5 ms avg latency, 2715.0 ms max latency.
> 385410 records sent, 77082.0 records/sec (75.28 MB/sec), 417.5 ms avg latency, 2702.0 ms max latency.
> 381160 records sent, 76232.0 records/sec (74.45 MB/sec), 407.7 ms avg latency, 1846.0 ms max latency.
> 333367 records sent, 66660.1 records/sec (65.10 MB/sec), 456.2 ms avg latency, 1414.0 ms max latency.
> 376251 records sent, 75175.0 records/sec (73.41 MB/sec), 401.9 ms avg latency, 1897.0 ms max latency.
> 354434 records sent, 70886.8 records/sec (69.23 MB/sec), 425.8 ms avg latency, 1601.0 ms max latency.
> 353795 records sent, 70744.9 records/sec (69.09 MB/sec), 411.7 ms avg latency, 1563.0 ms max latency.
> 321993 records sent, 64360.0 records/sec (62.85 MB/sec), 447.3 ms avg latency, 1975.0 ms max latency.
> 404075 records sent, 80750.4 records/sec (78.86 MB/sec), 408.4 ms avg latency, 1753.0 ms max latency.
> 384526 records sent, 76905.2 records/sec (75.10 MB/sec), 406.0 ms avg latency, 1833.0 ms max latency.
> 387652 records sent, 77483.9 records/sec (75.67 MB/sec), 397.3 ms avg latency, 1927.0 ms max latency.
> 343286 records sent, 68629.7 records/sec (67.02 MB/sec), 455.6 ms avg latency, 1685.0 ms max latency.
> 333300 records sent, 66646.7 records/sec (65.08 MB/sec), 456.6 ms avg latency, 2146.0 ms max latency.
> 361191 records sent, 72238.2 records/sec (70.55 MB/sec), 409.4 ms avg latency, 2125.0 ms max latency.
> 357525 records sent, 71490.7 records/sec (69.82 MB/sec), 436.0 ms avg latency, 1502.0 ms max latency.
> 340238 records sent, 68047.6 records/sec (66.45 MB/sec), 427.9 ms avg latency, 1932.0 ms max latency.
> 390016 records sent, 77956.4 records/sec (76.13 MB/sec), 418.5 ms avg latency, 1807.0 ms max latency.
> 352830 records sent, 70523.7 records/sec (68.87 MB/sec), 439.4 ms avg latency, 1892.0 ms max latency.
> 354526 records sent, 70905.2 records/sec (69.24 MB/sec), 429.6 ms avg latency, 2128.0 ms max latency.
> 356670 records sent, 71305.5 records/sec (69.63 MB/sec), 408.9 ms avg latency, 1329.0 ms max latency.
> 309204 records sent, 60687.7 records/sec (59.27 MB/sec), 438.6 ms avg latency, 2566.0 ms max latency.
> 366715 records sent, 72316.1 records/sec (70.62 MB/sec), 474.5 ms avg latency, 2169.0 ms max latency.
> 375174 records sent, 75034.8 records/sec (73.28 MB/sec), 429.9 ms avg latency, 1722.0 ms max latency.
> 359400 records sent, 70346.4 records/sec (68.70 MB/sec), 432.1 ms avg latency, 1961.0 ms max latency.
> 312276 records sent, 62430.2 records/sec (60.97 MB/sec), 477.4 ms avg latency, 2006.0 ms max latency.
> 361875 records sent, 72360.5 records/sec (70.66 MB/sec), 441.2 ms avg latency, 1618.0 ms max latency.
> 342449 records sent, 68462.4 records/sec (66.86 MB/sec), 446.7 ms avg latency, 2233.0 ms max latency.
> 338163 records sent, 67619.1 records/sec (66.03 MB/sec), 454.4 ms avg latency, 1839.0 ms max latency.
> 369139 records sent, 73798.3 records/sec (72.07 MB/sec), 388.3 ms avg latency, 1753.0 ms max latency.
> 362476 records sent, 72495.2 records/sec (70.80 MB/sec), 438.4 ms avg latency, 2037.0 ms max latency.
> 321426 records sent, 62267.7 records/sec (60.81 MB/sec), 475.5 ms avg latency, 2059.0 ms max latency.
> 389137 records sent, 77286.4 records/sec (75.47 MB/sec), 359.7 ms avg latency, 1547.0 ms max latency.
> 298050 records sent, 59586.2 records/sec (58.19 MB/sec), 563.9 ms avg latency, 2761.0 ms max latency.
> 325530 records sent, 65028.0 records/sec (63.50 MB/sec), 503.3 ms avg latency, 2950.0 ms max latency.
> 347306 records sent, 69419.5 records/sec (67.79 MB/sec), 404.0 ms avg latency, 2095.0 ms max latency.
> 361035 records sent, 72192.6 records/sec (70.50 MB/sec), 429.5 ms avg latency, 1698.0 ms max latency.
> 334539 records sent, 66907.8 records/sec (65.34 MB/sec), 461.1 ms avg latency, 1731.0 ms max latency.
> 367423 records sent, 73455.2 records/sec (71.73 MB/sec), 433.1 ms avg latency, 2089.0 ms max latency.
> 350940 records sent, 68947.0 records/sec (67.33 MB/sec), 434.8 ms avg latency, 1317.0 ms max latency.
> 351653 records sent, 70316.5 records/sec (68.67 MB/sec), 452.0 ms avg latency, 2948.0 ms max latency.
> 298410 records sent, 58834.8 records/sec (57.46 MB/sec), 479.2 ms avg latency, 2279.0 ms max latency.
> 351750 records sent, 70350.0 records/sec (68.70 MB/sec), 460.2 ms avg latency, 2496.0 ms max latency.
> 355367 records sent, 71073.4 records/sec (69.41 MB/sec), 416.3 ms avg latency, 2120.0 ms max latency.
> 238517 records sent, 47693.9 records/sec (46.58 MB/sec), 678.9 ms avg latency, 3072.0 ms max latency.
> 362347 records sent, 72469.4 records/sec (70.77 MB/sec), 423.8 ms avg latency, 1714.0 ms max latency.
> 308901 records sent, 61767.8 records/sec (60.32 MB/sec), 490.7 ms avg latency, 2339.0 ms max latency.
> 338280 records sent, 66919.9 records/sec (65.35 MB/sec), 422.8 ms avg latency, 1882.0 ms max latency.
> 311888 records sent, 61894.8 records/sec (60.44 MB/sec), 516.1 ms avg latency, 3857.0 ms max latency.
> 319164 records sent, 63832.8 records/sec (62.34 MB/sec), 494.3 ms avg latency, 2250.0 ms max latency.
> 291160 records sent, 58197.1 records/sec (56.83 MB/sec), 468.7 ms avg latency, 2250.0 ms max latency.
> 297599 records sent, 55834.7 records/sec (54.53 MB/sec), 472.1 ms avg latency, 3019.0 ms max latency.
> 314198 records sent, 62814.5 records/sec (61.34 MB/sec), 600.0 ms avg latency, 2863.0 ms max latency.
> 332534 records sent, 66440.4 records/sec (64.88 MB/sec), 479.2 ms avg latency, 3337.0 ms max latency.
> 320974 records sent, 64194.8 records/sec (62.69 MB/sec), 470.8 ms avg latency, 2644.0 ms max latency.
> 364638 records sent, 72825.6 records/sec (71.12 MB/sec), 408.4 ms avg latency, 2095.0 ms max latency.
> 350255 records sent, 70037.0 records/sec (68.40 MB/sec), 422.9 ms avg latency, 3059.0 ms max latency.
> 342961 records sent, 68592.2 records/sec (66.98 MB/sec), 461.5 ms avg latency, 1779.0 ms max latency.
> 348809 records sent, 69733.9 records/sec (68.10 MB/sec), 454.7 ms avg latency, 2621.0 ms max latency.
> 345438 records sent, 69032.4 records/sec (67.41 MB/sec), 439.0 ms avg latency, 2662.0 ms max latency.
> 306454 records sent, 61192.9 records/sec (59.76 MB/sec), 504.6 ms avg latency, 2513.0 ms max latency.
> 300053 records sent, 59843.0 records/sec (58.44 MB/sec), 415.6 ms avg latency, 1655.0 ms max latency.
> 332067 records sent, 66413.4 records/sec (64.86 MB/sec), 527.9 ms avg latency, 2409.0 ms max latency.
> 312132 records sent, 62426.4 records/sec (60.96 MB/sec), 463.3 ms avg latency, 2042.0 ms max latency.
> 30000000 records sent, 73963.402908 records/sec (72.23 MB/sec), 410.86 ms avg latency, 3857.00 ms max latency, 264 ms 50th, 1259 ms 95th, 2102 ms 99th, 2955 ms 99.9th.
> {code}
> !image-2023-07-28-09-51-01-662.png|width=596,height=205!
> On 1.1.0 I guarantee the command was the same, and the stress test there was basically jitter-free. I tested many times and the result was always the same:
> {code:java}
> 30000000 records sent, 108280.576630 records/sec (105.74 MB/sec), 279.05 ms avg latency, 1426.00 ms max latency, 185 ms 50th, 646 ms 95th, 758 ms 99th, 865 ms 99.9th.{code}
> !image-2023-07-28-09-52-38-941.png|width=596,height=204!
> I have not yet tested a 3.5.1+ZK deployment; I will complete that test as soon as possible. But surprisingly, the throughput jitter under KRaft is obvious under extreme stress testing. The topic has 30 partitions, and no obvious jitter traces show up in CPU or GC. The runs used a 3.5.1 client against the 3.5.1 broker and a 1.1.0 client against the 1.1.0 broker, on three 4c8g machines.
> 1.1.0 config:
> {code:java}
> ####
> log.cleanup.policy=delete
> log.cleaner.enable=true
> log.cleaner.delete.retention.ms=300000
> listeners=PLAINTEXT://:9092
> broker.id=1
> num.network.threads=5
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> message.max.bytes=5242880
> replica.fetch.max.bytes=5242880
> log.dirs=/data01/kafka110-logs
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=true
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=2
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=1440
> log.retention.minutes=30
> log.segment.bytes=104857600
> log.retention.check.interval.ms=300000
> zookeeper.connect=/kafka110-test2
> zookeeper.connection.timeout.ms=6000
> group.initial.rebalance.delay.ms=2000
> num.replica.fetchers=1{code}
> 3.5.1 config:
> {code:java}
> ####
> listeners=PLAINTEXT://:9092,CONTROLLER://:9093
> # Name of listener used for communication between brokers.
> inter.broker.listener.name=PLAINTEXT
> # Listener name, hostname and port the broker will advertise to clients.
> # If not set, it uses the value for "listeners".
> advertised.listeners=PLAINTEXT://10.58.16.231:9092
> # A comma-separated list of the names of the listeners used by the controller.
> # If no explicit mapping is set in `listener.security.protocol.map`, the default will be to use the PLAINTEXT protocol.
> # This is required if running in KRaft mode.
> controller.listener.names=CONTROLLER
> process.roles=broker,controller
> broker.id=1
> num.network.threads=5
> num.io.threads=8
> socket.send.buffer.bytes=102400
> socket.receive.buffer.bytes=102400
> socket.request.max.bytes=104857600
> message.max.bytes=52428800
> replica.fetch.max.bytes=52428800
> log.dirs=/data01/kafka-logs-351
> node.id=1
> controller.quorum.voters=1@:9093,2@:9093,3@:9093
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=3
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.retention.hours=72
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> num.replica.fetchers=1{code}
> One thing to note: the bandwidth of all 3 brokers is basically maxed out. The NIC on each machine is a Gigabit NIC, and when I cap the producer at a fixed 20,960 records per second (about 20 MB of traffic per second), there is no jitter!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
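The jitter described in the report can be quantified directly from the interval lines that kafka-producer-perf-test.sh prints. A minimal sketch in Python (the helper name `throughput_jitter` is illustrative, not from the report; the two sample intervals are copied from the 3.5.1 KRaft output above):

```python
import re
import statistics

# Interval lines printed by kafka-producer-perf-test.sh look like:
#   "419813 records sent, 83962.6 records/sec (81.99 MB/sec), ..."
RATE_RE = re.compile(r"records sent, ([\d.]+) records/sec")

def throughput_jitter(report: str) -> float:
    """Coefficient of variation (stdev / mean) of the per-interval send rate."""
    rates = [float(m.group(1)) for m in RATE_RE.finditer(report)]
    return statistics.stdev(rates) / statistics.mean(rates)

# Best (~111k rec/s) and worst (~47.7k rec/s) intervals from the KRaft run:
sample = (
    "555300 records sent, 111015.6 records/sec (108.41 MB/sec), 275.1 ms avg latency, 460.0 ms max latency.\n"
    "238517 records sent, 47693.9 records/sec (46.58 MB/sec), 678.9 ms avg latency, 3072.0 ms max latency.\n"
)
print(throughput_jitter(sample))
```

Running this over the full 3.5.1 output versus the 1.1.0 output would give a single number per run for comparing the two deployments, rather than eyeballing the interval lines.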