novosibman opened a new pull request, #13768: URL: https://github.com/apache/kafka/pull/13768
Related issue https://issues.apache.org/jira/browse/KAFKA-9693 The issue with repeating latency spikes during Kafka log segments rolling still reproduced on the latest versions including kafka_2.13-3.4.0. It was found that flushing Kafka snapshot file during segments rolling blocks producer request handling thread for some time: https://github.com/apache/kafka/blob/3.4/core/src/main/scala/kafka/log/ProducerStateManager.scala#L452 ``` private def writeSnapshot(file: File, entries: mutable.Map[Long, ProducerStateEntry]): Unit = { ... val fileChannel = FileChannel.open(file.toPath, StandardOpenOption.CREATE, StandardOpenOption.WRITE) try { fileChannel.write(buffer) fileChannel.force(true) <- here } finally { fileChannel.close() }... ``` More partitions - more cumulative latency effect observed. Suggested fix offloads flush (fileChannel.force) operation to the background thread similar to (but not exactly) how it was done in the UnifiedLog.scala: ``` def roll( ... // Schedule an asynchronous flush of the old segment scheduler.schedule("flush-log", () => flushUptoOffsetExclusive(newSegment.baseOffset)) } ``` The benchmarking using this fix shows significant reduction in repeating latency spikes: test config: AWS 3 node cluster (i3en.2xlarge) zulu11.62.17-ca-jdk11.0.18-linux_x64, heap 6G per broker 1 loadgen (m5n.8xlarge) - OpenMessaging benchmark ([OMB](https://github.com/openmessaging/benchmark)) 1 zookeeper (t2.small) acks=all batchSize=1048510 consumers=4 insyncReplicas=2 lingerMs=1 mlen=1024 producers=4 rf=3 subscriptions=1 targetRate=200k time=12m topics=1 warmup=1m ### variation 1: partitions=10 <span ng-repeat="item in report" ng-show="!item.visible || item.visible()" ng-class="item.cls + (item.sticky ? 'sticky' : '')" class="ng-scope"><span ng-if="item.table && item.displayTable.show" ng-class="item.cls" class="ng-scope"> metric | kafka_2.13-3.4.0 | kafka_2.13-3.4.0 patched -- | -- | -- endToEnd service_time (ms) p50 max | 2.00 | 2.00 endToEnd service_time (ms) p75 max | 3.00 | 2.00 endToEnd service_time (ms) p95 max | 94.0 | 3.00 endToEnd service_time (ms) p99 max | 290 | 6.00 endToEnd service_time (ms) p99.9 max | 355 | 21.0 endToEnd service_time (ms) p99.99 max | 372 | 34.0 endToEnd service_time (ms) p100 max | 374 | 36.0 publish service_time (ms) p50 max | 1.70 | 1.67 publish service_time (ms) p75 max | 2.23 | 2.09 publish service_time (ms) p95 max | 90.7 | 2.82 publish service_time (ms) p99 max | 287 | 4.69 publish service_time (ms) p99.9 max | 353 | 19.6 publish service_time (ms) p99.99 max | 369 | 31.3 publish service_time (ms) p100 max | 371 | 33.5 kafka | endToEnd chart -- | -- kafka_2.13-3.4.0 | ![image](https://github.com/apache/kafka/assets/6793713/ec329711-47d4-459f-92d7-06310b770023) kafka_2.13-3.4.0 patched | ![image](https://github.com/apache/kafka/assets/6793713/e5aeb6a6-d33a-4d57-be80-c916fa1f05be) latency score improved up to 10x times in high percentiles ^^^, spikes almost invisible ### variation 2: partitions=100 metric | kafka_2.13-3.4.0 | kafka_2.13-3.4.0 patched -- | -- | -- endToEnd service_time (ms) p50 max | 91.0 | 2.00 endToEnd service_time (ms) p75 max | 358 | 3.00 endToEnd service_time (ms) p95 max | 1814 | 4.00 endToEnd service_time (ms) p99 max | 2777 | 21.0 endToEnd service_time (ms) p99.9 max | 3643 | 119 endToEnd service_time (ms) p99.99 max | 3724 | 141 endToEnd service_time (ms) p100 max | 3726 | 143 publish service_time (ms) p50 max | 77.4 | 1.92 publish service_time (ms) p75 max | 352 | 2.35 publish service_time (ms) p95 max | 1748 | 3.80 publish service_time (ms) p99 max | 2740 | 18.9 publish service_time (ms) p99.9 max | 3619 | 116 publish service_time (ms) p99.99 max | 3720 | 139 publish service_time (ms) p100 max | 3722 | 141 endToEnd service_time kafka | endToEnd chart -- | -- kafka_2.13-3.4.0 | ![image](https://github.com/apache/kafka/assets/6793713/dc6e9820-c3b7-4bd0-8dac-bce9f3886d91) kafka_2.13-3.4.0 patched | ![image](https://github.com/apache/kafka/assets/6793713/113b4480-97a4-4dd5-8d5c-f7dc87a3d7a5) latency score improved up to 25x times in high percentiles ^^^ The fix was done for 3.4 branch - scala version of ProducerStateManager. Trunk needs corresponding fix for ProducerStateManager.java. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org