[GitHub] [kafka] novosibman commented on pull request #13782: Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll - trunk version
novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1569655316

> Many thanks for the patch and the collected data! Really interesting to see the impact of this change. A few questions:
>
> * What storage device and file system are used in the test?

AWS config: i3en.2xlarge with 2 x 2500 GB NVMe SSDs
Local lab config: 2 x Samsung_SSD_860_EVO_1TB
FS type: xfs

The file system format had a huge impact on the results. Initially we used ext4 in our lab for regular testing. Some example `ext4` results:
![image](https://github.com/apache/kafka/assets/6793713/3fcbec41-9f91-4ee9-9a0c-0732524aad3b)

After switching to `xfs`:
![image](https://github.com/apache/kafka/assets/6793713/1324d042-2664-4737-af48-cd4a723c914d)

`ext4` was much worse both before and during Kafka log rolling.

> * Would you have a real-life workload where the impact of this change can be quantified? The workload generated by the producer-perf-test.sh exhibits the problem the most because the segments of all replicas on the brokers start rolling at the same time. Which is why it is also interesting to assess the impact using topic-partitions which have different ingress rate and/or use segments of different sizes.

We do not have any real-life workload scenarios available for Kafka perf testing. An alternative workload, https://github.com/AzulSystems/kafka-benchmark, has slightly different rolling behavior compared to OMB.

OMB results example on the released kafka_2.13-3.4.0 version (using xfs):
![image](https://github.com/apache/kafka/assets/6793713/9b8bf37b-7067-44e7-9e18-f28089af0266)

Kafka Tussle benchmark:
![image](https://github.com/apache/kafka/assets/6793713/2b3790df-acf5-4990-9736-56a7eb77e7b8)

# same params used:
acks=1 batchSize=1048510 consumers=4 lingerMs=1 mlen=1024 partitions=100 producers=4 rf=1 targetRate=200k time=30m topics=1

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1572027303

> Are all the graphs shared for OMB and Kafka Tussle generated for Kafka with the fix in this PR?

The graphs with the fix are the ones in the first description comment; they are marked with the `kafka_2.13-3.6.0-snapshot-fix` label. The other graphs, in the later comment, are examples of how rolling affects results on different configurations and benchmarks using a regular Kafka release.
novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1572620634

Provided an updated change: restored the original try-with-resources on writing and added a utility method for flushing:

```
// Write the snapshot; the channel is closed by try-with-resources.
try (FileChannel fileChannel = FileChannel.open(file.toPath(), StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    fileChannel.write(buffer);
}

// Flush via the scheduler when one is provided, otherwise flush inline.
if (scheduler != null) {
    scheduler.scheduleOnce("flush-producer-snapshot", () -> Utils.flushFileQuietly(file.toPath(), "producer-snapshot"));
} else {
    Utils.flushFileQuietly(file.toPath(), "producer-snapshot");
}
```
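For readers who want to try the write-then-deferred-flush pattern from the snippet above outside of Kafka, here is a minimal, self-contained sketch. It is not the PR's code: `DeferredFlushSketch`, `flushQuietly`, and `writeThenFlush` are hypothetical names, a plain `ExecutorService` stands in for Kafka's scheduler, and `flushQuietly` is only a guess at what a "flush quietly" helper might do (force the file to disk and log failures instead of throwing). It also assumes that forcing a newly opened channel persists data written through an earlier, already closed channel on the same file, which is how `fsync` behaves on Linux.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DeferredFlushSketch {

    // Hypothetical stand-in for a "flush quietly" helper: force the file
    // to disk and report failure via the return value instead of throwing.
    static boolean flushQuietly(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            channel.force(true);
            return true;
        } catch (IOException e) {
            System.err.println("Failed to flush " + path + ": " + e);
            return false;
        }
    }

    // Writes the buffer and closes the channel, then flushes on the given
    // executor if there is one, or inline on the calling thread otherwise.
    static Future<Boolean> writeThenFlush(Path path, ByteBuffer buffer, ExecutorService flusher)
            throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
        if (flusher != null) {
            Callable<Boolean> task = () -> flushQuietly(path);
            return flusher.submit(task);
        }
        return CompletableFuture.completedFuture(flushQuietly(path));
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("producer-snapshot", ".bin");
        ExecutorService flusher = Executors.newSingleThreadExecutor();
        try {
            ByteBuffer data = ByteBuffer.wrap("snapshot".getBytes(StandardCharsets.UTF_8));
            Future<Boolean> flushed = writeThenFlush(file, data, flusher);
            // The write has already returned; only the flush is awaited here.
            System.out.println("flushed=" + flushed.get());
        } finally {
            flusher.shutdown();
            Files.deleteIfExists(file);
        }
    }
}
```

The point of the split is that the calling thread pays only for the write; the potentially slow `force` call runs elsewhere. The trade-off is a short window in which the data has been written but is not yet guaranteed to be on disk.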
novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1583094847

Open/close changes provided. Also corrected style check issue (in task ':storage:checkstyleMain').