[GitHub] [kafka] novosibman commented on pull request #13782: Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll - trunk version

2023-05-31 Thread via GitHub


novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1569655316

   > Many thanks for the patch and the collected data! Really interesting to 
see the impact of this change. A few questions:
   > 
   > * What storage device and file system are used in the test?
   
   AWS config: i3en.2xlarge with 2 x 2500 GB NVMe SSDs
   Local lab config: 2 x Samsung_SSD_860_EVO_1TB
   File system: xfs
   
   The file system type had a huge impact on results. Initially we used ext4 
in our lab for regular testing.
   Some example `ext4` results:
   
![image](https://github.com/apache/kafka/assets/6793713/3fcbec41-9f91-4ee9-9a0c-0732524aad3b)
   After switching to `xfs`:
   
![image](https://github.com/apache/kafka/assets/6793713/1324d042-2664-4737-af48-cd4a723c914d)
   `ext4` was much worse both before and during Kafka log rolling.
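   
   Because the file system mattered this much, it may be worth confirming 
which one a broker's log directory actually sits on before comparing runs. A 
minimal sketch using only the JDK; the class name and the default path are 
hypothetical, and the real directory would be whatever `log.dirs` points to:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogDirFsType {
    // Returns the file system type name the OS reports for the store holding `dir`
    // (for example "xfs" or "ext4" on Linux; the exact string is platform specific).
    static String fsTypeOf(Path dir) {
        try {
            FileStore store = Files.getFileStore(dir);
            return store.type();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default: the current directory. Pass the log directory as the first argument.
        Path logDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(logDir.toAbsolutePath() + " -> " + fsTypeOf(logDir));
    }
}
```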
   
   > 
   > * Would you have a real-life workload where the impact of this change 
can be quantified? The workload generated by the producer-perf-test.sh exhibits 
the problem the most because the segments of all replicas on the brokers start 
rolling at the same time. Which is why it is also interesting to assess the 
impact using topic-partitions which have different ingress rate and/or use 
segments of different sizes.
   
   We do not have any real-life workload scenarios available for Kafka perf 
testing. An alternative workload, https://github.com/AzulSystems/kafka-benchmark, 
has slightly different rolling behavior compared to OMB:
   
   OMB results example on released kafka_2.13-3.4.0 version (using xfs):
   
![image](https://github.com/apache/kafka/assets/6793713/9b8bf37b-7067-44e7-9e18-f28089af0266)
   
   Kafka Tussle benchmark:
   
![image](https://github.com/apache/kafka/assets/6793713/2b3790df-acf5-4990-9736-56a7eb77e7b8)
   
   Same params used: acks=1 batchSize=1048510 consumers=4 lingerMs=1 
mlen=1024 partitions=100 producers=4 rf=1 targetRate=200k time=30m topics=1 
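   
   For readers reproducing this outside the benchmarks, the producer-side part 
of these parameters could be expressed as client properties roughly as follows. 
This is a sketch that assumes `acks`, `batchSize` and `lingerMs` map onto the 
standard producer configs `acks`, `batch.size` and `linger.ms`; the class name 
is hypothetical, and the remaining parameters (partitions, rf, targetRate and so 
on) are topic-level or benchmark-level settings that are not shown:

```java
import java.util.Properties;

public class BenchmarkProducerProps {
    // Builds producer properties from the benchmark parameters listed above.
    // Assumption: batchSize -> batch.size (bytes), lingerMs -> linger.ms (milliseconds).
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("acks", "1");
        props.setProperty("batch.size", "1048510");
        props.setProperty("linger.ms", "1");
        return props;
    }

    public static void main(String[] args) {
        producerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```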


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] novosibman commented on pull request #13782: Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll - trunk version

2023-06-01 Thread via GitHub


novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1572027303

   > Are all the graphs shared for OMB and Kafka Tussle generated for Kafka 
with the fix in this PR?
   
   The graphs with the fix are the ones in the first description comment, 
marked with the `kafka_2.13-3.6.0-snapshot-fix` label.
   
   The other graphs, in the later comment, are examples of how rolling affects 
results on different configurations and benchmarks using a regular Kafka release.
   





[GitHub] [kafka] novosibman commented on pull request #13782: Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll - trunk version

2023-06-01 Thread via GitHub


novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1572620634

   Updated the change: restored the original try-with-resources block for 
writing and added a utility method for flushing:
   ```
   // Write the snapshot; try-with-resources closes the channel without forcing it to disk.
   try (FileChannel fileChannel = FileChannel.open(file.toPath(),
           StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
       fileChannel.write(buffer);
   }
   // Flush in the background when a scheduler is available, otherwise flush inline.
   if (scheduler != null) {
       scheduler.scheduleOnce("flush-producer-snapshot",
           () -> Utils.flushFileQuietly(file.toPath(), "producer-snapshot"));
   } else {
       Utils.flushFileQuietly(file.toPath(), "producer-snapshot");
   }
   ```
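   
   To illustrate the idea outside the Kafka code base, here is a standalone 
sketch of the same write-then-deferred-flush pattern using only the JDK. It is 
not the PR's implementation: `flushQuietly` is a hypothetical stand-in for the 
utility method, a plain `ExecutorService` stands in for the Kafka scheduler, 
and `demo` exists only to exercise the code:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DeferredFlushSketch {
    // Hypothetical helper: reopen the file and force it to storage.
    // Failures are reported and signalled via the return value, never thrown.
    static boolean flushQuietly(Path path, String name) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ch.force(true); // ask the OS to persist content and metadata
            return true;
        } catch (IOException e) {
            System.err.println("Failed to flush " + name + " at " + path + ": " + e);
            return false;
        }
    }

    // Writes the buffer, then flushes on `flusher` if one is given, otherwise inline.
    static void writeSnapshot(File file, ByteBuffer buffer, ExecutorService flusher) {
        try (FileChannel ch = FileChannel.open(file.toPath(),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            while (buffer.hasRemaining()) {
                ch.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (flusher != null) {
            flusher.execute(() -> flushQuietly(file.toPath(), "producer-snapshot"));
        } else {
            flushQuietly(file.toPath(), "producer-snapshot");
        }
    }

    // Writes a 5-byte payload to a temp file with a background flush and returns the file size.
    static long demo() {
        ExecutorService flusher = Executors.newSingleThreadExecutor();
        try {
            Path tmp = Files.createTempFile("producer", ".snapshot");
            ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            writeSnapshot(tmp.toFile(), payload, flusher);
            flusher.shutdown();
            flusher.awaitTermination(10, TimeUnit.SECONDS); // wait for the deferred flush
            long size = Files.size(tmp);
            Files.delete(tmp);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            flusher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + demo());
    }
}
```

   The point of the split is that the writing thread only pays for the write 
and close, while the potentially slow force to disk happens off the hot path 
whenever a background executor is supplied.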
   





[GitHub] [kafka] novosibman commented on pull request #13782: Suggest for performance fix: KAFKA-9693 Kafka latency spikes caused by log segment flush on roll - trunk version

2023-06-08 Thread via GitHub


novosibman commented on PR #13782:
URL: https://github.com/apache/kafka/pull/13782#issuecomment-1583094847

   Provided the open/close changes. 
   Also fixed the checkstyle issue (in task ':storage:checkstyleMain').

