[ 
https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729618#comment-17729618
 ] 

Luke Chen commented on KAFKA-15046:
-----------------------------------

[~ocadaruma] , thanks for the help! 

Ping me when PR is created. Thanks.

> Produce performance issue under high disk load
> ----------------------------------------------
>
>                 Key: KAFKA-15046
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15046
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 3.3.2
>            Reporter: Haruki Okada
>            Assignee: Haruki Okada
>            Priority: Major
>              Labels: performance
>         Attachments: image-2023-06-01-12-46-30-058.png, 
> image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
> image-2023-06-01-12-56-19-108.png
>
>
> * Phenomenon:
>  ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
>  ** Producer response time 99%ile got quite bad when we performed replica 
> reassignment on the cluster
>  *** RequestQueue scope was significant
>  ** Also request-time throttling happened at the incidental time. This caused 
> producers to delay sending messages in the mean time.
>  ** The disk I/O latency was higher than usual due to the high load for 
> replica reassignment.
>  *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
>  * Analysis:
>  ** The request-handler utilization was much higher than usual.
>  *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
>  ** Also, thread time utilization was much higher than usual on almost all 
> users
>  *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
>  ** From taking jstack several times, for most of them, we found that a 
> request-handler was doing fsync for flusing ProducerState and meanwhile other 
> request-handlers were waiting Log#lock for appending messages.
>  * 
>  ** 
>  *** 
> {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
> cpu=51264789.27ms elapsed=599242.76s tid=0x00007efdaeba7770 nid=0x1e704 
> runnable  [0x00007ef9a12e2000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native 
> Method)
>         at 
> sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
>         at 
> sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
>         at 
> kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
>         at 
> kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
>         at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
>         - locked <0x000000060d75d820> (a java.lang.Object)
>         at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
>         - locked <0x000000060d75d820> (a java.lang.Object)
>         at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
>         - locked <0x000000060d75d820> (a java.lang.Object)
>         at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
>         at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
>         at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
>         at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
>         at 
> kafka.server.ReplicaManager$$Lambda$2379/0x0000000800b7c040.apply(Unknown 
> Source)
>         at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
>         at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
>         at scala.collection.mutable.HashMap.map(HashMap.scala:35)
>         at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
>         at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
>         at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
>         at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
>  * 
>  ** Also there were bunch of logs that writing producer snapshots took 
> hundreds of milliseconds.
>  *** 
> {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
> producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
> producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
> (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
> producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
> (kafka.log.ProducerStateManager)
> ... {code}
>  * From the analysis, we summarized the issue as below:
>  * 
>  ** 1. Disk write latency got worse due to the replica reassignment
>  *** We already use replication quota, and lowering the quota further may not 
> be acceptable for too long assignment duration
>  ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync 
> latency
>  *** This is done at every log segment roll.
>  *** In our case, the broker hosts high load partitions so log roll is 
> occurring very frequently.
>  ** 3. During ProducerStateManager#takeSnapshot is doing fsync, all 
> subsequent produce requests to the partition is blocked due to Log#lock
>  ** 4. During produce requests waiting the lock, they consume request handler 
> threads time so it's accounted as thread-time utilization and caused 
> throttling
>  * Suggestion:
>  ** We didn't see this phenomenon when we used Kafka 2.4.1.
>  *** ProducerState fsync was introduced in 2.8.0 by this: 
> https://issues.apache.org/jira/browse/KAFKA-9892
>  ** The reason why ProducerState needs to be fsync is not well described in 
> above ticket though, we think fsync is not really necessary here. Because:
>  * 
>  ** 
>  *** If ProducerState snapshot file was not written to the disk after power 
> failure, it will be just rebuilt from logs.
>  *** Also, even if ProducerState snapshot was corrupted after power failure, 
> it will be rebuilt from logs thanks to CRC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to