[jira] [Assigned] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion
[ https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada reassigned KAFKA-13403:
------------------------------------
    Assignee: Arun Mathew  (was: Haruki Okada)

> KafkaServer crashes when deleting topics due to the race in log deletion
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-13403
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13403
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.1
>            Reporter: Haruki Okada
>            Assignee: Arun Mathew
>            Priority: Major
>
> h2. Environment
> * OS: CentOS Linux release 7.6
> * Kafka version: 2.4.1
> ** As far as I checked the code, I think the same phenomenon could happen even on trunk
> * Kafka log directory: RAID1+0 (i.e. not using JBOD, so only a single log.dirs entry is set)
> * Java version: AdoptOpenJDK 1.8.0_282
> h2. Phenomenon
> When we were in the middle of deleting several topics by `kafka-topics.sh --delete --topic blah-blah`, one broker in our cluster crashed due to the following exception:
> {code:java}
> [2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in /data/kafka have failed (kafka.log.LogManager)
> {code}
> We also found that a NoSuchFileException was thrown right before the crash when LogManager tried to delete logs for some partitions.
> {code:java}
> [2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 in dir /data/kafka (kafka.server.LogDirFailureChannel)
> java.nio.file.NoSuchFileException: /data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted
>     at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>     at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>     at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>     at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>     at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>     at java.nio.file.Files.readAttributes(Files.java:1737)
>     at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
>     at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
>     at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
>     at java.nio.file.Files.walkFileTree(Files.java:2706)
>     at java.nio.file.Files.walkFileTree(Files.java:2742)
>     at org.apache.kafka.common.utils.Utils.delete(Utils.java:732)
>     at kafka.log.Log.$anonfun$delete$2(Log.scala:2036)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at kafka.log.Log.maybeHandleIOException(Log.scala:2343)
>     at kafka.log.Log.delete(Log.scala:2030)
>     at kafka.log.LogManager.deleteLogs(LogManager.scala:826)
>     at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840)
>     at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
>     at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> So the log dir was marked as offline, which ended up in a KafkaServer crash because the broker has only a single log dir.
> h2. Cause
> We also found the logs below right before the NoSuchFileException:
> {code:java}
> [2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is renamed to /data/kafka/foo-bar-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete and is scheduled for deletion (kafka.log.LogManager)
> [2021-10-21 18:18:17,900] INFO [Log partition=foo-bar-5, dir=/data/kafka] Found deletable segments with base offsets [3877066] due to retention time 17280ms breach (kafka.log.Log)
> [2021-10-21 18:18:17,901] INFO [Log partition=foo-bar-5, dir=/data/kafka] Scheduling segments for deletion List(LogSegment(baseOffset=3877066, size=90316366, lastModifiedTime=1634634956000, largestTime=1634634955854)) (kafka.log.Log)
> {code}
> After checking through Kafka code, w
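The stack trace shows the failure coming out of Files.walkFileTree: a tree walk first lists a directory and only afterwards reads each entry's attributes, so a concurrent deletion between the two steps surfaces as a missing-file error. A minimal Python sketch of that filesystem race (toy code, not Kafka's; the file name is copied from the log above purely for illustration):

```python
import os
import pathlib
import tempfile

# Toy sketch (not Kafka code): a tree walk enumerates a directory, then
# stat()s each entry. If a concurrent deletion wins the race in between,
# the stat step fails -- the same shape as the NoSuchFileException thrown
# from Files.walkFileTree in the stack trace above.
log_dir = tempfile.mkdtemp()
segment = pathlib.Path(log_dir, "03877066.timeindex.deleted")
segment.write_text("index data")

entries = os.listdir(log_dir)   # walk step 1: enumerate the directory
segment.unlink()                # the concurrent deletion wins the race
try:
    os.stat(os.path.join(log_dir, entries[0]))  # walk step 2: read attributes
    raced = False
except FileNotFoundError:       # Java surfaces this as NoSuchFileException
    raced = True
print("race reproduced:", raced)
```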
[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning
[ https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866105#comment-17866105 ]

Haruki Okada commented on KAFKA-17076:
--------------------------------------

Also, since lastOffset() always returns the original last offset even after compaction, the log end offset of a compacted log will not rewind:
https://github.com/apache/kafka/blob/3.7.1/clients/src/main/java/org/apache/kafka/common/record/RecordBatch.java#L118-L120

> logEndOffset could be lost due to log cleaning
> ----------------------------------------------
>
>                 Key: KAFKA-17076
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17076
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jun Rao
>            Priority: Major
>
> It's possible for the log cleaner to remove all records in the suffix of the log. If the partition is then reassigned, the new replica won't be able to see the true logEndOffset since there is no record batch associated with it. If this replica becomes the leader, it will assign an already-used offset to a newly produced record, which is incorrect.
> It's relatively rare to trigger this issue since the active segment is never cleaned and typically is not empty. However, the following is one possibility.
> # Records with offsets 100-110 are produced and fully replicated to all ISR. All those records are delete records for certain keys.
> # A record with offset 111 is produced. It forces the roll of a new segment in broker b1 and is added to the log. The record is not committed and is later truncated from the log, leaving an empty active segment in this log. b1 at some point becomes the leader.
> # The log cleaner kicks in and removes records 100-110.
> # The partition is reassigned to another broker b2. b2 replicates all records from b1 up to offset 100 and marks its logEndOffset at 100. Since there is no record to replicate after offset 100 in b1, b2's logEndOffset stays at 100 and b2 can join the ISR.
> # b2 becomes the leader and assigns offset 100 to a new record.
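The scenario above can be sketched with a toy model (plain Python, not Kafka's actual Log implementation): a log is a list of (baseOffset, lastOffset) batches, and a replica that derives its log end offset from the batches it can fetch has no way to learn about a suffix the cleaner removed entirely.

```python
# Toy model (not Kafka code): a log is a list of (base_offset, last_offset)
# batches; the log end offset (LEO) is derived from the last surviving batch.
def log_end_offset(batches):
    return batches[-1][1] + 1 if batches else 0

b1 = [(0, 99), (100, 110)]      # step 1: offsets 100-110 are delete records
assert log_end_offset(b1) == 111
# steps 2-3: offset 111 is truncated again, then the cleaner removes 100-110;
# nothing in the remaining batches records that offsets 100-110 ever existed
b1 = [(0, 99)]
# step 4: b2 replicates what remains and derives its LEO from it
b2_leo = log_end_offset(b1)
assert b2_leo == 100
# step 5: b2 becomes leader and hands out b2_leo -- offset 100 is reused
print("b2 assigns offset", b2_leo)
```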
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-17076) logEndOffset could be lost due to log cleaning
[ https://issues.apache.org/jira/browse/KAFKA-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865823#comment-17865823 ]

Haruki Okada commented on KAFKA-17076:
--------------------------------------

[~junrao] Is that possible? At step 2 in your scenario, I guess truncation doesn't happen unless at least one record is returned from the Fetch response because of https://github.com/apache/kafka/pull/9382, so an empty active segment is not possible in my understanding.

refs: https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum#KIP595:ARaftProtocolfortheMetadataQuorum-Fetch
[jira] [Commented] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17865110#comment-17865110 ]

Haruki Okada commented on KAFKA-17061:
--------------------------------------

Did a micro-benchmark to check the performance improvement of `addUpdateMetadataRequestForBrokers` by the patch.

Benchmark code: [https://gist.github.com/ocadaruma/e80be044227d6235126310e9058f546d]

!screenshot-flame.png|width=320!
!screenshot-flame-patched.png|width=320!

As we can see, isReplicaOnline is no longer a bottleneck.

> KafkaController takes long time to connect to newly added broker after registration on large cluster
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17061
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17061
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Haruki Okada
>            Assignee: Haruki Okada
>            Priority: Major
>         Attachments: flame-patched.html, flame.html, image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png, screenshot-flame-patched.png, screenshot-flame.png
>
> h2. Environment
> * Kafka version: 3.3.2
> * Cluster: ~200 brokers
> * Total num partitions: 40k
> * ZK-based cluster
> h2. Phenomenon
> When a broker left the cluster due to a long STW pause and came back after a while, the controller took 6 seconds to connect to the broker after znode registration, which caused significant message delivery delay.
> {code:java}
> [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, deleted brokers: , bounced brokers: , all live brokers: 1,... (kafka.controller.KafkaController)
> [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
> [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting (kafka.controller.RequestSendThread)
> [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback for 2 (kafka.controller.KafkaController)
> [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change requests (kafka.controller.RequestSendThread)
> {code}
> h2. Analysis
> From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] called by `isReplicaOnline` takes significant time in `addUpdateMetadataRequestForBrokers` invocation on broker startup.
> !image-2024-07-02-17-24-11-861.png|width=541,height=303!
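The shape of such a fix, as suggested by the analysis above, is to hoist the derived live-broker-id set out of the per-replica hot loop instead of rebuilding it for every isReplicaOnline check (this is a toy sketch under that assumption, not the actual patch; see the linked PR for the real change):

```python
# Toy sketch, not the controller code; all names below are illustrative.
# Rebuilding a derived broker-id set inside the per-replica loop costs
# O(replicas * brokers); computing it once per batch costs O(brokers + replicas).
live_broker_epochs = {broker_id: 0 for broker_id in range(200)}

def live_broker_ids():
    # derived set, rebuilt from scratch on every call (the pre-patch shape)
    return {broker_id for broker_id in live_broker_epochs}

replicas = [(f"topic-{p}", b) for p in range(10_000) for b in (0, 1, 2)]

# slow shape: one set construction per replica check
slow = sum(1 for _topic, b in replicas if b in live_broker_ids())

# fast shape: compute the set once for the whole request batch, then reuse it
cached = live_broker_ids()
fast = sum(1 for _topic, b in replicas if b in cached)

assert slow == fast == len(replicas)
```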
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Description: the Analysis section now reads "From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] called by `isReplicaOnline` takes significant time in `addUpdateMetadataRequestForBrokers` invocation on broker startup."
        (was: "From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time in `addUpdateMetadataRequestForBrokers` invocation on broker startup."; the Environment and Phenomenon sections are unchanged)
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Attachment: screenshot-flame-patched.png
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Attachment: screenshot-flame.png
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Attachment: flame-patched.html
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Attachment: flame.html
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haruki Okada updated KAFKA-17061:
---------------------------------
    Description: the Analysis section now reads "From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time in `addUpdateMetadataRequestForBrokers` invocation on broker startup."
        (was: "From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time."; the Environment and Phenomenon sections are unchanged)
[jira] [Commented] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863197#comment-17863197 ]

Haruki Okada commented on KAFKA-17061:
--------------------------------------

[~showuon] Hi, I submitted a [patch|https://github.com/apache/kafka/pull/16529]. Could you take a look?
[jira] [Updated] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-17061: - Description: h2. Environment * Kafka version: 3.3.2 * Cluster: 200~ brokers * Total num partitions: 40k * ZK-based cluster h2. Phenomenon When a broker left the cluster once due to the long STW and came back after a while, the controller took 6 seconds until connecting to the broker after znode registration, it caused significant message delivery delay. {code:java} [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, deleted brokers: , bounced brokers: , all live brokers: 1,... (kafka.controller.KafkaController) [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager) [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting (kafka.controller.RequestSendThread) [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback for 2 (kafka.controller.KafkaController) [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change requests (kafka.controller.RequestSendThread) {code} h2. Analysis From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time. !image-2024-07-02-17-24-11-861.png|width=541,height=303! was: h2. Environment * Kafka version: 3.3.2 * Cluster: 200~ brokers * Total num partitions: 40k * ZK-based cluster h2. Phenomenon When a broker left the cluster once due to the long STW and came back after a while, the controller took 6 seconds until connecting to the broker after znode registration, it caused significant message delivery delay. 
{code:java} [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, deleted brokers: , bounced brokers: , all live brokers: 1,... (kafka.controller.KafkaController) [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager) [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting (kafka.controller.RequestSendThread) [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback for 2 (kafka.controller.KafkaController) [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change requests (kafka.controller.RequestSendThread) {code} h2. Analysis From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time. !image-2024-07-02-17-24-11-861.png|width=541,height=303! Since no concurrent modification against liveBrokerEpochs is expected, we can just cache the result to improve the performance. > KafkaController takes long time to connect to newly added broker after > registration on large cluster > > > Key: KAFKA-17061 > URL: https://issues.apache.org/jira/browse/KAFKA-17061 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Major > Attachments: image-2024-07-02-17-22-06-100.png, > image-2024-07-02-17-24-11-861.png > > > h2. Environment > * Kafka version: 3.3.2 > * Cluster: 200~ brokers > * Total num partitions: 40k > * ZK-based cluster > h2. Phenomenon > When a broker left the cluster once due to the long STW and came back after a > while, the controller took 6 seconds until connecting to the broker after > znode registration, it caused significant message delivery delay. 
> {code:java} > [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, > deleted brokers: , bounced brokers: , all live brokers: 1,... > (kafka.controller.KafkaController) > [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller > 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager) > [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting > (kafka.controller.RequestSendThread) > [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback > for 2 (kafka.controller.KafkaController) > [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller > 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change > requests (kafka.controller.RequestSendThread) > {code} > h2. Analysis > From the flamegraph at that time, we
[jira] [Assigned] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
[ https://issues.apache.org/jira/browse/KAFKA-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-17061: Assignee: Haruki Okada > KafkaController takes long time to connect to newly added broker after > registration on large cluster > > > Key: KAFKA-17061 > URL: https://issues.apache.org/jira/browse/KAFKA-17061 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Major > Attachments: image-2024-07-02-17-22-06-100.png, > image-2024-07-02-17-24-11-861.png > > > h2. Environment > * Kafka version: 3.3.2 > * Cluster: 200~ brokers > * Total num partitions: 40k > * ZK-based cluster > h2. Phenomenon > When a broker left the cluster once due to the long STW and came back after a > while, the controller took 6 seconds until connecting to the broker after > znode registration, it caused significant message delivery delay. > {code:java} > [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, > deleted brokers: , bounced brokers: , all live brokers: 1,... > (kafka.controller.KafkaController) > [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller > 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager) > [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting > (kafka.controller.RequestSendThread) > [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback > for 2 (kafka.controller.KafkaController) > [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller > 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change > requests (kafka.controller.RequestSendThread) > {code} > h2. Analysis > From the flamegraph at that time, we can see that > [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] > calculation takes significant time. 
> !image-2024-07-02-17-24-11-861.png|width=541,height=303! > Since no concurrent modification against liveBrokerEpochs is expected, we can > just cache the result to improve performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
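The caching idea stated above ("just cache the result") can be sketched roughly as follows. This is a hypothetical Java illustration of the technique only: the real ControllerContext is Scala, the actual change is in the linked patch, and the class and method names here are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: derive the live-broker id set once per mutation instead of
// rebuilding it on every read. Since the context is only mutated from a
// single controller thread, a plain cached field is safe.
public class BrokerContext {
    private final Map<Integer, Long> liveBrokerEpochs = new HashMap<>();
    private Set<Integer> cachedLiveBrokerIds = Collections.emptySet();

    // Recompute the derived set eagerly on each mutation; reads become O(1).
    private void invalidate() {
        cachedLiveBrokerIds =
            Collections.unmodifiableSet(new HashSet<>(liveBrokerEpochs.keySet()));
    }

    public void addLiveBroker(int id, long epoch) {
        liveBrokerEpochs.put(id, epoch);
        invalidate();
    }

    public void removeLiveBroker(int id) {
        liveBrokerEpochs.remove(id);
        invalidate();
    }

    // Previously the equivalent accessor rebuilt the set on every call,
    // which is expensive when called in a hot loop on a large cluster.
    public Set<Integer> liveBrokerIds() {
        return cachedLiveBrokerIds;
    }
}
```

The trade-off is the usual one for memoized derived state: mutations pay a small extra cost so that the far more frequent reads are a field access.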
[jira] [Created] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster
Haruki Okada created KAFKA-17061: Summary: KafkaController takes long time to connect to newly added broker after registration on large cluster Key: KAFKA-17061 URL: https://issues.apache.org/jira/browse/KAFKA-17061 Project: Kafka Issue Type: Improvement Reporter: Haruki Okada Attachments: image-2024-07-02-17-22-06-100.png, image-2024-07-02-17-24-11-861.png h2. Environment * Kafka version: 3.3.2 * Cluster: 200~ brokers * Total num partitions: 40k * ZK-based cluster h2. Phenomenon When a broker left the cluster once due to the long STW and came back after a while, the controller took 6 seconds until connecting to the broker after znode registration, it caused significant message delivery delay. {code:java} [2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, deleted brokers: , bounced brokers: , all live brokers: 1,... (kafka.controller.KafkaController) [2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 trying to connect to broker 2 (kafka.controller.ControllerChannelManager) [2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting (kafka.controller.RequestSendThread) [2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback for 2 (kafka.controller.KafkaController) [2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change requests (kafka.controller.RequestSendThread) {code} h2. Analysis From the flamegraph at that time, we can see that [liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217] calculation takes significant time. !image-2024-07-02-17-24-11-861.png|width=541,height=303! Since no concurrent modification against liveBrokerEpochs is expected, we can just cache the result to improve the performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15612) Followup on whether the segment indexes need to be materialized or flushed before they are passed to RSM for writing them to tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856223#comment-17856223 ] Haruki Okada commented on KAFKA-15612: -- In my understanding, we concluded in https://issues.apache.org/jira/browse/KAFKA-15609 that index files do not need to be flushed, since it's guaranteed that the mmap-ed content is consistent with the file. Do we need other follow-ups? > Followup on whether the segment indexes need to be materialized or flushed > before they are passed to RSM for writing them to tiered storage. > - > > Key: KAFKA-15612 > URL: https://issues.apache.org/jira/browse/KAFKA-15612 > Project: Kafka > Issue Type: Task >Reporter: Satish Duggana >Priority: Major > Fix For: 3.9.0 > > > Followup on the [PR > comment|https://github.com/apache/kafka/pull/14529#discussion_r1360877868] -- This message was sent by Atlassian Jira (v8.20.10#820010)
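The guarantee the comment above relies on can be demonstrated with a small standalone sketch in plain Java NIO (this is not Kafka's index code): on POSIX-like systems, writes through a memory mapping are immediately visible to ordinary reads of the same file, even before any force()/fsync, because the mapping and read(2) share the same page cache. The file name and contents below are invented for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapConsistency {
    // Write bytes through an mmap-ed view, then read the file back through
    // the ordinary file API without ever calling MappedByteBuffer.force().
    public static byte[] writeViaMmapThenRead() throws IOException {
        Path p = Files.createTempFile("index", ".tmp");
        byte[] data = "offset-index-entry".getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(
                p, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping with READ_WRITE extends the empty temp file to data.length.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            mmap.put(data);
            // No mmap.force() here: a plain read of the same file still
            // observes the mapped writes via the shared page cache.
            return Files.readAllBytes(p);
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Note that this consistency says nothing about durability: without force()/fsync the data can still be lost on an OS crash, which is exactly the distinction drawn in KAFKA-16541 below.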
[jira] [Commented] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever
[ https://issues.apache.org/jira/browse/KAFKA-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853312#comment-17853312 ] Haruki Okada commented on KAFKA-16916: -- Seems [~apoorvmittal10] already identified the root cause and submitted the PR, thanks Apoorv, Luke. > ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will > run forever > -- > > Key: KAFKA-16916 > URL: https://issues.apache.org/jira/browse/KAFKA-16916 > Project: Kafka > Issue Type: Bug >Reporter: Luke Chen >Assignee: Apoorv Mittal >Priority: Blocker > Fix For: 3.8.0 > > > ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will > run forever -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16916) ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will run forever
[ https://issues.apache.org/jira/browse/KAFKA-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-16916: Assignee: Apoorv Mittal (was: Haruki Okada) > ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will > run forever > -- > > Key: KAFKA-16916 > URL: https://issues.apache.org/jira/browse/KAFKA-16916 > Project: Kafka > Issue Type: Bug >Reporter: Luke Chen >Assignee: Apoorv Mittal >Priority: Blocker > Fix For: 3.8.0 > > > ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials will > run forever -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash
[ https://issues.apache.org/jira/browse/KAFKA-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847684#comment-17847684 ] Haruki Okada commented on KAFKA-16541: -- [~junrao] Hi, I've just submitted a patch. PTAL > Potential leader epoch checkpoint file corruption on OS crash > - > > Key: KAFKA-16541 > URL: https://issues.apache.org/jira/browse/KAFKA-16541 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 3.7.0 >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Minor > > Pointed out by [~junrao] on > [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125] > [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid > of fsync of the leader-epoch checkpoint file in some paths for performance reasons. > However, since the checkpoint file is now flushed to the device asynchronously by > the OS, its content could be corrupted if the OS suddenly crashes (e.g. by power failure or > kernel panic) in the middle of a flush. > A corrupted checkpoint file could prevent the Kafka broker from starting up -- This message was sent by Atlassian Jira (v8.20.10#820010)
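For reference, the standard pattern that avoids the corruption described in this ticket is: write the new contents to a temporary file, fsync it, atomically rename it over the target, and fsync the parent directory so the rename itself survives a crash. A minimal sketch, assuming a small text checkpoint; this is not the actual Kafka fix, and the class name is invented:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class CrashSafeCheckpoint {
    // Write-temp / fsync / atomic-rename: after a crash, readers find either
    // the complete old file or the complete new file, never a torn write.
    public static void write(Path target, String contents) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(contents.getBytes(StandardCharsets.UTF_8)));
            ch.force(true); // flush the data to the device BEFORE the rename
        }
        // rename(2) is atomic on POSIX; REPLACE_EXISTING is ignored when
        // ATOMIC_MOVE is given but documents the intent.
        Files.move(tmp, target,
                StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        // Also fsync the parent directory so the directory entry is durable
        // (works on Linux; directory fsync is typically skipped on Windows).
        try (FileChannel dir = FileChannel.open(target.getParent(), StandardOpenOption.READ)) {
            dir.force(true);
        }
    }
}
```

The ticket's point is the cost/safety trade-off: dropping the `force(true)` call makes writes cheaper but reintroduces the torn-write window on OS crash.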
[jira] [Commented] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843116#comment-17843116 ] Haruki Okada commented on KAFKA-16372: -- [~mpedersencrwd] In either case, it takes time to fix because: * revert to the documented behavior => this is a breaking change so we may not be able to fix it until the 4.0 release * introduce an exception base class => In my understanding this needs a KIP So fixing the javadoc might be appropriate as the short-term fix for now. Besides, I would like to clarify the use case of differentiating synchronous/asynchronous timeout. {quote}our actions might vary because of this{quote} Could you tell me how the action would differ depending on whether the broker may have received the message or not? > max.block.ms behavior inconsistency with javadoc and the config description > --- > > Key: KAFKA-16372 > URL: https://issues.apache.org/jira/browse/KAFKA-16372 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Minor > > As of Kafka 3.7.0, the javadoc of > [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] > states that it throws TimeoutException when max.block.ms is exceeded on > buffer allocation or initial metadata fetch. > Also it's stated in [buffer.memory config > description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. > However, I found that this is not true because TimeoutException extends > ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as > FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] > instead of throwing it. > I wonder if this is a bug or the documentation error. > Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash
[ https://issues.apache.org/jira/browse/KAFKA-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842816#comment-17842816 ] Haruki Okada commented on KAFKA-16541: -- [~junrao] Yes. My concern is that only changing renameDir may not be enough, so I'm trying to figure out if we can fix it in another way without checking all call paths > Potential leader epoch checkpoint file corruption on OS crash > - > > Key: KAFKA-16541 > URL: https://issues.apache.org/jira/browse/KAFKA-16541 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 3.7.0 >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Minor > > Pointed out by [~junrao] on > [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125] > [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid > of fsync of the leader-epoch checkpoint file in some paths for performance reasons. > However, since the checkpoint file is now flushed to the device asynchronously by > the OS, its content could be corrupted if the OS suddenly crashes (e.g. by power failure or > kernel panic) in the middle of a flush. > A corrupted checkpoint file could prevent the Kafka broker from starting up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16651) KafkaProducer.send does not throw TimeoutException as documented
[ https://issues.apache.org/jira/browse/KAFKA-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842582#comment-17842582 ] Haruki Okada commented on KAFKA-16651: -- Might be duplicated: https://issues.apache.org/jira/browse/KAFKA-16372 > KafkaProducer.send does not throw TimeoutException as documented > > > Key: KAFKA-16651 > URL: https://issues.apache.org/jira/browse/KAFKA-16651 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.2 >Reporter: Mike Pedersen >Priority: Major > > In the JavaDoc for {{{}KafkaProducer#send(ProducerRecord, Callback){}}}, it > claims that it will throw a {{TimeoutException}} if blocking on fetching > metadata or allocating memory and surpassing {{{}max.block.ms{}}}. > {quote}Throws: > {{TimeoutException}} - If the time taken for fetching metadata or allocating > memory for the record has surpassed max.block.ms. > {quote} > ([link|https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#send(org.apache.kafka.clients.producer.ProducerRecord,org.apache.kafka.clients.producer.Callback)]) > But this is not the case. As {{TimeoutException}} is an {{ApiException}} it > will hit [this > catch|https://github.com/a0x8o/kafka/blob/54eff6af115ee647f60129f2ce6a044cb17215d0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1073-L1084] > which will result in a failed future being returned instead of the exception > being thrown. > The "allocating memory" part likely changed as part of > [KAFKA-3720|https://github.com/apache/kafka/pull/8399/files#diff-43491ffa1e0f8d28db071d8c23f1a76b54f1f20ea98cf6921bfd1c77a90446abR29] > which changed the base exception for buffer exhaustion exceptions to > {{{}TimeoutException{}}}. Timing out waiting on metadata suffers the same > issue, but it is not clear whether this has always been the case. 
> This is basically a discrepancy between documentation and behavior, so it's a > question of which one should be adjusted. > And on that, being able to differentiate between synchronous timeouts (as > caused by waiting on metadata or allocating memory) and asynchronous timeouts > (eg. timing out waiting for acks) is useful. In the former case we _know_ > that the broker has not received the event but in the latter it _may_ be that > the broker has received it but the ack could not be delivered, and our > actions might vary because of this. The current behavior makes this hard to > differentiate since both result in a {{TimeoutException}} being delivered via > the callback. Currently, I am relying on the exception message string to > differentiate these two, but this is basically just relying on implementation > detail that may change at any time. Therefore I would suggest to either: > * Revert to the documented behavior of throwing in case of synchronous > timeouts > * Correct the javadoc and introduce an exception base class/interface for > synchronous timeouts -- This message was sent by Atlassian Jira (v8.20.10#820010)
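Because of the asymmetry described above, defensive producer code has to handle a timeout both as a synchronous throw and as a failure delivered inside the returned future, and funnel the two paths into one handler. The sketch below illustrates that pattern with plain java.util.concurrent types as stand-ins for the Kafka client API; `trySend`, its signature, and the return convention are all invented for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeoutException;

public class SendTimeoutHandling {
    // Funnel both failure paths into one place: a TimeoutException thrown
    // before the future exists (the documented behavior) and one wrapped
    // inside the failed future (the actual KafkaProducer behavior) look the
    // same to callers of trySend().
    public static String trySend(Callable<Future<String>> send) {
        try {
            return send.call().get();
        } catch (ExecutionException e) {
            // Asynchronous path: the send "succeeded" but returned a failed future.
            if (e.getCause() instanceof TimeoutException) {
                return "timeout:" + e.getCause().getMessage();
            }
            throw new RuntimeException(e.getCause());
        } catch (TimeoutException e) {
            // Synchronous path: thrown directly from the send call itself.
            return "timeout:" + e.getMessage();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Handling both paths this way sidesteps the documentation/behavior discrepancy entirely, which is safer than relying on the exception message string to tell the two timeouts apart.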
[jira] [Created] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash
Haruki Okada created KAFKA-16541: Summary: Potential leader epoch checkpoint file corruption on OS crash Key: KAFKA-16541 URL: https://issues.apache.org/jira/browse/KAFKA-16541 Project: Kafka Issue Type: Bug Components: core Reporter: Haruki Okada Assignee: Haruki Okada Pointed out by [~junrao] on [GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125] [A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid of fsync of the leader-epoch checkpoint file in some paths for performance reasons. However, since the checkpoint file is now flushed to the device asynchronously by the OS, its content could be corrupted if the OS suddenly crashes (e.g. by power failure or kernel panic) in the middle of a flush. A corrupted checkpoint file could prevent the Kafka broker from starting up -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly
[ https://issues.apache.org/jira/browse/KAFKA-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-16393: Assignee: Haruki Okada > SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly > --- > > Key: KAFKA-16393 > URL: https://issues.apache.org/jira/browse/KAFKA-16393 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Minor > > As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is > implemented like below: > {code:java} > public long write(ByteBuffer[] srcs, int offset, int length) throws > IOException { > ... > int i = offset; > while (i < length) { > if (srcs[i].hasRemaining() || hasPendingWrites()) { > > {code} > The loop index starts at `offset` and ends with `length`. > However this isn't correct because the end index should be `offset + length`. > Let's say we have an array of ByteBuffer with length = 5 and try calling > this method with offset = 3, length = 1. > In the current code, `write(srcs, 3, 1)` doesn't attempt any write because the > loop condition is immediately false. > For now, it seems this method is only called with args offset = 0, length = > srcs.length in the Kafka code base, so it isn't causing any problem; still, we should > fix this because it could introduce a subtle bug if this method is used with > different args in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly
Haruki Okada created KAFKA-16393: Summary: SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly Key: KAFKA-16393 URL: https://issues.apache.org/jira/browse/KAFKA-16393 Project: Kafka Issue Type: Improvement Reporter: Haruki Okada As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is implemented like below: {code:java} public long write(ByteBuffer[] srcs, int offset, int length) throws IOException { ... int i = offset; while (i < length) { if (srcs[i].hasRemaining() || hasPendingWrites()) { {code} The loop index starts at `offset` and ends with `length`. However this isn't correct because the end index should be `offset + length`. Let's say we have an array of ByteBuffer with length = 5 and try calling this method with offset = 3, length = 1. In the current code, `write(srcs, 3, 1)` doesn't attempt any write because the loop condition is immediately false. For now, it seems this method is only called with args offset = 0, length = srcs.length in the Kafka code base, so it isn't causing any problem; still, we should fix this because it could introduce a subtle bug if this method is used with different args in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
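The indexing fix the report calls for (end bound `offset + length` instead of `length`) can be sketched in isolation. The real method writes through the SSL engine; here a byte sink stands in for it, so everything except the loop-bound logic is invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class GatheringWrite {
    // Gathering write over srcs[offset .. offset + length), i.e. `length`
    // is a COUNT of buffers, not an end index. The bug in the report used
    // `i < length`, so write(srcs, 3, 1) on a 5-element array did nothing.
    public static long write(ByteBuffer[] srcs, int offset, int length,
                             ByteArrayOutputStream sink) {
        long written = 0;
        int end = offset + length; // the fix: compute the exclusive end index
        for (int i = offset; i < end; i++) {
            while (srcs[i].hasRemaining()) {
                sink.write(srcs[i].get());
                written++;
            }
        }
        return written;
    }
}
```

With the buggy bound, `write(srcs, 3, 1)` returns 0 because `3 < 1` is immediately false; with the corrected bound it drains exactly `srcs[3]` and leaves the other buffers untouched.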
[jira] [Comment Edited] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960 ] Haruki Okada edited comment on KAFKA-16372 at 3/20/24 2:16 PM: --- [~showuon] Agreed. One concern is, IMO many developers expect this "exception thrown on buffer full after max.block.ms"-behavior (because it's stated in Javadoc while we rarely hit buffer-full situation so no one realized this discrepancy). Even some famous open-sources have exception-handling code which doesn't work actually due to this. (e.g. [logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29]) I wonder if just fixing Javadoc and Kafka documentation is fine, or we should make a heads up about this somewhere (e.g. at Kafka user mailing list). I would like to hear committer's opinion. Anyways, meanwhile let me start fixing the docs. was (Author: ocadaruma): [~showuon] Agreed. One concern is, IMO many developers expect this "exception thrown on buffer full after max.block.ms"-behavior (because it's stated in Javadoc while we rarely hit buffer-full situation so no one realized this discrepancy). Even some famous open-sources have exception-handling code which doesn't work actually due to this. (e.g. [logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29]) I wonder if just fixing Javadoc and Kafka documentation is fine, or we should include a heads up about this somewhere (e.g. at Kafka user mailing list). I would like to hear committer's opinion. Anyways, meanwhile let me start fixing the docs. 
> max.block.ms behavior inconsistency with javadoc and the config description > --- > > Key: KAFKA-16372 > URL: https://issues.apache.org/jira/browse/KAFKA-16372 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Haruki Okada >Priority: Minor > > As of Kafka 3.7.0, the javadoc of > [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] > states that it throws TimeoutException when max.block.ms is exceeded on > buffer allocation or initial metadata fetch. > Also it's stated in [buffer.memory config > description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. > However, I found that this is not true because TimeoutException extends > ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as > FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] > instead of throwing it. > I wonder if this is a bug or the documentation error. > Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-16372: Assignee: Haruki Okada > max.block.ms behavior inconsistency with javadoc and the config description > --- > > Key: KAFKA-16372 > URL: https://issues.apache.org/jira/browse/KAFKA-16372 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Minor > > As of Kafka 3.7.0, the javadoc of > [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] > states that it throws TimeoutException when max.block.ms is exceeded on > buffer allocation or initial metadata fetch. > Also it's stated in [buffer.memory config > description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. > However, I found that this is not true because TimeoutException extends > ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as > FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] > instead of throwing it. > I wonder if this is a bug or the documentation error. > Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960 ] Haruki Okada edited comment on KAFKA-16372 at 3/20/24 2:15 PM: --- [~showuon] Agreed. One concern is, IMO many developers expect this "exception thrown on buffer full after max.block.ms"-behavior (because it's stated in Javadoc while we rarely hit buffer-full situation so no one realized this discrepancy). Even some famous open-sources have exception-handling code which doesn't work actually due to this. (e.g. [logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29]) I wonder if just fixing Javadoc and Kafka documentation is fine, or we should include a heads up about this somewhere (e.g. at Kafka user mailing list). I would like to hear committer's opinion. Anyways, meanwhile let me start fixing the docs. was (Author: ocadaruma): [~showuon] Agreed. One concern is, IMO many developers expect this "exception thrown on buffer full after max.block.ms"-behavior (because it's stated in Javadoc while we rarely hit buffer-full situation so no one realized this discrepancy). Even some famous open-sources have exception-handling code which doesn't work actually due to this. (e.g. [logback-kafka-append|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29]) I wonder if just fixing Javadoc and Kafka documentation is fine, or we should include a heads up about this somewhere (e.g. at Kafka user mailing list). I would like to hear committer's opinion. Anyways, meanwhile let me start fixing the docs. 
> max.block.ms behavior inconsistency with javadoc and the config description > --- > > Key: KAFKA-16372 > URL: https://issues.apache.org/jira/browse/KAFKA-16372 > Project: Kafka > Issue Type: Bug > Components: producer >Reporter: Haruki Okada >Priority: Minor > > As of Kafka 3.7.0, the javadoc of > [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] > states that it throws TimeoutException when max.block.ms is exceeded on > buffer allocation or initial metadata fetch. > Also it's stated in [buffer.memory config > description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. > However, I found that this is not true because TimeoutException extends > ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as > FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] > instead of throwing it. > I wonder if this is a bug or the documentation error. > Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828960#comment-17828960 ] Haruki Okada commented on KAFKA-16372: -- [~showuon] Agreed. One concern is that, IMO, many developers expect this "exception thrown on buffer full after max.block.ms" behavior (because it is stated in the Javadoc, and buffer-full situations are rare enough that nobody noticed the discrepancy). Even some well-known open-source projects have exception-handling code that doesn't actually work because of this (e.g. [logback-kafka-appender|https://github.com/danielwegener/logback-kafka-appender/blob/master/src/main/java/com/github/danielwegener/logback/kafka/delivery/AsynchronousDeliveryStrategy.java#L29]). I wonder whether just fixing the Javadoc and Kafka documentation is enough, or whether we should post a heads-up about this somewhere (e.g. the Kafka user mailing list). I would like to hear a committer's opinion. Anyway, let me start fixing the docs in the meantime. 
[jira] [Updated] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-16372: - Description: As of Kafka 3.7.0, the javadoc of [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] states that it throws TimeoutException when max.block.ms is exceeded on buffer allocation or initial metadata fetch. Also it's stated in [buffer.memory config description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. However, I found that this is not true because TimeoutException extends ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] instead of throwing it. I wonder if this is a bug or the documentation error. Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. was: As of Kafka 3.7.0, the javadoc of [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] states that it throws TimeoutException when max.block.ms is exceeded on buffer allocation or initial metadata fetch. Also it's stated in [max.block.ms config description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. However, I found that this is not true because TimeoutException extends ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] instead of throwing it. I wonder if this is a bug or the documentation error. Seems this discrepancy exists since 0.9.0.0, which max.block.ms is introduced. 
[jira] [Updated] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
[ https://issues.apache.org/jira/browse/KAFKA-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-16372: - Component/s: producer (was: clients) Priority: Minor (was: Major) 
[jira] [Created] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description
Haruki Okada created KAFKA-16372: Summary: max.block.ms behavior inconsistency with javadoc and the config description Key: KAFKA-16372 URL: https://issues.apache.org/jira/browse/KAFKA-16372 Project: Kafka Issue Type: Bug Components: clients Reporter: Haruki Okada As of Kafka 3.7.0, the javadoc of [KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956] states that it throws TimeoutException when max.block.ms is exceeded on buffer allocation or initial metadata fetch. It is also stated in the [max.block.ms config description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory]. However, I found that this is not true: because TimeoutException extends ApiException, KafkaProducer.doSend catches ApiException and [wraps it as FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086] instead of throwing it. I wonder whether this is a bug or a documentation error. This discrepancy seems to have existed since 0.9.0.0, in which max.block.ms was introduced. 
[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll
[ https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791354#comment-17791354 ] Haruki Okada commented on KAFKA-9693: - [~paolomoriello] [~novosibman] Hi, I believe the latency spike due to flushing on log.roll is now resolved by https://issues.apache.org/jira/browse/KAFKA-15046 > Kafka latency spikes caused by log segment flush on roll > > > Key: KAFKA-9693 > URL: https://issues.apache.org/jira/browse/KAFKA-9693 > Project: Kafka > Issue Type: Improvement > Components: core > Environment: OS: Amazon Linux 2 > Kafka version: 2.2.1 >Reporter: Paolo Moriello >Assignee: Paolo Moriello >Priority: Major > Labels: Performance, latency, performance > Fix For: 3.7.0 > > Attachments: image-2020-03-10-13-17-34-618.png, > image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, > image-2020-03-10-15-00-54-204.png, image-2020-06-23-12-24-46-548.png, > image-2020-06-23-12-24-58-788.png, image-2020-06-26-13-43-21-723.png, > image-2020-06-26-13-46-52-861.png, image-2020-06-26-14-06-01-505.png, > latency_plot2.png > > > h1. Summary > When a log segment fills up, Kafka rolls over onto a new active segment and > force the flush of the old segment to disk. When this happens, log segment > _append_ duration increase causing important latency spikes on producer(s) > and replica(s). This ticket aims to highlight the problem and propose a > simple mitigation: add a new configuration to enable/disable rolled segment > flush. > h1. 1. Phenomenon > Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to > ~50x-200x more than usual. For instance, normally 99th %ile is lower than > 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th > %iles even jump to 500-700ms. > Latency spikes happen at constant frequency (depending on the input > throughput), for small amounts of time. All the producers experience a > latency increase at the same time. > h1. 
!image-2020-03-10-13-17-34-618.png|width=942,height=314! > {{Example of response time plot observed during on a single producer.}} > URPs rarely appear in correspondence of the latency spikes too. This is > harder to reproduce, but from time to time it is possible to see a few > partitions going out of sync in correspondence of a spike. > h1. 2. Experiment > h2. 2.1 Setup > Kafka cluster hosted on AWS EC2 instances. > h4. Cluster > * 15 Kafka brokers: (EC2 m5.4xlarge) > ** Disk: 1100Gb EBS volumes (4750Mbps) > ** Network: 10 Gbps > ** CPU: 16 Intel Xeon Platinum 8000 > ** Memory: 64Gb > * 3 Zookeeper nodes: m5.large > * 6 producers on 6 EC2 instances in the same region > * 1 topic, 90 partitions - replication factor=3 > h4. Broker config > Relevant configurations: > {quote}num.io.threads=8 > num.replica.fetchers=2 > offsets.topic.replication.factor=3 > num.network.threads=5 > num.recovery.threads.per.data.dir=2 > min.insync.replicas=2 > num.partitions=1 > {quote} > h4. Perf Test > * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per > broker) > * record size = 2 > * Acks = 1, linger.ms = 1, compression.type = none > * Test duration: ~20/30min > h2. 2.2 Analysis > Our analysis showed an high +correlation between log segment flush count/rate > and the latency spikes+. This indicates that the spikes in max latency are > related to Kafka behavior on rolling over new segments. > The other metrics did not show any relevant impact on any hardware component > of the cluster, eg. cpu, memory, network traffic, disk throughput... > > !latency_plot2.png|width=924,height=308! > {{Correlation between latency spikes and log segment flush count. p50, p95, > p99, p999 and p latencies (left axis, ns) and the flush #count (right > axis, stepping blue line in plot).}} > Kafka schedules logs flushing (this includes flushing the file record > containing log entries, the offset index, the timestamp index and the > transaction index) during _roll_ operations. 
A log is rolled over onto a new > empty log when: > * the log segment is full > * the max time has elapsed since the timestamp of the first message in the > segment (or, in its absence, since the create time) > * the index is full > In this case, the increase in latency happens on _append_ of a new message > set to the active segment of the log. This is a synchronous operation which > therefore blocks producer requests, causing the latency increase. > To confirm this, I instrumented Kafka to measure the duration of the > FileRecords.append(MemoryRecords) method, which is responsible for writing > memory records to file. As a result, I observed the same spiky pattern as in > the producer latency, with a
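The cost of a synchronous flush in the append path, as described in the report above, can be illustrated with plain NIO, independent of Kafka: writing a small record to a file is cheap, but an fsync (FileChannel.force) in the same code path blocks the writer for the full device-flush latency. This is a minimal sketch, not Kafka's actual LogSegment code; the class and method names, file names, and record size are invented for illustration, and absolute timings depend entirely on the underlying storage.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushCostSketch {
    // Append one small record and optionally fsync, as a flush of a rolled
    // segment would; force(true) blocks until the data reaches the device.
    static long appendNanos(FileChannel ch, boolean fsync) throws IOException {
        ByteBuffer record = ByteBuffer.wrap(new byte[128]);
        long start = System.nanoTime();
        ch.write(record);
        if (fsync) {
            ch.force(true); // the synchronous flush is the latency source
        }
        return System.nanoTime() - start;
    }

    // Returns {plainAppendNanos, appendPlusFsyncNanos}.
    static long[] measure() throws IOException {
        Path p = Files.createTempFile("segment", ".log");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            long plain = appendNanos(ch, false);
            long flushed = appendNanos(ch, true);
            return new long[] {plain, flushed};
        } finally {
            Files.deleteIfExists(p);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = measure();
        System.out.printf("append=%dns, append+fsync=%dns%n", r[0], r[1]);
    }
}
```

On typical local disks the fsync'd append is orders of magnitude slower than the buffered one, which matches the spiky append-duration pattern the reporter measured around segment rolls.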
[jira] [Created] (KAFKA-15924) Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive
Haruki Okada created KAFKA-15924: Summary: Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive Key: KAFKA-15924 URL: https://issues.apache.org/jira/browse/KAFKA-15924 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/15/tests] {code:java} Error org.opentest4j.AssertionFailedError: expected: but was: Stacktrace org.opentest4j.AssertionFailedError: expected: but was: at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141) at app//org.apache.kafka.controller.QuorumControllerTest.testFatalMetadataReplayErrorOnActive(QuorumControllerTest.java:1132) at java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566) at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at app//org.junit.platform.engine.support.hie
[jira] [Updated] (KAFKA-15924) Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive
[ https://issues.apache.org/jira/browse/KAFKA-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15924: - Attachment: stdout.log > Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive > -- > > Key: KAFKA-15924 > URL: https://issues.apache.org/jira/browse/KAFKA-15924 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > Attachments: stdout.log 
[jira] [Updated] (KAFKA-15920) Flaky test - PlaintextConsumerTest.testCoordinatorFailover
[ https://issues.apache.org/jira/browse/KAFKA-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15920: - Attachment: stdout.log > Flaky test - PlaintextConsumerTest.testCoordinatorFailover > -- > > Key: KAFKA-15920 > URL: https://issues.apache.org/jira/browse/KAFKA-15920 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > Attachments: stdout.log > > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > {code:java} > Error > org.opentest4j.AssertionFailedError: expected: <0> but was: <1> > Stacktrace > org.opentest4j.AssertionFailedError: expected: <0> but was: <1> > at > app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at > app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) > at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527) > at > app//kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:326) > at > app//kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:109) > at > java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566) > at > app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) > at > 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) > at > 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) > at > app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) > at > app//
[jira] [Updated] (KAFKA-15921) Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications
[ https://issues.apache.org/jira/browse/KAFKA-15921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15921: - Attachment: stdout.log > Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications > -- > > Key: KAFKA-15921 > URL: https://issues.apache.org/jira/browse/KAFKA-15921 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > Attachments: stdout.log > > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > {code:java} > Error > org.opentest4j.AssertionFailedError: expected: <0> but was: <1> > Stacktrace > org.opentest4j.AssertionFailedError: expected: <0> but was: <1> > at > app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at > app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166) > at > app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161) > at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628) > at > app//kafka.api.SaslScramSslEndToEndAuthorizationTest.testAuthentications(SaslScramSslEndToEndAuthorizationTest.scala:92) > at > java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568) > at > app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) > at > app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at > app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at > app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) > at > app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) > at > app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) > at > app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) > at > app//org.junit.platf
[jira] [Updated] (KAFKA-15919) Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs
[ https://issues.apache.org/jira/browse/KAFKA-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15919: - Description: [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error org.opentest4j.AssertionFailedError: expected: but was: Stacktrace org.opentest4j.AssertionFailedError: expected: but was: at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141) at app//kafka.server.BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs(BrokerLifecycleManagerTest.scala:236) at java.base@21.0.1/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base@21.0.1/java.lang.reflect.Method.invoke(Method.java:580) at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) at app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) at java.base@21.0.1/java.util.ArrayList.forEach(ArrayList.java:1596) at app//org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$exec
[jira] [Updated] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15918: - Attachment: stdout.log > Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets > > > Key: KAFKA-15918 > URL: https://issues.apache.org/jira/browse/KAFKA-15918 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > Attachments: stdout.log > > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > > {code:java} > Error > org.opentest4j.AssertionFailedError: Condition not met within timeout 3. > Sink connector consumer group offsets should catch up to the topic end > offsets ==> expected: but was: > Stacktrace > org.opentest4j.AssertionFailedError: Condition not met within timeout 3. > Sink connector consumer group offsets should catch up to the topic end > offsets ==> expected: but was: > at > org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) > at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) > at > org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917) > at > 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.refle
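The KAFKA-15918 failure above originates in `org.apache.kafka.test.TestUtils.waitForCondition`, which repeatedly polls a condition and raises the "Condition not met within timeout" assertion once the deadline passes. A minimal sketch of that polling pattern (this is an illustrative stand-in, not Kafka's actual `TestUtils` code; the class name, parameters, and poll interval here are assumptions):

```java
import java.util.function.BooleanSupplier;

public class WaitForConditionSketch {
    // Polls `condition` every `pollMs` milliseconds until it holds or `timeoutMs`
    // elapses, then throws an AssertionError mirroring the "Condition not met
    // within timeout" failure in the report above.
    public static void waitForCondition(BooleanSupplier condition, long timeoutMs,
                                        long pollMs, String message) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return; // condition met within the timeout
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting: " + message, e);
            }
        }
        throw new AssertionError(
                "Condition not met within timeout " + timeoutMs + ". " + message);
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Example: a condition that becomes true after roughly 200 ms, standing in
        // for "sink connector offsets catch up to the topic end offsets".
        waitForCondition(() -> System.currentTimeMillis() - start >= 200,
                5_000, 50, "offsets should catch up to the end offsets");
        System.out.println("condition met");
    }
}
```

Tests built on this pattern are flaky by construction when the timeout is too tight for a loaded CI machine, which is consistent with the intermittent "offsets should catch up" timeout reported here.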
[jira] [Updated] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks
[ https://issues.apache.org/jira/browse/KAFKA-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15917: - Attachment: stdout.log > Flaky test - > OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks > --- > > Key: KAFKA-15917 > URL: https://issues.apache.org/jira/browse/KAFKA-15917 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > Attachments: stdout.log > > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > > > {code:java} > Error > java.lang.AssertionError: > Expected: a string containing "zombie sink task" > but: was "Could not alter connector offsets. Error response: > {"error_code":500,"message":"Failed to alter consumer group offsets for > connector test-connector"}" > Stacktrace > java.lang.AssertionError: > Expected: a string containing "zombie sink task" > but: was "Could not alter connector offsets. Error response: > {"error_code":500,"message":"Failed to alter consumer group offsets for > connector test-connector"}" > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.jav
[jira] [Created] (KAFKA-15921) Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications
Haruki Okada created KAFKA-15921: Summary: Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications Key: KAFKA-15921 URL: https://issues.apache.org/jira/browse/KAFKA-15921 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error org.opentest4j.AssertionFailedError: expected: <0> but was: <1> Stacktrace org.opentest4j.AssertionFailedError: expected: <0> but was: <1> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628) at app//kafka.api.SaslScramSslEndToEndAuthorizationTest.testAuthentications(SaslScramSslEndToEndAuthorizationTest.scala:92) at java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568) at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at app//org.junit.platform.engine.support.hi
[jira] [Created] (KAFKA-15920) Flaky test - PlaintextConsumerTest.testCoordinatorFailover
Haruki Okada created KAFKA-15920: Summary: Flaky test - PlaintextConsumerTest.testCoordinatorFailover Key: KAFKA-15920 URL: https://issues.apache.org/jira/browse/KAFKA-15920 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error org.opentest4j.AssertionFailedError: expected: <0> but was: <1> Stacktrace org.opentest4j.AssertionFailedError: expected: <0> but was: <1> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527) at app//kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:326) at app//kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:109) at java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566) at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) at app
[jira] [Updated] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks
[ https://issues.apache.org/jira/browse/KAFKA-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15917: - Summary: Flaky test - OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks (was: Flaky test - OffsetsApiIntegrationTest. testAlterSinkConnectorOffsetsZombieSinkTasks) > Flaky test - > OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks > --- > > Key: KAFKA-15917 > URL: https://issues.apache.org/jira/browse/KAFKA-15917 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > > > {code:java} > Error > java.lang.AssertionError: > Expected: a string containing "zombie sink task" > but: was "Could not alter connector offsets. Error response: > {"error_code":500,"message":"Failed to alter consumer group offsets for > connector test-connector"}" > Stacktrace > java.lang.AssertionError: > Expected: a string containing "zombie sink task" > but: was "Could not alter connector offsets. 
Error response: > {"error_code":500,"message":"Failed to alter consumer group offsets for > connector test-connector"}" > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) > at com.sun.proxy
[jira] [Created] (KAFKA-15919) Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs
Haruki Okada created KAFKA-15919: Summary: Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs Key: KAFKA-15919 URL: https://issues.apache.org/jira/browse/KAFKA-15919 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error org.opentest4j.AssertionFailedError: expected: but was: Stacktrace org.opentest4j.AssertionFailedError: expected: but was: at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141) at app//kafka.server.BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs(BrokerLifecycleManagerTest.scala:236) at java.base@21.0.1/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base@21.0.1/java.lang.reflect.Method.invoke(Method.java:580) at app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45) at app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139) at app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141) at app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139) at app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138) at app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95) at java.base@21.0.1/java.util.ArrayList.forEach(ArrayList.java:1596) at app//org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecuto
[jira] [Updated] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets
[ https://issues.apache.org/jira/browse/KAFKA-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15918: - Summary: Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets (was: Flaky test - OffsetsApiIntegrationTest. testResetSinkConnectorOffsets) > Flaky test - OffsetsApiIntegrationTest.testResetSinkConnectorOffsets > > > Key: KAFKA-15918 > URL: https://issues.apache.org/jira/browse/KAFKA-15918 > Project: Kafka > Issue Type: Bug >Reporter: Haruki Okada >Priority: Major > Labels: flaky-test > > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] > > {code:java} > Error > org.opentest4j.AssertionFailedError: Condition not met within timeout 3. > Sink connector consumer group offsets should catch up to the topic end > offsets ==> expected: but was: > Stacktrace > org.opentest4j.AssertionFailedError: Condition not met within timeout 3. > Sink connector consumer group offsets should catch up to the topic end > offsets ==> expected: but was: > at > org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) > at > org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) > at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) > at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) > at > org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) > at > 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725) > at > org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.
[jira] [Created] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest. testResetSinkConnectorOffsets
Haruki Okada created KAFKA-15918: Summary: Flaky test - OffsetsApiIntegrationTest. testResetSinkConnectorOffsets Key: KAFKA-15918 URL: https://issues.apache.org/jira/browse/KAFKA-15918 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error org.opentest4j.AssertionFailedError: Condition not met within timeout 3. Sink connector consumer group offsets should catch up to the topic end offsets ==> expected: but was: Stacktrace org.opentest4j.AssertionFailedError: Condition not met within timeout 3. Sink connector consumer group offsets should catch up to the topic end offsets ==> expected: but was: at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) at org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331) at org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312) at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDi
[jira] [Created] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest. testAlterSinkConnectorOffsetsZombieSinkTasks
Haruki Okada created KAFKA-15917: Summary: Flaky test - OffsetsApiIntegrationTest. testAlterSinkConnectorOffsetsZombieSinkTasks Key: KAFKA-15917 URL: https://issues.apache.org/jira/browse/KAFKA-15917 Project: Kafka Issue Type: Bug Reporter: Haruki Okada [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/] {code:java} Error java.lang.AssertionError: Expected: a string containing "zombie sink task" but: was "Could not alter connector offsets. Error response: {"error_code":500,"message":"Failed to alter consumer group offsets for connector test-connector"}" Stacktrace java.lang.AssertionError: Expected: a string containing "zombie sink task" but: was "Could not alter connector offsets. Error response: {"error_code":500,"message":"Failed to alter consumer group offsets for connector test-connector"}" at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) at org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176) at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60) at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56) at org.gradle.process.internal.worker.child.SystemApplicationCla
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1778#comment-1778 ] Haruki Okada commented on KAFKA-15609: -- I added [MmapTest3.java|https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d] and confirmed that reads via read() and writes via mmap are consistent. > Would you happen to have some reference that I can read about this? I couldn't find a good web reference, but the book "The Linux Programming Interface" mentions this. Excerpt: {quote}Like many other modern UNIX implementations, Linux provides a so-called unified virtual memory system. This means that, where possible, memory mappings and blocks of the buffer cache share the same pages of physical memory. Thus, the views of a file obtained via a mapping and via I/O system calls (read(), write(), and so on) are always consistent, and the only use of msync() is to force the contents of a mapped region to be flushed to disk.{quote} > Corrupted index uploaded to remote tier > --- > > Key: KAFKA-15609 > URL: https://issues.apache.org/jira/browse/KAFKA-15609 > Project: Kafka > Issue Type: Bug > Components: Tiered-Storage >Affects Versions: 3.6.0 >Reporter: Divij Vaidya >Priority: Minor > > While testing Tiered Storage, we have observed corrupt indexes being present > in remote tier. One such situation is covered here at > https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another > such possible case of corruption. > Potential cause of index corruption: > We want to ensure that the file we are passing to RSM plugin contains all the > data which is present in MemoryByteBuffer i.e. we should have flushed the > MemoryByteBuffer to the file using force(). In Kafka, when we close a > segment, indexes are flushed asynchronously [1]. Hence, it might be possible > that when we are passing the file to RSM, the file doesn't contain flushed > data. 
Hence, we may end up uploading indexes which haven't been flushed yet. > Ideally, the contract should enforce that we force flush the content of > MemoryByteBuffer before we give the file for RSM. This will ensure that > indexes are not corrupted/incomplete. > [1] > [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
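The unified page cache behavior that the excerpt describes can be sketched with a small standalone Java program (an illustration in the spirit of the linked MmapTest3.java gist, not taken from it; all names here are mine): write through a MappedByteBuffer, deliberately skip force(), and read the file back through the ordinary read path.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapReadConsistency {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-consistency", ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping beyond the current size extends the file to 8 bytes.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            mmap.put("unflushd".getBytes(StandardCharsets.US_ASCII));
            // Note: no mmap.force() here -- the data lives only in the page cache.
            // On Linux's unified VM system, read() still sees the mmap'd write.
            byte[] viaRead = Files.readAllBytes(file);
            System.out.println(new String(viaRead, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

On Linux this prints the mmap'd content, matching the claim that an unflushed index file read back via ordinary I/O is still consistent.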
[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782194#comment-17782194 ] Haruki Okada edited comment on KAFKA-15609 at 11/2/23 3:36 PM: --- [~divijvaidya] Right, when we create two mmaps, they will be mapped to different virtual addresses. However, as long as we use FileChannel.map, writes to one mmap are guaranteed to be visible through the other, because both are mapped with the MAP_SHARED flag (except with MapMode.PRIVATE). [https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88] was (Author: ocadaruma): [~divijvaidya] Right, when we call two mmaps, they will be mapped to different virtual address. However, as long as we use FileChannel.map, the write to one mmap is guaranteed to be visible to another mmap because they will be mapped with MAP_SHARED flag. https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88
[jira] [Comment Edited] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782202#comment-17782202 ] Haruki Okada edited comment on KAFKA-15609 at 11/2/23 4:12 PM: --- I validated the MappedByteBuffer behavior with this Java code: [https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d] When we create two mmaps from the same file, writes to the first one are always visible to the second one unless we specify MapMode.PRIVATE. Also, in my understanding, the page cache is directly mapped into the mmap area, so even when a file written via mmap is read back with an ordinary read() call, the content should be consistent, at least on Linux. was (Author: ocadaruma): I validated the MappedByteBuffer behavior with this Java code: [https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d] When we create two mmaps from the same file, writes to 1st one are always visible to 2nd one unless we specify MapMode.PRIVATE.
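The two-mmap visibility discussed in the comment above can be sketched as follows (an illustrative stand-in for the linked gist; class and variable names are mine): map the same file region twice, write through the first mapping, and read through the second.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TwoMmapsShared {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("two-mmaps", ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Two independent mappings of the same 4-byte region.
            // FileChannel.map uses MAP_SHARED for READ_WRITE mode,
            // so both mappings resolve to the same physical pages.
            MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);
            MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4);
            first.putInt(0, 42);               // write through the first mapping only
            System.out.println(second.getInt(0)); // visible through the second mapping
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With MapMode.PRIVATE (copy-on-write, i.e. MAP_PRIVATE) the second mapping would not be guaranteed to see the write.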
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782194#comment-17782194 ] Haruki Okada commented on KAFKA-15609: -- [~divijvaidya] Right, when we create two mmaps, they will be mapped to different virtual addresses. However, as long as we use FileChannel.map, writes to one mmap are guaranteed to be visible through the other, because both are mapped with the MAP_SHARED flag. https://github.com/adoptium/jdk11u/blob/jdk-11.0.21%2B7/src/java.base/unix/native/libnio/ch/FileChannelImpl.c#L88
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782202#comment-17782202 ] Haruki Okada commented on KAFKA-15609: -- I validated the MappedByteBuffer behavior with this Java code: [https://gist.github.com/ocadaruma/fc26fc122829c63cb61e14d7fc96896d] When we create two mmaps from the same file, writes to the first one are always visible to the second one unless we specify MapMode.PRIVATE.
[jira] [Commented] (KAFKA-15688) Partition leader election not running when disk IO hangs
[ https://issues.apache.org/jira/browse/KAFKA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779956#comment-17779956 ] Haruki Okada commented on KAFKA-15688: -- > Is it possible to add such a feature to Kafka so that it shuts down in this > case as well? That could be tricky to implement at the Kafka level, since to make disk IO time out when the device hangs, a timer has to be set on another thread for every IO (the thread executing the I/O can do nothing about the timeout itself). I guess there are several options to address the issue: 1) Set an I/O timeout at the OS/device level so that a disk hang surfaces as an IOException at the Kafka level (which causes Kafka to stop) 2) Deploy another process to watch disk health and let it kill Kafka on a disk hang For either solution, a concern is that while a broker is unable to process requests due to a disk hang (without a leadership change), the broker may unexpectedly kick other followers out of the ISR set (since it can't handle Fetch requests, it can't increment the HW) before it gets killed. In that case the broker could be the last ISR, so stopping it may take the partition offline, which requires unclean leader election. [KIP-966|https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas] could be the solution for this problem, though. Apart from the above, [https://github.com/apache/kafka/pull/14242] could mitigate your issue, I guess. The thing is, even when the disk hangs, produce shouldn't be disrupted, because Kafka doesn't wait for log-append IOs to be synced to the device (unless too many dirty pages accumulate). However, as of Kafka 3.3.2, there are several paths that call fsync on log roll while holding UnifiedLog#lock. Due to this, if the disk hangs during an fsync, UnifiedLog#lock will be held for a long time and all subsequent requests against the same partition may be blocked in the meantime. 
Actually, we encountered a similar issue on our on-prem Kafka cluster, which consists of a lot of HDDs, where some HDD breaks on a daily basis. The frequency of the issue is indeed mitigated by the above patch. > Partition leader election not running when disk IO hangs > > > Key: KAFKA-15688 > URL: https://issues.apache.org/jira/browse/KAFKA-15688 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 3.3.2 >Reporter: Peter Sinoros-Szabo >Priority: Major > > We run our Kafka brokers on AWS EC2 nodes using AWS EBS as disk to store the > messages. > Recently we had an issue when the EBS disk IO just stalled so Kafka was not > able to write or read anything from the disk, well except the data that was > still in page cache or that still fitted into the page cache before it is > synced to EBS. > We experienced this issue in a few cases: sometimes partition leaders were > moved away to other brokers automatically, in other cases that didn't happen > and caused the Producers to fail producing messages to that broker. > My expectation from Kafka in such a case would be that it notices it and > moves the leaders to other brokers where the partition has in sync replicas, > but as I mentioned this didn't happen always. > I know Kafka will shut itself down in case it can't write to its disk, that > might be a good solution in this case as well as it would trigger the leader > election automatically. > Is it possible to add such a feature to Kafka so that it shuts down in this > case as well? > I guess similar issue might happen with other disk subsystems too or even > with a broken and slow disk. > This scenario can be easily reproduced using AWS FIS.
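The locking pattern described in the comment above, and the effect of moving the fsync out of the lock, can be sketched in simplified form. This is an illustrative model under my own naming, not Kafka's actual UnifiedLog code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified model of the contention: roll() fsyncs while holding the same
// lock that append() needs, so a hung fsync stalls every producer for the
// partition. Class and method names are hypothetical, not Kafka's.
public class LogLockSketch {
    private final Object lock = new Object();
    private final FileChannel channel;

    LogLockSketch(Path path) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    void append(byte[] record) throws IOException {
        synchronized (lock) { // blocked for the whole fsync if roll holds the lock
            channel.write(ByteBuffer.wrap(record)); // page-cache write, normally fast
        }
    }

    // Before the fix: fsync inside the lock; a hung disk blocks append() too.
    void rollBeforeFix() throws IOException {
        synchronized (lock) {
            channel.force(true);
        }
    }

    // Pattern of the fix (cf. apache/kafka#14242): fsync outside the lock,
    // so appends only touch the page cache and keep flowing during a hang.
    void rollAfterFix() throws IOException {
        synchronized (lock) {
            // swap in the new active segment here (omitted in this sketch)
        }
        channel.force(true); // a slow or hung fsync no longer holds up append()
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("log", ".bin");
        LogLockSketch log = new LogLockSketch(p);
        log.append("record".getBytes(StandardCharsets.US_ASCII));
        log.rollAfterFix();
        System.out.println(Files.size(p));
        Files.deleteIfExists(p);
    }
}
```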
[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier
[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775965#comment-17775965 ] Haruki Okada commented on KAFKA-15609: -- > is the OS intelligent enough to understand that it should provide a "dirty" / > non-flushed view of the file to the second thread as well? As Ismael pointed out, all file operations go through the page cache (DirectIO is the exception, but it isn't used in Kafka), so uploading an unflushed index to the remote storage shouldn't be an issue.
[jira] [Created] (KAFKA-15567) ReplicaFetcherThreadBenchmark is not working
Haruki Okada created KAFKA-15567: Summary: ReplicaFetcherThreadBenchmark is not working Key: KAFKA-15567 URL: https://issues.apache.org/jira/browse/KAFKA-15567 Project: Kafka Issue Type: Improvement Reporter: Haruki Okada Assignee: Haruki Okada * ReplicaFetcherThreadBenchmark is not working as of current trunk (https://github.com/apache/kafka/tree/c223a9c3761f796468ccfdae9e177e764ab6a965) {code:java} % jmh-benchmarks/jmh.sh ReplicaFetcherThreadBenchmark (snip) java.lang.NullPointerException at kafka.server.metadata.ZkMetadataCache.<init>(ZkMetadataCache.scala:89) at kafka.server.MetadataCache.zkMetadataCache(MetadataCache.scala:120) at org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.setup(ReplicaFetcherThreadBenchmark.java:220) at org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest._jmh_tryInit_f_replicafetcherthreadbenchmark0_G(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:448) at org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.testFetcher_AverageTime(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:164) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:527) at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:504) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) {code}
[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread
[ https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758588#comment-17758588 ] Haruki Okada commented on KAFKA-7504: - I would like to bump this issue up again since it still exists even in current Kafka. > Are you currently running this patch in production? > do you plan on contributing it to the project? [~enether] I'm a colleague of [~kawamuray], and the patch has been running on our production clusters for years. This patch is crucial for keeping performance stable when catch-up reads happen. We now plan to contribute it upstream. I'll ping you once I submit a patch. > Broker performance degradation caused by call of sendfile reading disk in > network thread > > > Key: KAFKA-7504 > URL: https://issues.apache.org/jira/browse/KAFKA-7504 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.2.1 >Reporter: Yuto Kawamura >Assignee: Yuto Kawamura >Priority: Major > Labels: latency, performance > Attachments: Network_Request_Idle_After_Patch.png, > Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, > Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, > image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, > image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, > image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, > image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, > image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png > > > h2. Environment > OS: CentOS6 > Kernel version: 2.6.32-XX > Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with the latest build from > trunk (2.2.0-SNAPSHOT)) > h2. Phenomenon > Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x > more than usual. > Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks > 50ms to 200ms. 
> At the same time we could see two more things in metrics: > 1. Disk read coincidence from the volume assigned to log.dirs. > 2. Rise in network thread utilization (by > `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`) > As we didn't see an increase of requests in metrics, we suspected blocking in the > event loop run by the network thread as the cause of the rising network thread > utilization. > Reading through the Kafka broker source code, we understood that the only disk > IO performed in the network thread is reading log data through calling > sendfile(2) (via FileChannel#transferTo). > To prove that the calls of sendfile(2) were blocking the network thread for some > moments, I ran the following SystemTap script to inspect the duration of sendfile > syscalls. > {code:java} > # Systemtap script to measure syscall duration > global s > global records > probe syscall.$1 { > s[tid()] = gettimeofday_us() > } > probe syscall.$1.return { > elapsed = gettimeofday_us() - s[tid()] > delete s[tid()] > records <<< elapsed > } > probe end { > print(@hist_log(records)) > }{code} > {code:java} > $ stap -v syscall-duration.stp sendfile > # value (us) > value | count > 0 | 0 > 1 |71 > 2 |@@@ 6171 >16 |@@@ 29472 >32 |@@@ 3418 > 2048 | 0 > ... > 8192 | 3{code} > As you can see, there were some cases taking more than a few milliseconds, which > implies that sendfile blocks the network thread for that long and applies the same > latency to all other request/response processing. > h2. Hypothesis > Gathering the above observations, I made the following hypothesis. > Let's say network-thread-1 is multiplexing 3 connections. > - producer-A > - follower-B (broker replica fetch) > - consumer-C > The broker receives requests from each of those clients: [Produce, FetchFollower, > FetchConsumer]. > They are processed by request handler threads, and now the response > queue of the network-thread contains 3 responses in the following order: > [FetchConsumer, Produce, FetchFollower]. 
> network-thread-1 takes the 3 responses and processes them sequentially > ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]). > Ideally, processing of these 3 responses completes in microseconds, since it > just copies ready responses into the client socket's buffer in a non-blocking > manner. > However, Kafka uses sendfile(2) for t
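The zero-copy send path described in this report can be sketched as below. This is a simplified illustration (helper name `sendChunk` is hypothetical, not Kafka's actual network-thread code): FileChannel#transferTo maps to sendfile(2) on Linux, so when the requested file pages are not in the page cache, the call blocks the calling thread on disk I/O for its whole duration.

```java
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SendfileSketch {
    // Illustrative helper: one zero-copy transfer from a log segment file
    // to a socket-like channel. On Linux this is backed by sendfile(2);
    // if the pages are cold, the calling (network) thread blocks here.
    static long sendChunk(FileChannel log, long pos, long count,
                          WritableByteChannel socket) throws IOException {
        long start = System.nanoTime();
        long sent = log.transferTo(pos, count, socket); // may block on disk read
        long elapsedUs = (System.nanoTime() - start) / 1_000;
        // In the report above, this elapsed time occasionally reached several
        // milliseconds, stalling every connection multiplexed on the thread.
        return sent;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("segment", ".log");
        Files.write(p, new byte[1024]); // stand-in for a log segment
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ);
             WritableByteChannel sink =
                 Channels.newChannel(java.io.OutputStream.nullOutputStream())) {
            System.out.println(sendChunk(ch, 0, 1024, sink));
        } finally {
            Files.delete(p);
        }
    }
}
```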
[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758026#comment-17758026 ] Haruki Okada commented on KAFKA-15391: -- For this issue we may need to swallow NoSuchFileException in Utils.flushDir, but I'll check the existing usages and add another method instead of changing the existing one if necessary > Delete topic may lead to directory offline > -- > > Key: KAFKA-15391 > URL: https://issues.apache.org/jira/browse/KAFKA-15391 > Project: Kafka > Issue Type: Bug > Components: core >Reporter: Divij Vaidya >Assignee: Haruki Okada >Priority: Major > Fix For: 3.6.0 > > > This is an edge case where the entire log directory is marked offline when we > delete a topic. The symptoms of this scenario are characterised by the > following logs: > {noformat} > [2023-08-14 09:22:12,600] ERROR Uncaught exception in scheduled task > 'flush-log' (org.apache.kafka.server.util.KafkaScheduler:152) > org.apache.kafka.common.errors.KafkaStorageException: Error while flushing > log for test-0 in dir /tmp/kafka-15093588566723278510 with offset 221 > (exclusive) and recovery point 221 Caused by: > java.nio.file.NoSuchFileException: > /tmp/kafka-15093588566723278510/test-0{noformat} > The above log is followed by logs such as: > {noformat} > [2023-08-14 09:22:12,601] ERROR Uncaught exception in scheduled task > 'flush-log' > (org.apache.kafka.server.util.KafkaScheduler:152)org.apache.kafka.common.errors.KafkaStorageException: > The log dir /tmp/kafka-15093588566723278510 is already offline due to a > previous IO exception.{noformat} > The below sequence of events demonstrates the scenario where this bug manifests: > 1. On the broker, the partition lock is acquired and UnifiedLog.roll() is called, > which schedules an async call for > flushUptoOffsetExclusive(). The roll may be called due to segment rotation > time or size. > 2. Admin client calls deleteTopic > 3. On the broker, LogManager.asyncDelete() is called, which will call > UnifiedLog.renameDir() > 4. The directory for the partition is successfully renamed with a "delete" > suffix. > 5. The async task scheduled in step 1 (flushUptoOffsetExclusive) starts > executing. It tries to call localLog.flush() without acquiring the partition > lock. > 6. LocalLog calls Utils.flushDir(), which fails with an IOException. > 7. On IOException, the log directory is added to logDirFailureChannel > 8. Any new interaction with this logDir fails and a log line is printed such > as > "The log dir $logDir is already offline due to a previous IO exception" > > This is the reason DeleteTopicTest is flaky as well - > https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=trunk&search.timeZoneId=Europe/Berlin&tests.container=kafka.admin.DeleteTopicTest&tests.test=testDeleteTopicWithCleaner()
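A minimal sketch of the fix discussed in the comment above (the method name `flushDirIfExists` is hypothetical, not necessarily the final API): fsync the directory, but tolerate it having been concurrently renamed or removed by topic deletion instead of letting the IOException mark the whole log dir offline.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FlushDirSketch {
    // Hypothetical variant of a directory fsync: if the directory has
    // vanished (e.g. renamed with a "-delete" suffix by topic deletion),
    // swallow NoSuchFileException rather than failing the log dir.
    static void flushDirIfExists(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // fsync directory metadata
        } catch (NoSuchFileException e) {
            // Directory was deleted/renamed concurrently; nothing to flush.
        }
    }

    public static void main(String[] args) throws IOException {
        // A missing directory no longer propagates an exception.
        flushDirIfExists(Paths.get("/definitely/missing/dir"));
        System.out.println("ok");
    }
}
```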
[jira] [Assigned] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-15391: Assignee: Haruki Okada
[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757792#comment-17757792 ] Haruki Okada commented on KAFKA-15391: -- May I take this ticket? I'm interested since this issue may also happen on our cluster (3.3.2), and I'd be happy to solve it. I can submit a patch today
[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757442#comment-17757442 ] Haruki Okada commented on KAFKA-15391: -- I see, understood. Thanks
[jira] [Comment Edited] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757388#comment-17757388 ] Haruki Okada edited comment on KAFKA-15391 at 8/22/23 11:57 AM: -Related? https://issues.apache.org/jira/browse/KAFKA-13403- Hmm, similar but seems different was (Author: ocadaruma): Related? https://issues.apache.org/jira/browse/KAFKA-13403
[jira] [Commented] (KAFKA-15391) Delete topic may lead to directory offline
[ https://issues.apache.org/jira/browse/KAFKA-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757388#comment-17757388 ] Haruki Okada commented on KAFKA-15391: -- Related? https://issues.apache.org/jira/browse/KAFKA-13403
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755904#comment-17755904 ] Haruki Okada edited comment on KAFKA-15046 at 8/18/23 12:05 PM: I submitted a patch [https://github.com/apache/kafka/pull/14242] . In the meantime, I tested the above patch (ported to 3.3.2, which is the version we use) in our experimental environment: * Setting: ** num.io.threads = 48 ** incoming byte-rate: 18MB/sec ** Adding a 300ms artificial write-delay into the device using [device-mapper|https://github.com/kawamuray/ddi] * Without patch: ** !image-2023-08-18-19-23-36-597.png|width=292,height=164! ** request-handler idle ratio is below 40% ** produce-response time 99.9%ile is over 1 sec ** We see producer-state snapshotting take hundreds of milliseconds *** {code:java} (snip) [2023-08-18 13:23:02,552] INFO [ProducerStateManager partition=xxx-3] Wrote producer snapshot at offset 3030259 with 0 producer ids in 777 ms. (kafka.log.ProducerStateManager) [2023-08-18 13:23:02,852] INFO [ProducerStateManager partition=xxx-10] Wrote producer snapshot at offset 2991767 with 0 producer ids in 678 ms. (kafka.log.ProducerStateManager){code} * With patch: ** !image-2023-08-18-19-29-56-377.png|width=297,height=169! ** request-handler idle ratio is kept at 75% ** produce-response time 99.9%ile is around 100ms ** producer-state snapshotting completes in milliseconds in most cases *** {code:java} (snip) [2023-08-18 13:40:09,383] INFO [ProducerStateManager partition=xxx-3] Wrote producer snapshot at offset 6219284 with 0 producer ids in 0 ms. (kafka.log.ProducerStateManager) [2023-08-18 13:40:09,818] INFO [ProducerStateManager partition=icbm-2] Wrote producer snapshot at offset 6208459 with 0 producer ids in 0 ms. (kafka.log.ProducerStateManager){code}
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755904#comment-17755904 ] Haruki Okada commented on KAFKA-15046: -- > Produce performance issue under high disk load > -- > > Key: KAFKA-15046 > URL: https://issues.apache.org/jira/browse/KAFKA-15046 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.3.2 >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Major > Labels: performance > Attachments: image-2023-06-01-12-46-30-058.png, > image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, > image-2023-06-01-12-56-19-108.png, image-2023-08-18-19-23-36-597.png, > image-2023-08-18-19-29-56-377.png > > > * Phenomenon: > ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! > ** Producer response time 99%ile got quite bad when we performed replica > reassignment on the cluster > *** The RequestQueue scope was significant > ** Also, request-time throttling happened at the incidental time. This caused > producers to delay sending messages in the meantime. > ** The disk I/O latency was higher than usual due to the high load for > replica reassignment. > *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! > * Analysis: > ** The request-handler utilization was much higher than usual. > *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! > ** Also, thread time utilization was much higher than usual on almost all > users > *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! > ** From taking jstack several times, for most of them we found that a > request-handler was doing fsync for flushing ProducerState while other > request-handlers were waiting on Log#lock to append messages. 
> *** {code:java}
> "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
> at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
> at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
> at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
> at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
> at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
> - locked <0x00060d75d820> (a java.lang.Object)
> at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
> at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
> at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
> at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source)
> at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
> at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
> at scala.collection.mutable.HashMap.map(HashMap.scala:35)
> at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
> at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
> at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
> at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}
> ** Also there were a bunch of log lines showing that writing producer snapshots took hundreds of milliseconds:
> *** {code:java}
> ...
> [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager)
> [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager)
> ... {code}
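The patch tested above boils down to: keep the snapshot write itself under the log lock, but defer the expensive fsync(2) to a background thread. A minimal, self-contained sketch of that idea follows; the class and method names are mine for illustration, not Kafka's actual API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch (not Kafka's actual API): write the producer-state
// snapshot under the log lock (page-cache write only), and defer fsync(2)
// to a background scheduler so request-handler threads never block on disk.
public class DeferredSnapshotFlush {
    private final ExecutorService scheduler = Executors.newSingleThreadExecutor();
    private final Object logLock = new Object();

    public Future<?> takeSnapshot(Path snapshotFile, byte[] state) throws IOException {
        synchronized (logLock) {
            // Fast path: writes to the page cache only; no fsync under the lock.
            Files.write(snapshotFile, state,
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        }
        // Slow path: force the data to disk off the hot path. In Kafka this
        // would piggyback on the existing "flush-log" job, which runs before
        // the recovery point is advanced.
        return scheduler.submit(() -> {
            try (FileChannel ch = FileChannel.open(snapshotFile, StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }

    public void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("producer-state", ".snapshot");
        DeferredSnapshotFlush log = new DeferredSnapshotFlush();
        log.takeSnapshot(f, "snapshot-bytes".getBytes()).get(); // wait for the background flush
        System.out.println(new String(Files.readAllBytes(f)));
        log.shutdown();
    }
}
```

The crash-safety trade-off is the same one the comment below discusses: on unclean shutdown a snapshot may not have reached disk, which is tolerable as long as the recovery point has not been advanced past it.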
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Attachment: image-2023-08-18-19-29-56-377.png
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Attachment: image-2023-08-18-19-23-36-597.png
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755898#comment-17755898 ] Haruki Okada commented on KAFKA-15046: -- After digging into the fsync call paths in detail, I summarized the problem and the solutions as below:
h2. Problem
* While any blocking operation performed while holding UnifiedLog.lock can lead to serious performance (even availability) issues, there are currently several paths that call fsync(2) inside the lock
** While the lock is held, all subsequent produces against the partition may block
** This easily causes all request-handlers to become busy when disk performance is bad
** Even worse, when a disk experiences tens of seconds of glitches (not rare with spinning drives), it can make the broker unable to process any requests while remaining unfenced from the cluster (i.e. a "zombie"-like status)
h2. Analysis of fsync(2) inside UnifiedLog.lock
First, fsyncs at start-up/shutdown time aren't a problem, since the broker isn't processing requests then. Given that, there are essentially 4 problematic call paths, listed below:
h3. 1. [ProducerStateManager.takeSnapshot at UnifiedLog.roll|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L2133]
* Here, the solution is simply to move the fsync(2) call to the scheduler thread as part of the existing "flush-log" job (before incrementing the recovery point)
h3. 2. [ProducerStateManager.removeAndMarkSnapshotForDeletion as part of log segment deletion|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L2133]
* removeAndMarkSnapshotForDeletion calls Utils.atomicMoveWithFallback with parent-dir flushing when renaming to add the .deleted suffix
* Here, I suppose we don't need to flush the parent dir.
* In the worst case, a few producer snapshots that should have been deleted remain without the .deleted suffix after an unclean shutdown
** In this case, these files will eventually be deleted, so it shouldn't be a big problem.
h3. 3. [LeaderEpochFileCache.truncateFromStart when incrementing log-start-offset|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L986]
* This path is called from deleteRecords on request-handler threads.
* Here, we actually don't need fsync(2) either.
* Upon unclean shutdown, a few stale leader epochs might remain in the file, but they will be [handled by LogLoader|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/LogLoader.scala#L185] on start-up, so this is not a problem
h3. 4. [LeaderEpochFileCache.truncateFromEnd as part of log truncation|https://github.com/apache/kafka/blob/3f4816dd3eafaf1a0636d3ee689069f897c99e28/core/src/main/scala/kafka/log/UnifiedLog.scala#L1663]
* Though this path is called mainly on replica fetcher threads, blocking replica fetchers isn't ideal either, since it could cause remote-scope produce performance degradation on the leader side
* Likewise, we don't need fsync(2) here, since any epochs that remain untruncated will be handled by the log-loading procedure
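Point 2 above argues that the rename which adds the .deleted suffix does not need a parent-directory flush, because the worst case after an unclean shutdown is a leftover file that later cleanup removes anyway. A hedged sketch of what making that flush optional could look like; this is not Kafka's Utils.atomicMoveWithFallback, and the file names are illustrative:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Illustrative sketch of renaming a producer snapshot to mark it for
// deletion, with the parent-directory fsync made optional.
public class MarkForDeletion {
    public static Path markDeleted(Path file, boolean flushParentDir) throws IOException {
        Path target = file.resolveSibling(file.getFileName() + ".deleted");
        try {
            Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
        }
        if (flushParentDir) {
            // fsync the directory so the rename itself survives power loss.
            // Skippable here: if the rename is lost, the only consequence is
            // a snapshot file left without the .deleted suffix, which the
            // normal cleanup path eventually removes.
            try (FileChannel dir = FileChannel.open(target.getParent(), StandardOpenOption.READ)) {
                dir.force(true);
            }
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("snapshots");
        Path snap = Files.createFile(dir.resolve("00000000000003877066.snapshot"));
        Path renamed = markDeleted(snap, false); // skip the parent-dir flush
        System.out.println(renamed.getFileName());
    }
}
```

The design point is that durability of a *deletion marker* is not worth a synchronous fsync on the request-handler path, whereas durability of appended data still is.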
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755351#comment-17755351 ] Haruki Okada edited comment on KAFKA-15046 at 8/17/23 4:09 AM: --- [~junrao] Hi, sorry for the late response. Thanks for your suggestion.
> Another way to improve this is to move the LeaderEpochFile flushing logic to be part of the flushing of rolled segments
Yeah, that sounds reasonable. I think the ProducerState snapshot should then also be unified into the existing flushing logic, instead of fsync-ing ProducerState separately in log.roll (i.e. the current Kafka behavior) or submitting it to the scheduler separately (i.e. like the ongoing patch ([https://github.com/apache/kafka/pull/13782]) does)
was (Author: ocadaruma): [~junrao] Hi, sorry for the late response. Thanks for your suggestion.
> Another way to improve this is to move the LeaderEpochFile flushing logic to be part of the flushing of rolled segments
Yeah, that sounds make sense. I think ProducerState snapshot also should be the unified to existing flushing logic then, instead of fsync-ing ProducerState separately in log.roll (i.e. current Kafka behavior), nor submitting to scheduler separately (i.e. like ongoing patch([https://github.com/apache/kafka/pull/13782]) does)
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755351#comment-17755351 ] Haruki Okada commented on KAFKA-15046: -- [~junrao] Hi, sorry for the late response. Thanks for your suggestion.
> Another way to improve this is to move the LeaderEpochFile flushing logic to be part of the flushing of rolled segments
Yeah, that sounds reasonable. I think the ProducerState snapshot should then also be unified into the existing flushing logic, instead of fsync-ing ProducerState separately in log.roll (i.e. the current Kafka behavior) or submitting it to the scheduler separately (i.e. like the [ongoing patch|https://github.com/apache/kafka/pull/13782] does)
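The suggestion in the comment above, folding the ProducerState snapshot and LeaderEpochFile flushes into the single background job that already flushes rolled segments, can be sketched roughly as follows. The interface and names are illustrative, not Kafka's:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: on log roll, collect everything that must reach disk
// (segment data, producer-state snapshot, leader-epoch checkpoint) and flush
// it in ONE scheduled job, instead of fsync-ing each artifact at a separate
// point, some of which currently happen under the log lock.
public class UnifiedRollFlush {
    interface Flushable { void flush(); }

    private final ExecutorService scheduler = Executors.newSingleThreadExecutor();

    /** Called with the log lock held: only records what needs flushing. */
    public Future<List<String>> onRoll(Map<String, Flushable> rolledArtifacts) {
        return scheduler.submit(() -> {
            List<String> flushed = new ArrayList<>();
            // One pass, off the request-handler path, before the recovery
            // point is advanced.
            for (Map.Entry<String, Flushable> e : rolledArtifacts.entrySet()) {
                e.getValue().flush();
                flushed.add(e.getKey());
            }
            return flushed;
        });
    }

    public void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        UnifiedRollFlush log = new UnifiedRollFlush();
        Map<String, Flushable> artifacts = new LinkedHashMap<>();
        artifacts.put("segment", () -> {});
        artifacts.put("producer-snapshot", () -> {});
        artifacts.put("leader-epoch-checkpoint", () -> {});
        System.out.println(log.onRoll(artifacts).get());
        log.shutdown();
    }
}
```

Unifying the flushes keeps the crash-consistency invariant in one place: nothing ahead of the recovery point is trusted until the whole batch has been forced.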
[jira] [Commented] (KAFKA-15185) Consumers using the latest strategy may lose data after the topic adds partitions
[ https://issues.apache.org/jira/browse/KAFKA-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742830#comment-17742830 ] Haruki Okada commented on KAFKA-15185: -- FYI: this may be a duplicate of https://issues.apache.org/jira/browse/KAFKA-12478 and https://issues.apache.org/jira/browse/KAFKA-12261
> Consumers using the latest strategy may lose data after the topic adds partitions
> -
>
> Key: KAFKA-15185
> URL: https://issues.apache.org/jira/browse/KAFKA-15185
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 3.4.1
> Reporter: RivenSun
> Assignee: Luke Chen
> Priority: Major
>
> h2. Condition:
> 1. A business topic adds partitions
> 2. The metadata.max.age.ms configuration of both producers and consumers is set to five minutes, but the producer discovered the new partition before the consumer did and produced 100 messages to it.
> 3. The consumer parameter auto.offset.reset is set to *latest*
> h2. Result:
> Consumers will lose these 100 messages.
> First of all, we cannot simply set auto.offset.reset to {*}earliest{*}, because the requirement is that a newly subscribed group can discard all old messages of the topic. However, after the group is subscribed, messages produced to the expanded partition {*}must be guaranteed not to be lost{*}, similar to starting consumption from the earliest offset.
> h2. Suggestion:
> We have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the producer's metadata.max.age.ms configuration. But this still can't solve the problem, because in many cases the producer may force-refresh the metadata. Also, a smaller metadata.max.age.ms value brings more metadata refresh requests, which increases the burden on the broker.
> So can we add a parameter to control whether the consumer starts consumption from the earliest or the latest offset for a newly added partition?
> Perhaps during the rebalance process, the leader consumer needs to mark which partitions are newly added when calculating the assignment. -- This message was sent by Atlassian Jira (v8.20.10#820010)
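One shape the suggested fix could take: decide the offset-reset strategy per partition, treating partitions added after the group's first subscription as "earliest" even though the topic-level setting is "latest". A toy sketch of that bookkeeping; this is not Kafka's consumer API, and the tracking by partition count is an illustrative simplification:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: remember how many partitions of a topic the group has
// already seen; partitions beyond that count were added after subscription,
// so they are consumed from "earliest" to avoid skipping their first records.
public class NewPartitionResetPolicy {
    private final Map<String, Integer> knownPartitionCounts = new HashMap<>();

    /** Returns the reset strategy to use for each partition index of a topic. */
    public Map<Integer, String> resetStrategies(String topic, int partitionCount) {
        int known = knownPartitionCounts.getOrDefault(topic, partitionCount);
        knownPartitionCounts.put(topic, partitionCount);
        Map<Integer, String> out = new TreeMap<>();
        for (int p = 0; p < partitionCount; p++) {
            // Old partitions keep the configured "latest"; new ones use "earliest".
            out.put(p, p < known ? "latest" : "earliest");
        }
        return out;
    }

    public static void main(String[] args) {
        NewPartitionResetPolicy policy = new NewPartitionResetPolicy();
        policy.resetStrategies("orders", 3);                     // initial subscription
        System.out.println(policy.resetStrategies("orders", 5)); // after expansion
    }
}
```

In a real implementation this state would have to live with the group (e.g. computed by the leader consumer during assignment, as the report suggests), not in a single consumer's memory.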
[jira] [Assigned] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada reassigned KAFKA-14445: Assignee: Haruki Okada > Producer doesn't request metadata update on REQUEST_TIMED_OUT > - > > Key: KAFKA-14445 > URL: https://issues.apache.org/jira/browse/KAFKA-14445 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Major > > Produce requests may fail with timeout by `request.timeout.ms` in below two > cases: > * Didn't receive produce response within `request.timeout.ms` > * Produce response received, but it ended up with `REQUEST_TIMED_OUT` in the > broker > Former case usually happens when a broker-machine got failed or there's > network glitch etc. > In this case, the connection will be disconnected and metadata-update will be > requested to discover new leader: > [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] > > The problem is in latter case (REQUEST_TIMED_OUT on the broker). > In this case, the produce request will be ended up with TimeoutException, > which doesn't inherit InvalidMetadataException so it doesn't trigger metadata > update. > > Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side > problem, that metadata-update doesn't make much sense indeed. > > However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT > could cause produce requests to retry unnecessarily , which may end up with > batch expiration due to delivery timeout. > Below is the scenario we experienced: > * Environment: > ** Partition tp-0 has 3 replicas, 1, 2, 3. 
Leader is 1 > ** min.insync.replicas=2 > ** acks=all > * Scenario: > ** broker 1 "partially" failed > *** It lost its ZooKeeper connection and was kicked out of the cluster > There was a controller log like: > * > {code:java} > [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , > deleted brokers: 1, bounced brokers: {code} > * > ** > *** However, somehow the broker continued to receive produce requests > We're still investigating how this is possible. > Indeed, broker 1 was somewhat "alive" and kept working according to > server.log > *** In other words, broker 1 became a "zombie" > ** broker 2 was elected as the new leader > *** broker 3 became a follower of broker 2 > *** However, since broker 1 was still out of the cluster, it didn't receive > LeaderAndIsr, so 1 kept considering itself the leader of tp-0 > ** Meanwhile, the producer kept sending produce requests to broker 1, and the > requests failed with REQUEST_TIMED_OUT because no broker was replicating > from broker 1. > *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer > didn't have a chance to update its stale metadata > > So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT error, > to address the case where the old leader became a "zombie" -- This message was sent by Atlassian Jira (v8.20.10#820010)
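The suggestion above amounts to treating REQUEST_TIMED_OUT like the InvalidMetadataException family for the purpose of metadata refresh. The following is a self-contained illustration of that decision, not the actual Sender.java code; the enum and method names are hypothetical.

```java
// Hypothetical sketch of the proposed retry/metadata-refresh decision.
// Mirrors the idea in the ticket: errors that may imply stale metadata
// should trigger a metadata update before the next retry, so a producer
// stuck on a "zombie" leader eventually discovers the new one.
enum ProduceError {
    NONE,
    NOT_LEADER_OR_FOLLOWER,     // inherits InvalidMetadataException in the real client
    UNKNOWN_TOPIC_OR_PARTITION, // likewise
    REQUEST_TIMED_OUT           // does NOT inherit InvalidMetadataException today
}

public class MetadataRefreshPolicy {
    /** Proposed policy: also refresh metadata on REQUEST_TIMED_OUT. */
    static boolean shouldRequestMetadataUpdate(ProduceError error) {
        switch (error) {
            case NOT_LEADER_OR_FOLLOWER:
            case UNKNOWN_TOPIC_OR_PARTITION:
                return true; // current behavior: InvalidMetadataException family
            case REQUEST_TIMED_OUT:
                return true; // proposed addition from this ticket
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldRequestMetadataUpdate(ProduceError.REQUEST_TIMED_OUT));
    }
}
```

The cost of this change is an occasional unnecessary metadata refresh when the timeout really was caused by slow followers, which is cheap compared to batches expiring against a dead leader.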
[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729695#comment-17729695 ] Haruki Okada commented on KAFKA-14445: -- [~kirktrue] Thanks for your patch for https://issues.apache.org/jira/browse/KAFKA-14317 . However, if I read the patch correctly, I think our original issue should be addressed separately. Our issue is the case where the producer receives a REQUEST_TIMED_OUT response (i.e. the request timed out inside the purgatory while waiting for replication), rather than a NetworkClient-level timeout. So I think the || clause here ([https://github.com/apache/kafka/pull/12813#discussion_r1048223644]) was necessary, contrary to the discussion. Though this is an extreme edge case, I would like to solve it anyway, as it caused batch expiration on our producer. I'll submit a follow-up patch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729561#comment-17729561 ] Haruki Okada commented on KAFKA-15046: -- I see, thank you for pointing that out. Hmm, now I agree that just making the file-descriptor fsync call asynchronous should be fine. (I still wonder whether we could move LeaderEpochFileCache's method calls outside of Log.lock, since the underlying CheckpointFile does its own exclusive control ([https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L56]); however, carefully checking all paths that call fsync in order to move them outside of Log.lock is too error-prone and maybe hard to maintain.) May I take this issue? I would like to submit a patch to make LeaderEpochFile's fsync asynchronous. (For ProducerState snapshots, [https://github.com/apache/kafka/pull/13782] should already cover this.) > Produce performance issue under high disk load > -- > > Key: KAFKA-15046 > URL: https://issues.apache.org/jira/browse/KAFKA-15046 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.3.2 >Reporter: Haruki Okada >Priority: Major > Labels: performance > Attachments: image-2023-06-01-12-46-30-058.png, > image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, > image-2023-06-01-12-56-19-108.png > > > * Phenomenon: > ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! > ** Producer response time 99%ile got quite bad when we performed replica > reassignment on the cluster > *** RequestQueue scope was significant > ** Also request-time throttling happened at the incidental time. This caused > producers to delay sending messages in the mean time. > ** The disk I/O latency was higher than usual due to the high load for > replica reassignment. > *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! 
> * Analysis: > ** The request-handler utilization was much higher than usual. > *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! > ** Also, thread time utilization was much higher than usual on almost all > users > *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! > ** From taking jstack several times, for most of them, we found that a > request-handler was doing fsync for flushing ProducerState and meanwhile other > request-handlers were waiting on Log#lock for appending messages. > * > ** > *** > {code:java} > "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 > cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 > runnable [0x7ef9a12e2000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native > Method) > at > sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) > at > sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) > at > kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) > at > kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) > at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) > at > kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown > Source) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) > at scala.collection.mutable.HashMap.map(HashMap.scala:35) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) > at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) > at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) > at kafka.server.KafkaApis.handle(KafkaApis.scala:175) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) > at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} > * > ** Also there were a bunch of logs showing that writing producer snapshots > took hundreds of milliseconds. > *** > {code:java} > ... > [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote > producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. > (kafka.log.ProducerStateManager) {code}
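The remedy discussed on this ticket, keeping the snapshot write under the log lock (a cheap page-cache write) but moving the expensive fsync off the request-handler path, can be sketched as below. This is a hedged, self-contained illustration with hypothetical names (AsyncSnapshotFlusher, takeSnapshot), not the actual UnifiedLog/ProducerStateManager code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: write the snapshot while holding the log lock,
// but run FileChannel.force() (the fsync) on a background thread so
// request handlers contending on the lock never block on disk I/O.
public class AsyncSnapshotFlusher {
    private final Object logLock = new Object();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    public void takeSnapshot(Path file, String content) throws IOException {
        synchronized (logLock) {
            // Under the lock: only a write into the page cache.
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        }
        // Outside the lock: fsync asynchronously.
        flusher.submit(() -> {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                // In a real broker this would be reported to LogDirFailureChannel.
                throw new RuntimeException(e);
            }
        });
    }

    /** Waits for pending fsyncs to complete. */
    public void close() throws InterruptedException {
        flusher.shutdown();
        flusher.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AsyncSnapshotFlusher f = new AsyncSnapshotFlusher();
        Path snapshot = Files.createTempFile("producer-state", ".snapshot");
        f.takeSnapshot(snapshot, "offset=1748817854");
        f.close(); // fsync has finished by now
        System.out.println(Files.readString(snapshot));
    }
}
```

The trade-off is a small window where a snapshot exists in the page cache but is not yet durable; the ticket's discussion accepts this because the snapshot can be rebuilt from the log on recovery.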
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728551#comment-17728551 ] Haruki Okada edited comment on KAFKA-15046 at 6/2/23 7:19 AM: -- [~showuon] Maybe I linked wrong file. What I thought is to make any LeaderEpochFileCache methods which needs flush() to be called outside of Log's global lock. LeaderEpochFileCache already does exclusive control by its RW lock so I think we don't need to call it inside the Log's global lock. [https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44] was (Author: ocadaruma): [~showuon] Maybe I linked wrong file. What I thought is to make any LeaderEpochFileCache methods (which needs flush()) to be called outside of Log's global lock. LeaderEpochFileCache already does exclusive control by its RW lock so I think we don't need to call it inside the Log's global lock. [https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44]
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728551#comment-17728551 ] Haruki Okada commented on KAFKA-15046: -- [~showuon] Maybe I linked wrong file. What I thought is to make any LeaderEpochFileCache methods (which needs flush()) to be called outside of Log's global lock. LeaderEpochFileCache already does exclusive control by its RW lock so I think we don't need to call it inside the Log's global lock. [https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L44]
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515 ] Haruki Okada edited comment on KAFKA-15046 at 6/2/23 4:01 AM: -- Yeah, io_uring is promising. However it only works with newer kernel (which some on-premises Kafka users may not be easy to update) and require rewriting a lot of parts of the code base. -For leader-epoch cache, the checkpointing is already done in scheduler thread so we should adopt solution2 I think- For leader epoch cache, some paths already doing checkpointing asynchronously (e.g. UnifiedLog.deleteOldSegments => UnifiedLog.maybeIncrementLogStartOffset => LeaderEpochFileCache.truncateFromStart on kafka scheduler), so we have to make fsync called outside of the lock (i.e. solution-2) anyways I think. Writing to CheckpointFile is already synchronized, so can't we just move checkpointing to outside of the lock? [https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76] was (Author: ocadaruma): Yeah, io_uring is promising. However it only works with newer kernel (which some on-premises Kafka users may not be easy to update) and require rewriting a lot of parts of the code base. For leader-epoch cache, the checkpointing is already done in scheduler thread so we should adopt solution2 I think. Writing to CheckpointFile is already synchronized, so can't we just move checkpointing to outside of the lock? [https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76]
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515 ] Haruki Okada commented on KAFKA-15046: -- Yeah, io_uring is promising. However it only works with newer kernel (which some on-premises Kafka users may not be easy to update) and require a lot of parts of the code base. For leader-epoch cache, the checkpointing is already done in scheduler thread so we should adopt solution2 I think. Writing to CheckpointFile is already synchronized, so can't we just move checkpointing to outside of the lock? https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728515#comment-17728515 ] Haruki Okada edited comment on KAFKA-15046 at 6/2/23 12:01 AM: --- Yeah, io_uring is promising. However it only works with newer kernel (which some on-premises Kafka users may not be easy to update) and require rewriting a lot of parts of the code base. For leader-epoch cache, the checkpointing is already done in scheduler thread so we should adopt solution2 I think. Writing to CheckpointFile is already synchronized, so can't we just move checkpointing to outside of the lock? [https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76] was (Author: ocadaruma): Yeah, io_uring is promising. However it only works with newer kernel (which some on-premises Kafka users may not be easy to update) and require a lot of parts of the code base. For leader-epoch cache, the checkpointing is already done in scheduler thread so we should adopt solution2 I think. Writing to CheckpointFile is already synchronized, so can't we just move checkpointing to outside of the lock? https://github.com/apache/kafka/blob/3.3.2/server-common/src/main/java/org/apache/kafka/server/common/CheckpointFile.java#L76 > Produce performance issue under high disk load > -- > > Key: KAFKA-15046 > URL: https://issues.apache.org/jira/browse/KAFKA-15046 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 3.3.2 >Reporter: Haruki Okada >Priority: Major > Labels: performance > Attachments: image-2023-06-01-12-46-30-058.png, > image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, > image-2023-06-01-12-56-19-108.png > > > * Phenomenon: > ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! 
> ** Producer response time 99%ile got quite bad when we performed replica > reassignment on the cluster > *** RequestQueue scope was significant > ** Also request-time throttling happened at the incidental time. This caused > producers to delay sending messages in the mean time. > ** The disk I/O latency was higher than usual due to the high load for > replica reassignment. > *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! > * Analysis: > ** The request-handler utilization was much higher than usual. > *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! > ** Also, thread time utilization was much higher than usual on almost all > users > *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! > ** From taking jstack several times, for most of them, we found that a > request-handler was doing fsync for flusing ProducerState and meanwhile other > request-handlers were waiting Log#lock for appending messages. > * > ** > *** > {code:java} > "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 > cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 > runnable [0x7ef9a12e2000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native > Method) > at > sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) > at > sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) > at > kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) > at > kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) > at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) > - locked <0x00060d75d820> (a java.lang.Object) > at 
kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) > at > kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown > Source) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) > at scala.collection.mutable.HashMap.map(HashMap.scala:35) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) > at kafka.server.Repl
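The lock-scope change floated in the comment above can be sketched as follows. All class and method names here are hypothetical, not Kafka's actual code: the point is only that the in-memory entries are copied while holding the lock, and the slow write-and-fsync happens without it, so appending threads are blocked only for the duration of the copy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: checkpoint state is copied under the lock,
// but the (slow) write + fsync happens outside it.
class EpochCheckpointSketch {
    private final Object lock = new Object();
    private final List<String> epochEntries = new ArrayList<>();
    private final List<String> lastFlushed = new ArrayList<>();

    void append(String entry) {
        synchronized (lock) {
            epochEntries.add(entry); // fast: no disk I/O under the lock
        }
    }

    void flush() {
        List<String> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(epochEntries); // brief critical section
        }
        writeAndFsync(snapshot); // slow disk I/O, lock not held
    }

    private void writeAndFsync(List<String> snapshot) {
        // stand-in for a CheckpointFile.write-style atomic write + fsync
        lastFlushed.clear();
        lastFlushed.addAll(snapshot);
    }

    List<String> flushed() {
        return lastFlushed;
    }
}
```

Since CheckpointFile.write is itself synchronized, concurrent flushers would still serialize against each other, but appenders would no longer wait behind the fsync.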
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Description: * Phenomenon: ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! ** Producer response time 99%ile got quite bad when we performed replica reassignment on the cluster *** RequestQueue scope was significant ** Also request-time throttling happened at the incidental time. This caused producers to delay sending messages in the meantime. ** The disk I/O latency was higher than usual due to the high load for replica reassignment. *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! * Analysis: ** The request-handler utilization was much higher than usual. *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! ** Also, thread time utilization was much higher than usual on almost all users *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! ** From taking jstack several times, for most of them, we found that a request-handler was doing fsync for flushing ProducerState and meanwhile other request-handlers were waiting on Log#lock for appending messages. 
* ** *** {code:java} "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) at kafka.server.KafkaApis.handle(KafkaApis.scala:175) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} * ** Also there were bunch of logs that writing 
producer snapshots took hundreds of milliseconds. *** {code:java} ... [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager) ... {code} * From the analysis, we summarized the issue as below: * ** 1. Disk write latency got worse due to the replica reassignment *** We already use a replication quota, and lowering the quota further may not be acceptable because the assignment would take too long ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync latency *** This is done at every log segment roll. *** In our case, the broker hosts high-load partitions, so log rolls occur very frequently. ** 3. While ProducerStateManager#takeSnapshot is doing fsync, all subsequent produce requests to the partition are blocked due to Log#lock ** 4. While produce requests are waiting for the lock, they consume request-handler thread time, so it's accounted as thread-time utilization and caused throttling * Suggestion: ** We didn't see this phenomenon when we used Kafka 2.4.1. *** ProducerState fsync was introduced i
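The force0 frames at the top of the jstack samples come from the snapshot file being fsync-ed before it is renamed into place. A simplified, hypothetical sketch of such a write path (not the actual ProducerStateManager implementation, but the same write / force / atomic-move shape):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a snapshot write: write to a temp file,
// fsync it, then atomically rename it into place. force(true) is the
// call that can block for hundreds of ms when the disk is under load.
class SnapshotWriteSketch {
    static void writeSnapshot(Path target, byte[] payload) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(payload));
            ch.force(true); // fsync: waits until the data reaches the disk
        }
        // atomic rename so readers never observe a partial snapshot
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

When this runs while Log#lock is held (as in the stack above), every produce request to the same partition waits out the fsync.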
[jira] [Comment Edited] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728266#comment-17728266 ] Haruki Okada edited comment on KAFKA-15046 at 6/1/23 8:39 AM: -- Hm, when I dug into this further, I noticed there's another path that causes essentially the same phenomenon. {code:java} "data-plane-kafka-request-handler-17" #169 daemon prio=5 os_prio=0 cpu=50994542.49ms elapsed=595635.65s tid=0x7efdaebabe30 nid=0x1e707 runnable [0x7ef9a0fdf000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at org.apache.kafka.common.utils.Utils.flushDir(Utils.java:966) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:925) at org.apache.kafka.server.common.CheckpointFile.write(CheckpointFile.java:98) - locked <0x000680fc4930> (a java.lang.Object) at kafka.server.checkpoints.CheckpointFileWithFailureHandler.write(CheckpointFileWithFailureHandler.scala:37) at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:71) at kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:291) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3(LeaderEpochFileCache.scala:263) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3$adapted(LeaderEpochFileCache.scala:259) at kafka.server.epoch.LeaderEpochFileCache$$Lambda$571/0x00080045f040.apply(Unknown Source) at scala.Option.foreach(Option.scala:437) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$1(LeaderEpochFileCache.scala:259) at kafka.server.epoch.LeaderEpochFileCache.truncateFromStart(LeaderEpochFileCache.scala:254) at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4(UnifiedLog.scala:1043) at kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4$adapted(UnifiedLog.scala:1043) at kafka.log.UnifiedLog$$Lambda$2324/0x000800b59040.apply(Unknown Source) at scala.Option.foreach(Option.scala:437) at kafka.log.UnifiedLog.maybeIncrementLogStartOffset(UnifiedLog.scala:1043) - locked <0x000680fc5080> (a java.lang.Object) at kafka.cluster.Partition.$anonfun$deleteRecordsOnLeader$1(Partition.scala:1476) at kafka.cluster.Partition.deleteRecordsOnLeader(Partition.scala:1463) at kafka.server.ReplicaManager.$anonfun$deleteRecordsOnLocalLog$2(ReplicaManager.scala:687) at kafka.server.ReplicaManager$$Lambda$3156/0x000800d7c840.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.deleteRecordsOnLocalLog(ReplicaManager.scala:680) at kafka.server.ReplicaManager.deleteRecords(ReplicaManager.scala:875) at kafka.server.KafkaApis.handleDeleteRecordsRequest(KafkaApis.scala:2216) at kafka.server.KafkaApis.handle(KafkaApis.scala:196) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} LeaderEpoch checkpointing also calls fsync while holding Log#lock, blocking request-handler threads from appending in the meantime. This is called by the scheduler thread on log-segment breach, so it might be less frequent than log rolls, though. Does it make sense to also move the LeaderEpochCheckpointFile flush outside of the lock? was (Author: ocadaruma): Hm, when I dug into this further, I noticed there's another path that causes essentially the same phenomenon. 
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728266#comment-17728266 ] Haruki Okada commented on KAFKA-15046: -- Hm, when I dug into this further, I noticed there's another path that causes essentially the same phenomenon. {code:java} "data-plane-kafka-request-handler-17" #169 daemon prio=5 os_prio=0 cpu=50994542.49ms elapsed=595635.65s tid=0x7efdaebabe30 nid=0x1e707 runnable [0x7ef9a0fdf000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at org.apache.kafka.common.utils.Utils.flushDir(Utils.java:966) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:925) at org.apache.kafka.server.common.CheckpointFile.write(CheckpointFile.java:98) - locked <0x000680fc4930> (a java.lang.Object) at kafka.server.checkpoints.CheckpointFileWithFailureHandler.write(CheckpointFileWithFailureHandler.scala:37) at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:71) at kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:291) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3(LeaderEpochFileCache.scala:263) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$3$adapted(LeaderEpochFileCache.scala:259) at kafka.server.epoch.LeaderEpochFileCache$$Lambda$571/0x00080045f040.apply(Unknown Source) at scala.Option.foreach(Option.scala:437) at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromStart$1(LeaderEpochFileCache.scala:259) at kafka.server.epoch.LeaderEpochFileCache.truncateFromStart(LeaderEpochFileCache.scala:254) at 
kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4(UnifiedLog.scala:1043) at kafka.log.UnifiedLog.$anonfun$maybeIncrementLogStartOffset$4$adapted(UnifiedLog.scala:1043) at kafka.log.UnifiedLog$$Lambda$2324/0x000800b59040.apply(Unknown Source) at scala.Option.foreach(Option.scala:437) at kafka.log.UnifiedLog.maybeIncrementLogStartOffset(UnifiedLog.scala:1043) - locked <0x000680fc5080> (a java.lang.Object) at kafka.cluster.Partition.$anonfun$deleteRecordsOnLeader$1(Partition.scala:1476) at kafka.cluster.Partition.deleteRecordsOnLeader(Partition.scala:1463) at kafka.server.ReplicaManager.$anonfun$deleteRecordsOnLocalLog$2(ReplicaManager.scala:687) at kafka.server.ReplicaManager$$Lambda$3156/0x000800d7c840.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.deleteRecordsOnLocalLog(ReplicaManager.scala:680) at kafka.server.ReplicaManager.deleteRecords(ReplicaManager.scala:875) at kafka.server.KafkaApis.handleDeleteRecordsRequest(KafkaApis.scala:2216) at kafka.server.KafkaApis.handle(KafkaApis.scala:196) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} LeaderEpoch checkpointing also calls fsync while holding Log#lock, blocking request-handler threads from appending in the meantime. This is called by the scheduler thread on log-segment breach, so it might be less frequent than log rolls, though. Does it make sense to also make the LeaderEpochCheckpointFile flush asynchronous? 
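A minimal sketch of the asynchronous-flush idea raised above, with illustrative names rather than Kafka's actual scheduler classes: the request-handler only enqueues the flush, and a single background thread performs the write and fsync, so Log#lock is never held across the disk I/O.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the request-handler thread only enqueues the
// flush; a single background thread performs the write + fsync.
class AsyncCheckpointFlusher {
    private final ExecutorService scheduler =
            Executors.newSingleThreadExecutor();
    private final AtomicInteger flushes = new AtomicInteger();

    Future<?> scheduleFlush(Runnable writeAndFsync) {
        return scheduler.submit(() -> {
            writeAndFsync.run(); // slow I/O happens off the handler thread
            flushes.incrementAndGet();
        });
    }

    int completedFlushes() {
        return flushes.get();
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

The single-threaded executor also keeps flushes for one checkpoint file ordered, at the cost that a crash between the in-memory update and the background flush loses the latest checkpoint, which the recovery path would have to tolerate.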
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728262#comment-17728262 ] Haruki Okada commented on KAFKA-15046: -- Oh, I hadn't noticed there's another ticket where a fix is already available. Thank you, I will take a look!
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Description: * Phenomenon: ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! ** Producer response time 99%ile got quite bad when we performed replica reassignment on the cluster *** RequestQueue scope was significant ** Also request-time throttling happened at the incidental time. This caused producers to delay sending messages in the meantime. ** The disk I/O latency was higher than usual due to the high load for replica reassignment. *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! * Analysis: ** The request-handler utilization was much higher than usual. *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! ** Also, thread time utilization was much higher than usual on almost all users *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! ** From taking jstack several times, for most of them, we found that a request-handler was doing fsync for flushing ProducerState and meanwhile other request-handlers were waiting on Log#lock for appending messages. 
* ** *** {code:java} "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) at kafka.server.KafkaApis.handle(KafkaApis.scala:175) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} * ** Also there were bunch of logs that writing 
producer snapshots took hundreds of milliseconds. *** {code:java} ... [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager) ... {code} * From the analysis, we summarized the issue as below: * ** 1. Disk write latency got worse due to the replica reassignment *** We already use a replication quota, and lowering the quota further may not be acceptable because the assignment would take too long ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync latency *** This is done at every log segment roll. *** In our case, the broker hosts high-load partitions, so log rolls occur very frequently. ** 3. While ProducerStateManager#takeSnapshot is doing fsync, all subsequent produce requests to the partition are blocked due to Log#lock ** 4. While produce requests are waiting for the lock, they consume request-handler thread time, so it's accounted as thread time and caused throttling * Suggestion: ** We didn't see this phenomenon when we used Kafka 2.4.1. *** ProducerState fsync was introduced in 2.8.0 by this:
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Description: * Phenomenon: ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! ** Producer response time 99%ile got quite bad when we performed replica reassignment on the cluster *** RequestQueue scope was significant ** Also request-time throttling happened at the incidental time. This caused producers to delay sending messages at the incidental time. ** At the incidental time, the disk I/O latency was higher than usual due to the high load for replica reassignment. *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! * Analysis: ** The request-handler utilization was much higher than usual. *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! ** Also, thread time utilization was much higher than usual on almost all users *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! ** From taking jstack several times, for most of them, we found that a request-handler was doing fsync for flushing ProducerState and meanwhile other request-handlers were waiting on Log#lock for appending messages. 
* ** *** {code:java} "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) at kafka.server.KafkaApis.handle(KafkaApis.scala:175) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} * ** Also there were bunch of logs that writing 
producer snapshots took hundreds of milliseconds. *** {code:java} ... [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager) ... {code} * From the analysis, we summarized the issue as below: * ** 1. Disk write latency got worse due to the replica reassignment *** We already use a replication quota, and lowering the quota further may not be acceptable because the assignment would take too long ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync latency *** This is done at every log segment roll. *** In our case, the broker hosts high-load partitions, so log rolls occur very frequently. ** 3. While ProducerStateManager#takeSnapshot is doing fsync, all subsequent produce requests to the partition are blocked due to Log#lock ** 4. While produce requests are waiting for the lock, they consume request-handler thread time, so it's accounted as thread time and caused throttling * Suggestion: ** We didn't see this phenomenon when we used Kafka 2.4.1. *** ProducerState fsync was
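The interaction described in points 2-4 can be reproduced in miniature. This is purely illustrative (none of these classes exist in Kafka): while one thread performs a slow operation inside the critical section, as the snapshot fsync does during a roll, every appender queues up behind the same monitor and its wait is charged to the request-handler pool.

```java
import java.util.concurrent.CountDownLatch;

// Purely illustrative: a slow operation inside a synchronized block
// stalls every other thread that needs the same lock, just as the
// snapshot fsync stalled produce requests behind Log#lock.
class LockStallSketch {
    private final Object logLock = new Object();
    private final CountDownLatch lockTaken = new CountDownLatch(1);

    void rollWithSlowFsync(long fsyncMillis) throws InterruptedException {
        synchronized (logLock) {
            lockTaken.countDown();     // signal: the "roll" now holds the lock
            Thread.sleep(fsyncMillis); // stand-in for FileChannel.force
        }
    }

    void append() {
        synchronized (logLock) {
            // a produce append: cannot proceed while the roll holds the lock
        }
    }

    long appendLatencyMillis(long fsyncMillis) throws InterruptedException {
        Thread roller = new Thread(() -> {
            try {
                rollWithSlowFsync(fsyncMillis);
            } catch (InterruptedException ignored) {
            }
        });
        roller.start();
        lockTaken.await();             // wait until the roll holds the lock
        long start = System.nanoTime();
        append();                      // blocks for roughly fsyncMillis
        roller.join();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```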
[jira] [Commented] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728214#comment-17728214 ] Haruki Okada commented on KAFKA-15046: -- If the suggestion (stop fsync-ing) makes sense, I'm happy to submit a patch. > Produce performance issue under high disk load > -- > > Key: KAFKA-15046 > URL: https://issues.apache.org/jira/browse/KAFKA-15046 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.3.2 >Reporter: Haruki Okada >Priority: Major > Attachments: image-2023-06-01-12-46-30-058.png, > image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, > image-2023-06-01-12-56-19-108.png > > > * Phenomenon: > ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! > ** Producer response time 99%ile got quite bad when we performed replica > reassignment on the cluster > *** RequestQueue scope was significant > ** Also request-time throttling happens almost all the time. This caused > producers to delay sending messages at the incidental time. > ** At the incidental time, the disk I/O latency was higher than usual due to > the high load for replica reassignment. > *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! > * Analysis: > ** The request-handler utilization was much higher than usual. > *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! > ** Also, thread time utilization was much higher than usual on almost all > users > *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! > ** From taking jstack several times, for most of them, we found that a > request-handler was doing fsync for flusing ProducerState and meanwhile other > request-handlers were waiting Log#lock for appending messages. 
> * > ** > *** > {code:java} > "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 > cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 > runnable [0x7ef9a12e2000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native > Method) > at > sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) > at > sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) > at > kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) > at > kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) > at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) > - locked <0x00060d75d820> (a java.lang.Object) > at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) > at > kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown > Source) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) > at scala.collection.mutable.HashMap.map(HashMap.scala:35) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) > at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) > at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) > at kafka.server.KafkaApis.handle(KafkaApis.scala:175) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) > at 
java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} > * > ** Also there were a bunch of logs showing that writing producer snapshots took > hundreds of milliseconds. > *** > {code:java} > ... > [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote > producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. > (kafka.log.ProducerStateManager) > [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote > producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. > (kafka.log.ProducerStateManager) > [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote > producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. > (kafka.log.ProducerStateManager) > ... {code} > * From the analysis, we summarized the issue as below:
[jira] [Updated] (KAFKA-15046) Produce performance issue under high disk load
[ https://issues.apache.org/jira/browse/KAFKA-15046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-15046: - Description: * Phenomenon: ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! ** Producer response time 99%ile got quite bad when we performed replica reassignment on the cluster *** RequestQueue scope was significant ** Also request-time throttling happens almost all the time. This caused producers to delay sending messages at the incidental time. ** At the incidental time, the disk I/O latency was higher than usual due to the high load for replica reassignment. *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! * Analysis: ** The request-handler utilization was much higher than usual. *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! ** Also, thread time utilization was much higher than usual on almost all users *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! ** From taking jstack several times, for most of them, we found that a request-handler was doing fsync for flusing ProducerState and meanwhile other request-handlers were waiting Log#lock for appending messages. 
* ** *** {code:java} "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) at kafka.server.KafkaApis.handle(KafkaApis.scala:175) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} * ** Also there were bunch of logs that writing 
producer snapshots took hundreds of milliseconds. *** {code:java} ... [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager) ... {code} * From the analysis, we summarized the issue as below: * ** 1. Disk write latency got worse due to the replica reassignment *** We already use replication quota, and lowering the quota further may not be acceptable because the assignment would take too long ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync latency *** This is done at every log segment roll. *** In our case, the broker hosts high-load partitions, so log rolls occur very frequently. ** 3. While ProducerStateManager#takeSnapshot is doing fsync, all subsequent produce requests to the partition are blocked on Log#lock ** 4. While produce requests wait for the lock, they consume request-handler thread time, which is accounted as thread usage and caused throttling * Suggestion: ** We didn't see this phenomenon when we used Kafka 2.4.1. *** ProducerState fsync was int
[jira] [Created] (KAFKA-15046) Produce performance issue under high disk load
Haruki Okada created KAFKA-15046: Summary: Produce performance issue under high disk load Key: KAFKA-15046 URL: https://issues.apache.org/jira/browse/KAFKA-15046 Project: Kafka Issue Type: Improvement Affects Versions: 3.3.2 Reporter: Haruki Okada Attachments: image-2023-06-01-12-46-30-058.png, image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, image-2023-06-01-12-56-19-108.png * Phenomenon: ** !image-2023-06-01-12-46-30-058.png|width=259,height=236! ** The 99th-percentile producer response time got quite bad when we performed replica reassignment on the cluster *** RequestQueue scope was significant ** Also, request-time throttling happened almost all the time, which caused producers to delay sending messages during the incident. ** During the incident, disk I/O latency was higher than usual due to the high load for replica reassignment. *** !image-2023-06-01-12-56-19-108.png|width=255,height=128! * Analysis: ** The request-handler utilization was much higher than usual. *** !image-2023-06-01-12-52-40-959.png|width=278,height=113! ** Also, thread time utilization was much higher than usual for almost all users *** !image-2023-06-01-12-54-04-211.png|width=276,height=110! ** From taking jstack several times, we found that in most samples a request-handler was doing fsync to flush ProducerState while other request-handlers were waiting on Log#lock to append messages. 
*** {code:java} "data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 runnable [0x7ef9a12e2000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method) at sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82) at sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461) at kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451) at kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754) at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.append(UnifiedLog.scala:919) - locked <0x00060d75d820> (a java.lang.Object) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956) at kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown Source) at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) at scala.collection.mutable.HashMap.map(HashMap.scala:35) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666) at kafka.server.KafkaApis.handle(KafkaApis.scala:175) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75) at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code} ** Also there were bunch of logs that writing producer 
snapshots took hundreds of milliseconds. *** {code:java} ... [2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. (kafka.log.ProducerStateManager) [2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. (kafka.log.ProducerStateManager) ... {code} * From the analysis, we summarized the issue as below: ** 1. Disk write latency got worse due to the replica reassignment *** We already use replication quota, and lowering the quota further may not be acceptable for too long assignment duration ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync latency *** This is done at every log segment roll. *** In our case, the broker hosts hundreds of partition leaders with high load, so log roll is occurring very frequently. ** 3. During ProducerStateManager#takeSnapshot is doing fsync
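The suggestion in this issue is to take the expensive fsync off the request-handler path so that Log#lock is not held while the disk flush is in progress. The sketch below is only an illustration of that idea under stated assumptions; the class and method names are hypothetical, not Kafka's actual ProducerStateManager API. The cheap part (write plus atomic rename, which only touches the page cache) stays on the caller's path, while the fsync is scheduled on a background thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model: write + atomic rename on the request-handler path (fast, page
// cache only), fsync on a background thread so the log lock is not held
// during the flush. All names here are hypothetical.
public class AsyncSnapshotSketch {
    static final ExecutorService FLUSHER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "snapshot-flusher");
        t.setDaemon(true);
        return t;
    });

    static Future<?> writeSnapshot(Path file, byte[] payload) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(payload)); // fast: goes to the page cache
        }
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        // Slow part: the fsync happens off the caller's thread.
        return FLUSHER.submit(() -> {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
            }
            return null;
        });
    }

    public static void main(String[] args) throws Exception {
        Path snap = Files.createTempDirectory("producer-state")
                .resolve("00000000000000000001.snapshot");
        writeSnapshot(snap, "producer-state-payload".getBytes()).get(); // wait only for the demo
        System.out.println(Files.exists(snap)); // prints "true"
    }
}
```

The trade-off, as the issue title implies, is durability: a crash between the rename and the deferred fsync can lose the snapshot, which the recovery path would then have to tolerate.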
[jira] [Commented] (KAFKA-14757) Kafka Cooperative Sticky Assignor results in significant duplicate consumption
[ https://issues.apache.org/jira/browse/KAFKA-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693013#comment-17693013 ] Haruki Okada commented on KAFKA-14757: -- Possibly related: https://issues.apache.org/jira/browse/KAFKA-9382 In cooperative rebalancing, consumers can process new messages even during rebalancing but commits are rejected. > Kafka Cooperative Sticky Assignor results in significant duplicate consumption > -- > > Key: KAFKA-14757 > URL: https://issues.apache.org/jira/browse/KAFKA-14757 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 3.1.1 > Environment: AWS MSK (broker) and Spring Kafka (2.8.7) for use in > Spring Boot consumers. >Reporter: Siddharth Anand >Priority: Critical > > Details may be found within the linked document: > [Kafka Cooperative Sticky Assignor Issue : Duplicate Consumption | > [https://docs.google.com/document/d/1E7qAwGOpF8jo_YhF4NwUx9CXxUGJmT8OhHEqIg7-GfI/edit?usp=sharing]] > In a nutshell, we noticed that the Cooperative Sticky Assignor resulted in > significant duplicate message consumption. During last year's F1 Grand Prix > events and World Cup soccer events, our company's Kafka-based platform > received live-traffic. This live traffic, coupled with autoscaled consumers > resulted in as much as 70% duplicate message consumption at the Kafka > consumers. > In December 2022, we ran a synthetic load test to confirm that duplicate > message consumption occurs during consumer scale out/in and Kafka partition > rebalancing when using the Cooperative Sticky Assignor. This issue does not > occur when using the Range Assignor. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
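For context on what switching assignors means operationally: the assignor is chosen via the consumer's `partition.assignment.strategy` config. The sketch below uses plain `Properties` only (no broker needed); the two assignor class names are the real ones shipped with kafka-clients, while the bootstrap server and group id are placeholders.

```java
import java.util.Properties;

// Minimal configuration sketch: selecting the rebalance assignor.
public class AssignorConfigSketch {
    static Properties consumerProps(String assignorClass) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "demo-group");              // placeholder
        props.put("partition.assignment.strategy", assignorClass);
        return props;
    }

    public static void main(String[] args) {
        // Cooperative (incremental) rebalancing, the assignor reported here:
        Properties coop = consumerProps(
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        // Eager (stop-the-world) rebalancing, reported as unaffected:
        Properties range = consumerProps(
                "org.apache.kafka.clients.consumer.RangeAssignor");
        System.out.println(coop.getProperty("partition.assignment.strategy"));
        System.out.println(range.getProperty("partition.assignment.strategy"));
    }
}
```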
[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644055#comment-17644055 ] Haruki Okada commented on KAFKA-14445: -- [~kirktrue] Oh, I was not aware of KAFKA-14317. Thanks. > Is there more involved in your patch that is not in the above PR No. However, as mentioned in KAFKA-10228, changing the error type could be considered a breaking change, so it may need more discussion I guess. My plan was just to request a metadata update on a REQUEST_TIMED_OUT response as well, without changing the error type, so it is more trivial. > Producer doesn't request metadata update on REQUEST_TIMED_OUT > - > > Key: KAFKA-14445 > URL: https://issues.apache.org/jira/browse/KAFKA-14445 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Priority: Major > > Produce requests may fail with a timeout after `request.timeout.ms` in the two > cases below: > * The client didn't receive a produce response within `request.timeout.ms` > * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in the > broker > The former case usually happens when a broker machine fails or there's a > network glitch, etc. > In this case, the connection will be disconnected and a metadata update will be > requested to discover the new leader: > [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] > > The problem is the latter case (REQUEST_TIMED_OUT on the broker). > In this case, the produce request will end up with a TimeoutException, > which doesn't inherit InvalidMetadataException, so it doesn't trigger a metadata > update. > > A typical cause of REQUEST_TIMED_OUT is replication delay due to a follower-side > problem, for which a metadata update indeed doesn't make much sense. > > However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT > could cause produce requests to retry unnecessarily, which may end up with > batch expiration due to the delivery timeout. 
> Below is the scenario we experienced: > * Environment: > ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1 > ** min.insync.replicas=2 > ** acks=all > * Scenario: > ** broker 1 "partially" failed > *** It lost its ZooKeeper connection and was kicked out of the cluster > There was a controller log like: > * > {code:java} > [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , > deleted brokers: 1, bounced brokers: {code} > * > ** > *** However, somehow the broker was able to continue receiving produce > requests > We're still working on investigating how this is possible though. > Indeed, broker 1 was somewhat "alive" and kept working according to > server.log > *** In other words, broker 1 became a "zombie" > ** broker 2 was elected as the new leader > *** broker 3 became a follower of broker 2 > *** However, since broker 1 was still out of the cluster, it didn't receive > LeaderAndIsr, so 1 kept considering itself the leader of tp-0 > ** Meanwhile, the producer kept sending produce requests to broker 1, and the > requests failed with REQUEST_TIMED_OUT because no broker replicates > from broker 1. > *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer didn't > have a chance to update its stale metadata > > So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT exception, > to address the case where the old leader became a "zombie"
[jira] [Updated] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-14445: - Description: Produce requests may fail with timeout by `request.timeout.ms` in below two cases: * Didn't receive produce response within `request.timeout.ms` * Produce response received, but it ended up with `REQUEST_TIMED_OUT` in the broker Former case usually happens when a broker-machine got failed or there's network glitch etc. In this case, the connection will be disconnected and metadata-update will be requested to discover new leader: [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] The problem is in latter case (REQUEST_TIMED_OUT on the broker). In this case, the produce request will be ended up with TimeoutException, which doesn't inherit InvalidMetadataException so it doesn't trigger metadata update. Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side problem, that metadata-update doesn't make much sense indeed. However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could cause produce requests to retry unnecessarily , which may end up with batch expiration due to delivery timeout. Below is the scenario we experienced: * Environment: ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1 ** min.insync.replicas=2 ** acks=all * Scenario: ** broker 1 "partially" failed *** It lost ZooKeeper connection and kicked out from the cluster There was controller log like: * {code:java} [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , deleted brokers: 1, bounced brokers: {code} * ** *** However, somehow the broker was able continued to receive produce requests We're still working on investigating how this is possible though. 
Indeed, broker 1 was somewhat "alive" and keeps working according to server.log *** In other words, broker 1 became "zombie" ** broker 2 was elected as new leader *** broker 3 became follower of broker 2 *** However, since broker 1 was still out of cluster, it didn't receive LeaderAndIsr so 1 kept thinking itself as the leader of tp-0 ** Meanwhile, producer keeps sending produce requests to broker 1 and requests were failed due to REQUEST_TIMED_OUT because no brokers replicates from broker 1. *** REQUEST_TIMED_OUT doesn't trigger metadata update, so produce didn't have a change to update its stale metadata So I suggest to request metadata update even on REQUEST_TIMED_OUT exception, to address the case that the old leader became "zombie" was: Produce requests may fail with timeout by `request.timeout.ms` in below two cases: * Didn't receive produce response within `request.timeout.ms` * Produce response received, but it ended up with `REQUEST_TIMED_OUT` in the broker Former case usually happens when a broker-machine got failed or there's network glitch etc. In this case, the connection will be disconnected and metadata-update will be requested to discover new leader: [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] The problem is in latter case (REQUEST_TIMED_OUT on the broker). In this case, the produce request will be ended up with TimeoutException, which doesn't inherit InvalidMetadataException so it doesn't trigger metadata update. Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side problem, that metadata-update doesn't make much sense indeed. However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could cause produce requests to retry unnecessarily , which may end up with batch expiration due to delivery timeout. Below is the scenario we experienced: * Environment: ** Partition tp-0 has 3 replicas, 1, 2, 3. 
Leader is 1 ** min.insync.replicas=2 ** acks=all * Scenario: ** broker 1 "partially" failed *** It lost ZooKeeper connection and kicked out from the cluster There was controller log like: * {code:java} [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , deleted brokers: 1, bounced brokers: {code} * ** *** However, somehow the broker was able continued to receive produce requests We're still working on investigating how this is possible though. Indeed, broker 1 was somewhat "alive" and keeps working according to server.log *** In other words, broker 1 became "zombie" ** broker 2 was elected as new leader *** broker 3 became follower of broker 2 *** However, since broker 1 was still out of cluster, it didn't receive LeaderAndIsr so 1 kept thinking itself as the leader of tp-0 ** Meanwhile, producer keeps sending produce requests to broker 1 and requests were failed due to REQUEST_TIMED_OUT because no brokers replicates from broker 1. *** REQUES
[jira] [Updated] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haruki Okada updated KAFKA-14445: - Description: Produce requests may fail with timeout by `request.timeout.ms` in below two cases: * Didn't receive produce response within `request.timeout.ms` * Produce response received, but it ended up with `REQUEST_TIMED_OUT` in the broker Former case usually happens when a broker-machine got failed or there's network glitch etc. In this case, the connection will be disconnected and metadata-update will be requested to discover new leader: [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] The problem is in latter case (REQUEST_TIMED_OUT on the broker). In this case, the produce request will be ended up with TimeoutException, which doesn't inherit InvalidMetadataException so it doesn't trigger metadata update. Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side problem, that metadata-update doesn't make much sense indeed. However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could cause produce requests to retry unnecessarily , which may end up with batch expiration due to delivery timeout. Below is the scenario we experienced: * Environment: ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1 ** min.insync.replicas=2 ** acks=all * Scenario: ** broker 1 "partially" failed *** It lost ZooKeeper connection and kicked out from the cluster There was controller log like: * {code:java} [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , deleted brokers: 1, bounced brokers: {code} * ** *** However, somehow the broker was able continued to receive produce requests We're still working on investigating how this is possible though. 
Indeed, broker 1 was somewhat "alive" and keeps working according to server.log *** In other words, broker 1 became "zombie" ** broker 2 was elected as new leader *** broker 3 became follower of broker 2 *** However, since broker 1 was still out of cluster, it didn't receive LeaderAndIsr so 1 kept thinking itself as the leader of tp-0 ** Meanwhile, producer keeps sending produce requests to broker 1 and requests were failed due to REQUEST_TIMED_OUT because no brokers replicates from broker 1. *** REQUEST_TIMED_OUT doesn't trigger metadata update, so produce didn't have a change to update its stale metadata So I suggest to request metadata update even on REQUEST_TIMED_OUT exception, for the case that the old leader became "zombie" was: Produce requests may fail with timeout by `request.timeout.ms` in below two cases: * Didn't receive produce response within `request.timeout.ms` * Produce response received, but it ended up with `REQUEST_TIMEOUT_MS` in the broker Former case usually happens when a broker-machine got failed or there's network glitch etc. In this case, the connection will be disconnected and metadata-update will be requested to discover new leader: [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] The problem is in latter case (REQUEST_TIMED_OUT on the broker). In this case, the produce request will be ended up with TimeoutException, which doesn't inherit InvalidMetadataException so it doesn't trigger metadata update. Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side problem, that metadata-update doesn't make much sense indeed. However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could cause produce requests to retry unnecessarily , which may end up with batch expiration due to delivery timeout. Below is the scenario we experienced: * Environment: ** Partition tp-0 has 3 replicas, 1, 2, 3. 
Leader is 1 ** min.insync.replicas=2 ** acks=all * Scenario: ** broker 1 "partially" failed *** It lost ZooKeeper connection and kicked out from the cluster There was controller log like: * {code:java} [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , deleted brokers: 1, bounced brokers: {code} *** However, somehow the broker was able continued to receive produce requests We're still working on investigating how this is possible though. Indeed, broker 1 was somewhat "alive" and keeps working according to server.log *** In other words, broker 1 became "zombie" ** broker 2 was elected as new leader *** broker 3 became follower of broker 2 *** However, since broker 1 was still out of cluster, it didn't receive LeaderAndIsr so 1 kept thinking itself as the leader of tp-0 ** Meanwhile, producer keeps sending produce requests to broker 1 and requests were failed due to REQUEST_TIMED_OUT because no brokers replicates from broker 1. *** REQUEST_TIMED_OUT doe
[jira] [Commented] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
[ https://issues.apache.org/jira/browse/KAFKA-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643768#comment-17643768 ] Haruki Okada commented on KAFKA-14445: -- If the suggestion makes sense, we're happy to send a patch. > Producer doesn't request metadata update on REQUEST_TIMED_OUT > - > > Key: KAFKA-14445 > URL: https://issues.apache.org/jira/browse/KAFKA-14445 > Project: Kafka > Issue Type: Improvement >Reporter: Haruki Okada >Priority: Major > > Produce requests may fail with timeout by `request.timeout.ms` in below two > cases: > * Didn't receive produce response within `request.timeout.ms` > * Produce response received, but it ended up with `REQUEST_TIMEOUT_MS` in > the broker > Former case usually happens when a broker-machine got failed or there's > network glitch etc. > In this case, the connection will be disconnected and metadata-update will be > requested to discover new leader: > [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] > > The problem is in latter case (REQUEST_TIMED_OUT on the broker). > In this case, the produce request will be ended up with TimeoutException, > which doesn't inherit InvalidMetadataException so it doesn't trigger metadata > update. > > Typical cause of REQUEST_TIMED_OUT is replication delay due to follower-side > problem, that metadata-update doesn't make much sense indeed. > > However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT > could cause produce requests to retry unnecessarily , which may end up with > batch expiration due to delivery timeout. > Below is the scenario we experienced: > * Environment: > ** Partition tp-0 has 3 replicas, 1, 2, 3. 
Leader is 1 > ** min.insync.replicas=2 > ** acks=all > * Scenario: > ** broker 1 "partially" failed > *** It lost ZooKeeper connection and kicked out from the cluster > There was controller log like: > * > {code:java} > [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , > deleted brokers: 1, bounced brokers: {code} > *** However, somehow the broker was able continued to receive produce > requests > We're still working on investigating how this is possible though. > Indeed, broker 1 was somewhat "alive" and keeps working according to > server.log > *** In other words, broker 1 became "zombie" > ** broker 2 was elected as new leader > *** broker 3 became follower of broker 2 > *** However, since broker 1 was still out of cluster, it didn't receive > LeaderAndIsr so 1 kept thinking itself as the leader of tp-0 > ** Meanwhile, producer keeps sending produce requests to broker 1 and > requests were failed due to REQUEST_TIMED_OUT because no brokers replicates > from broker 1. > *** REQUEST_TIMED_OUT doesn't trigger metadata update, so produce didn't > have a change to update its stale metadata > > So I suggest to request metadata update even on REQUEST_TIMED_OUT exception, > for the case that the old leader became "zombie" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT
Haruki Okada created KAFKA-14445: Summary: Producer doesn't request metadata update on REQUEST_TIMED_OUT Key: KAFKA-14445 URL: https://issues.apache.org/jira/browse/KAFKA-14445 Project: Kafka Issue Type: Improvement Reporter: Haruki Okada Produce requests may fail with a timeout after `request.timeout.ms` in the two cases below: * The client didn't receive a produce response within `request.timeout.ms` * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in the broker The former case usually happens when a broker machine fails or there's a network glitch, etc. In this case, the connection will be disconnected and a metadata update will be requested to discover the new leader: [https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556] The problem is the latter case (REQUEST_TIMED_OUT on the broker). In this case, the produce request will end up with a TimeoutException, which doesn't inherit InvalidMetadataException, so it doesn't trigger a metadata update. A typical cause of REQUEST_TIMED_OUT is replication delay due to a follower-side problem, for which a metadata update indeed doesn't make much sense. However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could cause produce requests to retry unnecessarily, which may end up with batch expiration due to the delivery timeout. Below is the scenario we experienced: * Environment: ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1 ** min.insync.replicas=2 ** acks=all * Scenario: ** broker 1 "partially" failed *** It lost its ZooKeeper connection and was kicked out of the cluster There was a controller log like: * {code:java} [2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , deleted brokers: 1, bounced brokers: {code} *** However, somehow the broker was able to continue receiving produce requests We're still working on investigating how this is possible though. 
Indeed, broker 1 was somewhat "alive" and kept working according to server.log *** In other words, broker 1 became a "zombie" ** broker 2 was elected as the new leader *** broker 3 became a follower of broker 2 *** However, since broker 1 was still out of the cluster, it didn't receive LeaderAndIsr, so 1 kept considering itself the leader of tp-0 ** Meanwhile, the producer kept sending produce requests to broker 1, and the requests failed with REQUEST_TIMED_OUT because no broker replicates from broker 1. *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer didn't have a chance to update its stale metadata So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT exception, for the case where the old leader became a "zombie"
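The suggestion (trigger a metadata refresh on REQUEST_TIMED_OUT without changing the exception type surfaced to the application) can be sketched as below. This is a toy model, not the actual Sender code; `MetadataStub`, `Error`, and `handleProduceError` are hypothetical names introduced for illustration only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for the producer's metadata object; requestUpdate() marks the
// cached cluster metadata as stale so the next poll refreshes it.
class MetadataStub {
    final AtomicBoolean updateRequested = new AtomicBoolean(false);
    void requestUpdate() { updateRequested.set(true); }
}

public class TimedOutMetadataSketch {
    enum Error { NONE, NOT_LEADER_OR_FOLLOWER, REQUEST_TIMED_OUT }

    static void handleProduceError(Error error, MetadataStub metadata) {
        switch (error) {
            case NOT_LEADER_OR_FOLLOWER: // invalid-metadata error: refreshes today
            case REQUEST_TIMED_OUT:      // proposed: also refresh, the old leader may be a zombie
                metadata.requestUpdate();
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        MetadataStub metadata = new MetadataStub();
        handleProduceError(Error.REQUEST_TIMED_OUT, metadata);
        System.out.println(metadata.updateRequested.get()); // prints "true"
    }
}
```

The point of the design is that the retry path is unchanged; only the metadata cache is invalidated, so a retried batch can land on the newly elected leader instead of the zombie.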
[jira] [Commented] (KAFKA-13572) Negative value for 'Preferred Replica Imbalance' metric
[ https://issues.apache.org/jira/browse/KAFKA-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566872#comment-17566872 ] Haruki Okada commented on KAFKA-13572: -- We experienced a similar phenomenon in our Kafka cluster, and we found that the following scenario can cause a negative metric. Let's say there are topic-A, topic-B. # Initiate topic deletion of topic-A ** TopicDeletionManager#enqueueTopicsForDeletion is called with argument Set(topic-A) *** [https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/KafkaController.scala#L1771] # During topic-A's deletion procedure, all of topic-A's partitions are marked as Offline (Leader = -1) ** [https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/ReplicaStateMachine.scala#L368] # Before topic-A's deletion procedure completes, initiate topic deletion of topic-B ** Since topic-A's ZK delete-topic node still exists, TopicDeletionManager#enqueueTopicsForDeletion is called with argument Set(topic-A, topic-B) ** ControllerContext#cleanPreferredReplicaImbalanceMetric is called for both topic-A and topic-B *** [https://github.com/apache/kafka/blob/3.2.0/core/src/main/scala/kafka/controller/ControllerContext.scala#L496] *** Since topic-A is now NoLeader, `!hasPreferredLeader(replicaAssignment, leadershipInfo)` evaluates to true, so `preferredReplicaImbalanceCount` is decremented unexpectedly > Negative value for 'Preferred Replica Imbalance' metric > --- > > Key: KAFKA-13572 > URL: https://issues.apache.org/jira/browse/KAFKA-13572 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Siddharth Ahuja >Priority: Major > Attachments: > kafka_negative_preferred-replica-imbalance-count_jmx_2.JPG > > > A negative value (-822) for the metric - > {{kafka_controller_kafkacontroller_preferredreplicaimbalancecount}} has been > observed - please see the attached screenshot and the output below: > {code:java} > $ curl -s 
http://localhost:9101/metrics | fgrep > 'kafka_controller_kafkacontroller_preferredreplicaimbalancecount' > # HELP kafka_controller_kafkacontroller_preferredreplicaimbalancecount > Attribute exposed for management (kafka.controller name=PreferredReplicaImbalanceCount><>Value) > # TYPE kafka_controller_kafkacontroller_preferredreplicaimbalancecount gauge > kafka_controller_kafkacontroller_preferredreplicaimbalancecount -822.0 > {code} > The issue has appeared after an operation where the number of partitions for > some topics were increased, and some topics were deleted/created in order to > decrease the number of their partitions. > Ran the following command to check if there is/are any instance/s where the > preferred leader (1st broker in the Replica list) is not the current Leader: > > {code:java} > % grep ".*Topic:.*Partition:.*Leader:.*Replicas:.*Isr:.*Offline:.*" > kafka-topics_describe.out | awk '{print $6 " " $8}' | cut -d "," -f1 | awk > '{print $0, ($1==$2?_:"NOT") "MATCHED"}'|grep NOT | wc -l > 0 > {code} > but could not find any such instances. > {{leader.imbalance.per.broker.percentage=2}} is set for all the brokers in > the cluster which means that we are allowed to have an imbalance of up to 2% > for preferred leaders. This seems to be a valid value, as such, this setting > should not contribute towards a negative metric. > The metric seems to be getting subtracted in the code > [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/ControllerContext.scala#L474-L503] > , however it is not clear when it can become -ve (i.e. subtracted more than > added) in absence of any comments or debug/trace level logs in the code. > However, one thing is for sure, you either have no imbalance (0) or have > imbalance (> 0), it doesn’t make sense for the metric to be < 0. > FWIW, no other anomalies besides this have been detected. 
> Considering these metrics get actively monitored, we should look at adding > DEBUG/TRACE logging around the addition/subtraction of these metrics (and > elsewhere where appropriate) to identify any potential issues. -- This message was sent by Atlassian Jira (v8.20.10#820010)
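The double-cleanup scenario described in the comment above can be reduced to a small sketch. This is hypothetical illustration code, not Kafka's actual `ControllerContext` implementation; the class and method names are invented. A gauge that is decremented once per cleanup pass, with no guard against the same topic being enqueued for deletion twice, goes negative exactly the way the metric did (and a DEBUG line on each delta, as the reporter suggests, would pinpoint the extra decrement):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reduction of the scenario above; names do not match Kafka's code.
public class ImbalanceGaugeSketch {
    static final AtomicInteger preferredReplicaImbalanceCount = new AtomicInteger();

    // topic-A's partitions go Offline (Leader = -1): no preferred leader anymore.
    static void markNoPreferredLeader() {
        preferredReplicaImbalanceCount.incrementAndGet();
    }

    // Cleanup decrements for every partition that currently lacks its preferred
    // leader. Nothing stops it from running twice for the same topic when that
    // topic is re-enqueued for deletion alongside another topic.
    static void cleanPreferredReplicaImbalanceMetric(boolean hasPreferredLeader) {
        if (!hasPreferredLeader) {
            int v = preferredReplicaImbalanceCount.decrementAndGet();
            System.out.println("DEBUG imbalance count -> " + v); // the suggested DEBUG logging
        }
    }

    public static void main(String[] args) {
        markNoPreferredLeader();                      // topic-A marked Offline: count = 1
        cleanPreferredReplicaImbalanceMetric(false);  // deletion of topic-A: count = 0
        cleanPreferredReplicaImbalanceMetric(false);  // re-enqueued with topic-B: count = -1
        System.out.println(preferredReplicaImbalanceCount.get());
    }
}
```

A guard such as removing the topic from the tracked set before the second pass, or recomputing the gauge from current state instead of applying deltas, would prevent the drift.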
[jira] [Comment Edited] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion
[ https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527800#comment-17527800 ]

Haruki Okada edited comment on KAFKA-13403 at 4/26/22 7:35 AM:
---------------------------------------------------------------

[~showuon] Hi, could you help review the PR [https://github.com/apache/kafka/pull/11438] ?
-There seems to be another ticket likely due to the same cause: https://issues.apache.org/jira/browse/KAFKA-13855-
After taking another look at KAFKA-13855, it seems there is currently no clue to conclude that it has the same cause.

was (Author: ocadaruma):
[~showuon] Hi, could you help reviewing the PR [https://github.com/apache/kafka/pull/11438] ?
There seems to be another ticket likely due to the same cause: https://issues.apache.org/jira/browse/KAFKA-13855
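The crash mode in KAFKA-13403 — the log-cleanup walk stats a `.deleted` index file that a racing deletion already removed — can be reproduced in isolation with plain JDK file APIs. A minimal, hypothetical sketch (not Kafka code; the file name prefix is invented):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: a stat on a path removed by a racing deleter fails with
// NoSuchFileException, the same exception Utils.delete hit while walking the
// renamed "-delete" directory.
public class DeleteRaceSketch {
    static boolean reproduces() throws IOException {
        Path segment = Files.createTempFile("segment", ".timeindex.deleted");

        // Simulate the racing task that already deleted the file.
        Files.delete(segment);

        try {
            // What the file-tree walk does for each entry before visiting it.
            Files.readAttributes(segment, BasicFileAttributes.class);
            return false; // stat unexpectedly succeeded
        } catch (NoSuchFileException e) {
            return true;  // the failure mode from the stack trace above
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("reproduced: " + reproduces());
    }
}
```

In the broker the exception propagates out of `Files.walkFileTree`, the log dir is marked offline, and with a single `log.dirs` entry the whole server shuts down.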
[jira] [Commented] (KAFKA-13855) FileNotFoundException: Error while rolling log segment for topic partition in dir
[ https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527961#comment-17527961 ] Haruki Okada commented on KAFKA-13855: -- H-mm sorry, sounds like I just overstepped. Yeah, seems we need to dig into this further. Please nevermind for now. > FileNotFoundException: Error while rolling log segment for topic partition in > dir > - > > Key: KAFKA-13855 > URL: https://issues.apache.org/jira/browse/KAFKA-13855 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 2.6.1 >Reporter: Sergey Ivanov >Priority: Major > > Hello, > We faced an issue when one of Kafka broker in cluster has failed with an > exception and restarted: > > {code:java} > [2022-04-13T09:51:44,563][ERROR][category=kafka.server.LogDirFailureChannel] > Error while rolling log segment for prod_data_topic-7 in dir > /var/opt/kafka/data/1 > java.io.FileNotFoundException: > /var/opt/kafka/data/1/prod_data_topic-7/26872377.index (No such > file or directory) > at java.base/java.io.RandomAccessFile.open0(Native Method) > at java.base/java.io.RandomAccessFile.open(Unknown Source) > at java.base/java.io.RandomAccessFile.(Unknown Source) > at java.base/java.io.RandomAccessFile.(Unknown Source) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176) > at > kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242) > at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508) > at kafka.log.Log.$anonfun$roll$8(Log.scala:1916) > at kafka.log.Log.$anonfun$roll$2(Log.scala:1916) > at kafka.log.Log.roll(Log.scala:2349) > at kafka.log.Log.maybeRoll(Log.scala:1865) > at kafka.log.Log.$anonfun$append$2(Log.scala:1169) > at kafka.log.Log.append(Log.scala:2349) > at kafka.log.Log.appendAsLeader(Log.scala:1019) > at > 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:984) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:972) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:883) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273) > at > scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) > at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) > at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) > at scala.collection.TraversableLike.map(TraversableLike.scala:273) > at scala.collection.TraversableLike.map$(TraversableLike.scala:266) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:871) > at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:571) > at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:605) > at kafka.server.KafkaApis.handle(KafkaApis.scala:132) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70) > at java.base/java.lang.Thread.run(Unknown Source) > [2022-04-13T09:51:44,812][ERROR][category=kafka.log.LogManager] Shutdown > broker because all log dirs in /var/opt/kafka/data/1 have failed {code} > There are no any additional useful information in logs, just one warn before > this error: > {code:java} > [2022-04-13T09:51:44,720][WARN][category=kafka.server.ReplicaManager] > [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions > __consumer_offsets-22,prod_data_topic-5,__consumer_offsets-30, > > prod_data_topic-0 and stopped moving logs for partitions because they are in > the failed log directory /var/opt/kafka/data/1. 
> [2022-04-13T09:51:44,720][WARN][category=kafka.log.LogManager] Stopping > serving logs in dir /var/opt/kafka/data/1{code} > The topic configuration is: > {code:java} > /opt/kafka $ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 > --describe --topic prod_data_topic > Topic: prod_data_topic PartitionCount: 12 ReplicationFactor: 3 > Configs: > min.insync.replicas=2,segment.bytes=1073741824,max.message.bytes=15728640,retention.bytes=4294967296 > Topic: prod_data_topic Partition: 0 Leader: 3 > Replicas: 3,1,2 Isr: 3,2,1 > Topic: prod_data_topic Partition: 1 Leader: 1 > Replicas: 1,2,3 Isr: 3,2,1 > Topic:
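The KAFKA-13855 trace bottoms out in `RandomAccessFile.open`, so the failure reduces to opening an index file whose parent directory entry no longer exists. A hypothetical sketch (not Kafka's `AbstractIndex`; directory and file names are invented) showing that `"rw"` mode cannot create the file when its directory has vanished underneath the roll:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: resizing/trimming an index opens it with RandomAccessFile
// in "rw" mode; if the partition directory was renamed or deleted concurrently,
// the open fails with FileNotFoundException ("No such file or directory").
public class RollRaceSketch {
    static boolean reproduces() throws IOException {
        Path partitionDir = Files.createTempDirectory("partition-dir");
        Files.delete(partitionDir); // simulate the directory vanishing under us

        File index = new File(partitionDir.toFile(), "example.index");
        try (RandomAccessFile raf = new RandomAccessFile(index, "rw")) {
            return false; // open unexpectedly succeeded
        } catch (FileNotFoundException e) {
            return true;  // the failure mode from the stack trace above
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("reproduced: " + reproduces());
    }
}
```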
[jira] [Commented] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion
[ https://issues.apache.org/jira/browse/KAFKA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527800#comment-17527800 ] Haruki Okada commented on KAFKA-13403: -- [~showuon] Hi, could you help reviewing the PR [https://github.com/apache/kafka/pull/11438] ? There seems to be another ticket likely due to the same cause: https://issues.apache.org/jira/browse/KAFKA-13855 > KafkaServer crashes when deleting topics due to the race in log deletion > > > Key: KAFKA-13403 > URL: https://issues.apache.org/jira/browse/KAFKA-13403 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.4.1 >Reporter: Haruki Okada >Assignee: Haruki Okada >Priority: Major > > h2. Environment > * OS: CentOS Linux release 7.6 > * Kafka version: 2.4.1 > * > ** But as far as I checked the code, I think same phenomenon could happen > even on trunk > * Kafka log directory: RAID1+0 (i.e. not using JBOD so only single log.dirs > is set) > * Java version: AdoptOpenJDK 1.8.0_282 > h2. Phenomenon > When we were in the middle of deleting several topics by `kafka-topics.sh > --delete --topic blah-blah`, one broker in our cluster crashed due to > following exception: > > {code:java} > [2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in > /data/kafka have failed (kafka.log.LogManager) > {code} > > > We also found NoSuchFileException was thrown right before the crash when > LogManager tried to delete logs for some partitions. 
> > {code:java} > [2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 > in dir /data/kafka (kafka.server.LogDirFailureChannel) > java.nio.file.NoSuchFileException: > /data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at > sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) > at > sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) > at java.nio.file.Files.readAttributes(Files.java:1737) > at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) > at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) > at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372) > at java.nio.file.Files.walkFileTree(Files.java:2706) > at java.nio.file.Files.walkFileTree(Files.java:2742) > at org.apache.kafka.common.utils.Utils.delete(Utils.java:732) > at kafka.log.Log.$anonfun$delete$2(Log.scala:2036) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at kafka.log.Log.maybeHandleIOException(Log.scala:2343) > at kafka.log.Log.delete(Log.scala:2030) > at kafka.log.LogManager.deleteLogs(LogManager.scala:826) > at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840) > at > kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > So, the log-dir was marked as offline and ended up with KafkaServer crash > because the broker has only single log-dir. > h2. Cause > We also found below logs right before the NoSuchFileException. > > {code:java} > [2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is renamed to > /data/kafka/foo-bar-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete and is > scheduled for deletion (kafka.log.LogManager) > [2021-10-21 18:18:17,900] INFO [Log partition=foo-bar-5, dir=/data/kafka] > Found deletable segments with base offsets [3877066] due to retention time > 17280ms breach (kafka.log.Log)[2021-10-21 18:18:17,901] INFO [Log > partition=foo-bar-5, dir=/data/ka
[jira] [Commented] (KAFKA-13855) FileNotFoundException: Error while rolling log segment for topic partition in dir
[ https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527797#comment-17527797 ]

Haruki Okada commented on KAFKA-13855:
--------------------------------------

I guess that's the same cause as https://issues.apache.org/jira/browse/KAFKA-13403
[jira] [Comment Edited] (KAFKA-10690) Produce-response delay caused by lagging replica fetch which affects in-sync one
[ https://issues.apache.org/jira/browse/KAFKA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503911#comment-17503911 ]

Haruki Okada edited comment on KAFKA-10690 at 3/10/22, 12:14 AM:
-----------------------------------------------------------------

Thanks for the comment. [~showuon]

> Are you sure this issue is due to the `in-sync` replica fetch?

Yeah. An `out-of-sync` replica fetch doesn't block produce requests, so the issue affects only `in-sync` replicas, and only when `in-sync` and `out-of-sync` replica fetches are done in the same replica fetcher thread on the follower side.

> Could you have a PoC to add an additional thread pool for lagging replica to confirm this solution?

Haven't tried, as we wanted to confirm first whether anyone has encountered a similar issue (and whether anyone has addressed it in some way). But let us consider!

[~junrao]

> Have you tried enabling replication throttling?

Yeah, we use replication throttling, and we believe the disk's performance itself is stable even during lagging-replica fetches. We use HDDs, so reading data takes a few to tens of milliseconds per IO even when stable. So if a lagging replica fetch (likely not in page cache, so it causes disk reads) and an in-sync replica fetch are done in the same replica fetcher thread, the in-sync one is greatly affected by the lagging one.

was (Author: ocadaruma):
Thanks for the comment. [~showuon]
> Are you sure this issue is due to the `in-sync` replica fetch?
Yeah, as long as replica fetch is `out-of-sync`, it doesn't block produce-request so the issue happens only on `in-sync` replica when `in-sync` replica fetching and `out-of-sync` replica fetching are done in same replica fetcher thread on follower side.
> Could you have a PoC to add an additional thread pool for lagging replica to confirm this solution?
Haven't tried, as we wanted to confirm if anyone encounter similar issue or not (and if anyone addressed it in some way) first. But let us consider!
[~junrao]
> Have you tried enabling replication throttling?
Yeah, we use replication throttling, and we suppose disk's performance itself is stable even on lagging-replica fetch. We use HDD, so reading the data takes few~tens of milliseconds per IO even it's stable. So if lagging replica fetch (likely not in page cache so causes disk reads) and in-sync replica fetch are done in same replica fetcher thread (i.e. in same Fetch request), in-sync one greatly affected by due to lagging one.
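The blocking behaviour discussed in this ticket — one fetcher thread servicing both a disk-bound lagging fetch and a cheap in-sync fetch sequentially — can be modelled with a single-threaded executor. This is a hypothetical sketch with simulated I/O times (sleeps), not Kafka's `ReplicaFetcherThread`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model: one "fetcher" thread issues fetches one at a time, so the
// in-sync partition's cheap fetch cannot start before the lagging partition's
// disk-bound fetch finishes.
public class FetcherBlockingSketch {
    static long inSyncFetchLatencyMs() throws Exception {
        ExecutorService fetcher = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            // Lagging replica fetch: old segments are not in page cache, so it
            // pays real disk reads (simulated here as 200 ms).
            fetcher.submit(() -> sleep(200));
            // In-sync replica fetch: served from page cache (simulated: 5 ms),
            // but queued behind the lagging fetch on the same thread.
            Future<?> inSync = fetcher.submit(() -> sleep(5));
            inSync.get();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            fetcher.shutdown();
        }
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Latency is dominated by the lagging fetch (>= ~200 ms), not the 5 ms
        // the in-sync fetch itself needs.
        System.out.println("in-sync fetch observed latency ~" + inSyncFetchLatencyMs() + " ms");
    }
}
```

This is why the proposed fix (a separate thread or pool for lagging replicas) helps: it removes the head-of-line blocking rather than making either fetch faster.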
[jira] [Commented] (KAFKA-10690) Produce-response delay caused by lagging replica fetch which affects in-sync one
[ https://issues.apache.org/jira/browse/KAFKA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503911#comment-17503911 ] Haruki Okada commented on KAFKA-10690: -- Thanks for the comment. [~showuon] > Are you sure this issue is due to the `in-sync` replica fetch? Yeah, as long as replica fetch is `out-of-sync`, it doesn't block produce-request so the issue happens only on `in-sync` replica when `in-sync` replica fetching and `out-of-sync` replica fetching are done in same replica fetcher thread on follower side. > Could you have a PoC to add an additional thread pool for lagging replica to > confirm this solution? Haven't tried, as we wanted to confirm if anyone encounter similar issue or not (and if anyone addressed it in some way) first. But let us consider! [~junrao] > Have you tried enabling replication throttling? Yeah, we use replication throttling, and we suppose disk's performance itself is stable even on lagging-replica fetch. We use HDD, so reading the data takes few~tens of milliseconds per IO even it's stable. So if lagging replica fetch (likely not in page cache so causes disk reads) and in-sync replica fetch are done in same replica fetcher thread (i.e. in same Fetch request), in-sync one greatly affected by due to lagging one. > Produce-response delay caused by lagging replica fetch which affects in-sync > one > > > Key: KAFKA-10690 > URL: https://issues.apache.org/jira/browse/KAFKA-10690 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.4.1 >Reporter: Haruki Okada >Priority: Major > Attachments: image-2020-11-06-11-15-21-781.png, > image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png > > > h2. Our environment > * Kafka version: 2.4.1 > h2. 
Phenomenon > * Produce response time 99th (remote scope) degrades to 500ms, which is 20 > times worse than usual > ** Meanwhile, the cluster was running replica reassignment to service-in new > machine to recover replicas which held by failed (Hardware issue) broker > machine > !image-2020-11-06-11-15-21-781.png|width=292,height=166! > h2. Analysis > Let's say > * broker-X: The broker we observed produce latency degradation > * broker-Y: The broker under servicing-in > broker-Y was catching up replicas of partitions: > * partition-A: has relatively small log size > * partition-B: has large log size > (actually, broker-Y was catching-up many other partitions. I noted only two > partitions here to make explanation simple) > broker-X was the leader for both partition-A and partition-B. > We found that both partition-A and partition-B are assigned to same > ReplicaFetcherThread of broker-Y, and produce latency started to degrade > right after broker-Y finished catching up partition-A. > !image-2020-11-06-11-17-09-910.png|width=476,height=174! > Besides, we observed disk reads on broker-X during service-in. (This is > natural since old segments are likely not in page cache) > !image-2020-11-06-11-15-38-390.png|width=292,height=193! > So we suspected that: > * In-sync replica fetch (partition-A) was involved by lagging replica fetch > (partition-B), which should be slow because it causes actual disk reads > ** Since ReplicaFetcherThread sends fetch requests in blocking manner, next > fetch request can't be sent until one fetch request completes > ** => Causes in-sync replica fetch for partitions assigned to same replica > fetcher thread to delay > ** => Causes remote scope produce latency degradation > h2. Possible fix > We think this issue can be addressed by designating part of > ReplicaFetcherThread (or creating another thread pool) for lagging replica > catching-up, but not so sure this is the appropriate way. > Please give your opinions about this issue. 
-- This message was sent by Atlassian Jira (v8.20.1#820001)