[jira] [Commented] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696794#comment-17696794 ]

fujian commented on KAFKA-14768:

Hi [~showuon]:

I have summarized the two issues and their solutions, and created two KIPs related to this Jira for your review. Thanks.

[KIP-912: Support decreasing send's max block time without worrying about metadata's fetch - Apache Kafka - Apache Software Foundation|https://cwiki.apache.org/confluence/display/KAFKA/KIP-912%3A+Support+decreasing+send%27s+max+block+time+without+worrying+about+metadata%27s+fetch]

[KIP-913: add new method to provide possibility for accelerate first record's sending - Apache Kafka - Apache Software Foundation|https://cwiki.apache.org/confluence/display/KAFKA/KIP-913%3A+add+new+method+to+provide+possibility+for+accelerate+first+record%27s+sending]

Regards
Jian

> proposal to reduce the first message's send time cost and max block time for safety
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1, 3.3.2
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
>              Labels: needs-kip, performance
>
> Hi, Team:
>
> Nice to meet you!
>
> In our business, we found two types of issues that need improvement:
>
> *(1) Sending the first message takes a long time*
> Sometimes users' functional interactions take a lot of time. We eventually found the root cause: after we deploy or restart the servers, the first message delivered by the Kafka client on each application server takes a long time.
> So we tried to find a solution to improve it.
>
> After analyzing the source code for the first send, we found that the time is spent getting metadata before sending. Later sends don't take as long because the metadata is cached. The logic is correct and necessary. Still, we want to improve the experience of the first message's send, i.e. the user's first interaction.
>
> *(2) The send's block time can't be reduced to the desired value*
> Sometimes our application's thread blocks for up to max.block.ms when sending a message. When we reduce max.block.ms to shorten the blocking time, it sometimes no longer leaves enough time to get metadata. The root cause is that the configured max.block.ms is shared between the "get metadata" operation and the "send message" operation. See the following table:
>
> |*where it blocks*|*when it blocks*|*how long it blocks*|
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load metadata from Kafka| |
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|at peak business time, if the network can't send messages quickly enough| |
>
> What's the solution for the above two issues?
> I thought about the current logic and came up with the following possible solutions:
> (1) Send one "warmup" message. However, we can't send any fake message.
> (2) Provide one extra time configuration dedicated to getting metadata. This may break the definition of max.block.ms a little. What's more, it only solves issue 2, not issue 1.
> (3) Add a method that calls waitOnMetadata with its own timeout, without using max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])
>
> _note: org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata_
> ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)
>
> After the change, we can call it before the service is marked as ready. Once the service is ready, sends won't block to get metadata, because it is cached. Then we can safely reduce max.block.ms to a lower value to shorten the thread's blocking time.
>
> After adopting solution 3, we solved both issues. For example, we reduced the first message's send time by about 4 seconds. See the following log:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
> And after the change, we reduced max.block.ms from 10s to 2s without worrying about failing to get metadata.
>
> {*}So what are your thoughts on these two issues and the solution I proposed{*}? I hope to get your feedback on them.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
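To make the shared budget from issue (2) concrete, here is a small arithmetic sketch in Java. It is an illustrative model, not Kafka source code: the class and method names are hypothetical, and it assumes only what the description states, namely that the metadata wait and the append draw on the same max.block.ms. The 4669 ms figure is the cold metadata fetch from the warmup log quoted above.

```java
// Illustrative model only; class and method names are hypothetical, not Kafka code.
final class SharedBlockBudget {

    /** Time left for the append step after the metadata wait, never negative. */
    static long remainingForAppendMs(long maxBlockMs, long waitedOnMetadataMs) {
        return Math.max(0L, maxBlockMs - waitedOnMetadataMs);
    }

    /** True if a metadata fetch of the given duration fits into the budget. */
    static boolean metadataFits(long maxBlockMs, long metadataFetchMs) {
        return metadataFetchMs <= maxBlockMs;
    }

    public static void main(String[] args) {
        long coldFetchMs = 4669L; // taken from the warmup log in the issue description
        System.out.println(remainingForAppendMs(10_000L, coldFetchMs)); // 5331
        System.out.println(metadataFits(2_000L, coldFetchMs));          // false
    }
}
```

With a 10 s budget, the cold fetch leaves 5331 ms for the append. With a 2 s budget, the same fetch would not fit at all, which is why the description warms up the metadata before lowering max.block.ms.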
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
    Description:
Hi, Team:

Nice to meet you!

In our business, we found two types of issues that need improvement:

*(1) Sending the first message takes a long time*
Sometimes users' functional interactions take a lot of time. We eventually found the root cause: after we deploy or restart the servers, the first message delivered by the Kafka client on each application server takes a long time.
So we tried to find a solution to improve it.

After analyzing the source code for the first send, we found that the time is spent getting metadata before sending. Later sends don't take as long because the metadata is cached. The logic is correct and necessary. Still, we want to improve the experience of the first message's send, i.e. the user's first interaction.

*(2) The send's block time can't be reduced to the desired value*
Sometimes our application's thread blocks for up to max.block.ms when sending a message. When we reduce max.block.ms to shorten the blocking time, it sometimes no longer leaves enough time to get metadata. The root cause is that the configured max.block.ms is shared between the "get metadata" operation and the "send message" operation. See the following table:

|*where it blocks*|*when it blocks*|*how long it blocks*|
|org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load metadata from Kafka|https://github.com/apache/kafka/pull/13320])

_note: org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata_
ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)

After the change, we can call it before the service is marked as ready. Once the service is ready, sends won't block to get metadata, because it is cached. Then we can safely reduce max.block.ms to a lower value to shorten the thread's blocking time.

After adopting solution 3, we solved both issues. For example, we reduced the first message's send time by about 4 seconds. See the following log:
_warmup test_topic at phase phase 2: get metadata from mq start_
_warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
And after the change, we reduced max.block.ms from 10s to 2s without worrying about failing to get metadata.

{*}So what are your thoughts on these two issues and the solution I proposed{*}? I hope to get your feedback on them.
[jira] [Comment Edited] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696089#comment-17696089 ]

fujian edited comment on KAFKA-14768 at 3/3/23 9:13 AM:

Hi [~showuon]:

I created another PR: [KAFKA-14768: add new configure to reduce the max.block.ms safely by jiafu1115 · Pull Request #13335 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13335/files] for your reference.

The PR implements solution 2, which I mentioned. Although it can't solve issue 1, it can solve issue 2.

WDYT? Which one is better? I think combining the two is the better option, since together they solve all of the issues, but I think it is also OK to do them one by one. (Maybe I should create two KIPs, one per issue, to make it clear.)

Thanks

was (Author: fujian1115):
Hi [~showuon]:

I created another PR: [KAFKA-14768: add new configure to reduce the max.block.ms safely by jiafu1115 · Pull Request #13335 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13335/files] for your reference.

The PR implements solution 2, which I mentioned. Although it can't solve issue 1, it can solve issue 2.

WDYT? Which one is better? I think combining the two is the better option, since together they solve all of the issues, but I think it is also OK to do them one by one.

Thanks
[jira] [Comment Edited] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695956#comment-17695956 ]

fujian edited comment on KAFKA-14768 at 3/3/23 8:56 AM:

Hi [~showuon]

Thanks for your feedback.

For 1, send one "warm up" message: the solution is to send one fake business record/message before the real business records. It will trigger the metadata fetch. I think it can solve the issue, but it is a bad solution because it involves a fake message. So I didn't adopt it.

For 2, provide one extra time configuration dedicated to getting metadata: it would break the definition of max.block.ms a little, and it doesn't solve issue 1. So I didn't adopt it. But I can write a PR showing my idea for your reference. Maybe it is part of the whole solution.

For 3, my proposal is this one: add a method that *makes it possible* to call waitOnMetadata with its own timeout, without using max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])

I will check the KIP in detail and give you feedback ASAP.

Thanks.

was (Author: fujian1115):
Hi [~showuon]

Thanks for your feedback.

For 1, send one "warm up" message: the solution is to send one fake business record/message before the real business records. It will trigger the metadata fetch. I think it can solve the issue, but it is a bad solution because it involves a fake message. So I didn't adopt it.

For 2, provide one extra time configuration dedicated to getting metadata: it would break the definition of max.block.ms a little. So I didn't adopt it.

For 3, my proposal is this one: add a method that *makes it possible* to call waitOnMetadata with its own timeout, without using max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])

I will check the KIP in detail and give you feedback ASAP.

Thanks.
[jira] [Commented] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696089#comment-17696089 ]

fujian commented on KAFKA-14768:

Hi [~showuon]:

I created another PR: [KAFKA-14768: add new configure to reduce the max.block.ms safely by jiafu1115 · Pull Request #13335 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13335/files] for your reference.

The PR implements solution 2, which I mentioned. Although it can't solve issue 1, it can solve issue 2.

WDYT? Which one is better? I think combining the two is the better option, since together they solve all of the issues, but I think it is also OK to do them one by one.

Thanks
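One possible reading of why solution 2 addresses issue 2 but not issue 1 can be sketched as follows. This is a hedged model, not the code from PR #13335: the field name `metadataMaxBlockMs` is invented here for illustration, and the model simply assumes that the metadata wait gets its own limit while max.block.ms keeps bounding the append step.

```java
// Illustrative model only; field names are hypothetical, not real Kafka configs.
final class SplitTimeouts {
    private final long metadataMaxBlockMs; // hypothetical dedicated metadata timeout
    private final long maxBlockMs;         // assumed to bound only the append step

    SplitTimeouts(long metadataMaxBlockMs, long maxBlockMs) {
        this.metadataMaxBlockMs = metadataMaxBlockMs;
        this.maxBlockMs = maxBlockMs;
    }

    /** Upper bound on how long one send() could block under this model. */
    long worstCaseSendBlockMs(boolean metadataCached) {
        long metadataPart = metadataCached ? 0L : metadataMaxBlockMs;
        return metadataPart + maxBlockMs;
    }

    public static void main(String[] args) {
        SplitTimeouts timeouts = new SplitTimeouts(10_000L, 2_000L);
        System.out.println(timeouts.worstCaseSendBlockMs(false)); // 12000
        System.out.println(timeouts.worstCaseSendBlockMs(true));  // 2000
    }
}
```

Under these assumptions, sends with cached metadata are bounded by the lowered max.block.ms (issue 2), while the very first send can still block for the full metadata timeout (issue 1). The first send can also block for longer than max.block.ms in total, which may be what the earlier comment means by breaking its definition a little.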
[jira] [Comment Edited] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695962#comment-17695962 ]

fujian edited comment on KAFKA-14768 at 3/3/23 2:47 AM:

Hi [~showuon]

I checked the KIP and the related JIRA. I think the KIP tries to solve the issue by avoiding blocking on the metadata fetch. In my opinion, that isn't the right direction:

(1) Sending the first record really needs the metadata.
(2) Even if we solve the metadata fetching issue, send still isn't a pure async API. It still blocks on other operations, such as append. So, based on the current state, I think we have no way to solve it. What's more, an API that is not fully async provides the benefit of backpressure. So it is not worth adopting a complex proposal to solve it.

I will create a new KIP for these related issues.

Thanks

Regards
Jian

was (Author: fujian1115):
Hi [~showuon]

I checked the KIP and the related JIRA. I think the KIP tries to solve the issue by avoiding blocking on the metadata fetch. In my opinion, that isn't the right direction:

(1) Sending the first record really needs the metadata.
(2) Even if we solve the metadata fetching issue, send still isn't a synced API. It still blocks on other operations, such as append. So, based on the current state, I think we have no way to solve it. What's more, an API that is not fully async provides the benefit of backpressure. So it is not worth adopting a complex proposal to solve it.

I will create a new KIP for these related issues.

Thanks

Regards
Jian
[jira] [Commented] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695962#comment-17695962 ]

fujian commented on KAFKA-14768:

Hi [~showuon],

I checked the KIP and the related JIRA. I think the KIP tries to solve the issue by avoiding blocking on the metadata fetch. In my opinion, that is not the right direction:

(1) Sending the first record really does need the metadata.
(2) Even if we solve the metadata fetching issue, send still isn't a fully asynchronous API. It still blocks on other operations, such as append.

So, based on the current status, I think we have no way to solve it completely. What's more, the not fully asynchronous API provides the benefit of back pressure, so it is not worth adopting a complex proposal to solve it.

I will create a new KIP for these related issues.

Thanks.

Regards,
Jian

> proposal to reduce the first message's send time cost and max block time for safety
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1, 3.3.2
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
>              Labels: needs-kip, performance
>
> Hi, Team:
>
> Nice to meet you!
>
> In our business, we found two types of issues that need improvement:
>
> *(1) Sending the first message takes a long time*
> Sometimes we found that a user's functional interaction took a lot of time. We eventually figured out the root cause: after we finish a deployment or restart the servers, the first message delivered by the Kafka client on each application server takes a long time.
> So we tried to find a solution to improve it.
>
> After analyzing the source code of the first send, we found that the time cost is caused by fetching metadata before sending. Later sends don't take as long because the metadata is cached. The logic is right and necessary. Still, we want to improve the experience of the first message's send, which is the user's first interaction.
>
> *(2) The send's block time can't be reduced to the desired value*
> Sometimes our application's thread blocks for max.block.ms when sending a message. When we try to lower max.block.ms to reduce the blocking time, it is sometimes too short for fetching the metadata. The root cause is that the configured max.block.ms is shared between the "get metadata" operation and the "send message" operation. See the following table:
>
> |*where to block*|*when it is blocked*|*how long it will be blocked?*|
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load the metadata from Kafka| |
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|at business peak time, if the network can't send messages in a short time| |
>
> What are the solutions to the above two issues? I thought about the current logic and came up with the following possible solutions:
> (1) Send one "warmup" message. However, we can't send any fake message.
> (2) Provide one extra time configuration dedicated to getting metadata. However, this would break the definition of max.block.ms.
> (3) Add one method that calls waitOnMetadata with its own timeout setting, without using max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])
>
> _note: org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata_
> ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)
>
> After the change, we can call it before the service is marked as ready. Once the service is ready, sends won't block on fetching metadata because it is cached. We can then safely lower max.block.ms to reduce the thread's blocking time.
>
> After adopting solution (3), we solved the above issues. For example, we reduced the first message's send time by about 4 seconds. See the following log:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
> After the change, we also reduced max.block.ms from 10s to 2s without worrying that the metadata can't be fetched.
>
> {*}So what are your thoughts on these two issues and the solution I proposed{*}? I hope to get your feedback and thoughts on the issues.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
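To make issue (2) in the description concrete, here is a minimal sketch of one blocking budget shared by the metadata wait and the buffer append. The class and method names are hypothetical and are not Kafka APIs. The subtraction is an assumption about how the remaining wait is derived inside a single send call. The 4669 ms figure is the warm-up time from the log quoted in the description.

```java
/** Toy model (not Kafka code) of one max.block.ms budget shared inside a single send call. */
final class SharedBlockBudget {

    /** Time left for the buffer append after the metadata wait has used part of the budget. */
    static long remainingForAppendMs(long maxBlockMs, long waitedOnMetadataMs) {
        return Math.max(0, maxBlockMs - waitedOnMetadataMs);
    }

    /** True if a metadata fetch that takes metadataFetchMs can finish within maxBlockMs. */
    static boolean metadataFits(long maxBlockMs, long metadataFetchMs) {
        return metadataFetchMs <= maxBlockMs;
    }

    public static void main(String[] args) {
        long coldFetchMs = 4_669; // metadata fetch time taken from the warm-up log
        long warmFetchMs = 0;     // assumption: cached metadata costs no waiting

        for (long maxBlockMs : new long[] {10_000, 2_000}) {
            System.out.printf("max.block.ms=%d cold: fits=%b, left for append=%d ms%n",
                    maxBlockMs,
                    metadataFits(maxBlockMs, coldFetchMs),
                    remainingForAppendMs(maxBlockMs, coldFetchMs));
            System.out.printf("max.block.ms=%d warm: fits=%b, left for append=%d ms%n",
                    maxBlockMs,
                    metadataFits(maxBlockMs, warmFetchMs),
                    remainingForAppendMs(maxBlockMs, warmFetchMs));
        }
    }
}
```

In this model a 2s budget cannot cover a cold metadata fetch of 4669 ms, while a warm cache leaves the whole budget for the append, which is the reason given above for warming up before lowering max.block.ms.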
[jira] [Commented] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695956#comment-17695956 ]

fujian commented on KAFKA-14768:

Hi [~showuon],

Thanks for your feedback.

For 1, send one "warm up" message: the solution is to send one fake business record/message before the real business records, which triggers the metadata fetch. I think it can solve the issue, but it is a bad solution because it involves a fake message. So I didn't adopt it.

For 2, provide one extra time configuration dedicated to getting metadata: this would break the definition of max.block.ms a little. So I didn't adopt it.

For 3, this is my proposal: add one method that *provides the possibility* to call waitOnMetadata with its own timeout setting, without using max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])

I will check the KIP in detail and give you feedback ASAP.

Thanks.
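The third option can be sketched as a warm-up step that runs before the service is marked as ready. This is a toy model under stated assumptions: CachingClient, MetadataFetcher, slowFetcher and MetadataTimeoutException are hypothetical names, not Kafka classes, and the fetch cost is simulated rather than measured. It only illustrates why a separate warm-up timeout lets the send-path timeout be lowered.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (not Kafka code) of warming up metadata with its own timeout before serving traffic. */
final class WarmupSketch {

    /** Hypothetical unchecked exception for a metadata fetch that exceeds its timeout. */
    static final class MetadataTimeoutException extends RuntimeException {
        MetadataTimeoutException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a metadata lookup bounded by a timeout. */
    interface MetadataFetcher {
        String fetch(String topic, long timeoutMs);
    }

    /** Client that caches metadata per topic and bounds the send path by sendBlockMs. */
    static final class CachingClient {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final MetadataFetcher fetcher;
        private final long sendBlockMs;

        CachingClient(MetadataFetcher fetcher, long sendBlockMs) {
            this.fetcher = fetcher;
            this.sendBlockMs = sendBlockMs;
        }

        /** Fetches metadata using a dedicated timeout that is independent of sendBlockMs. */
        void warmup(String topic, long warmupTimeoutMs) {
            cache.put(topic, fetcher.fetch(topic, warmupTimeoutMs));
        }

        /** Uses cached metadata if present; otherwise fetches within sendBlockMs. */
        String send(String topic, String value) {
            String metadata = cache.get(topic);
            if (metadata == null) {
                metadata = fetcher.fetch(topic, sendBlockMs);
                cache.put(topic, metadata);
            }
            return metadata + ":" + value;
        }
    }

    /** Simulated fetcher: succeeds only when the timeout is at least costMs. */
    static MetadataFetcher slowFetcher(long costMs) {
        return (topic, timeoutMs) -> {
            if (timeoutMs < costMs) {
                throw new MetadataTimeoutException(
                        "metadata for " + topic + " not available within " + timeoutMs + " ms");
            }
            return "meta(" + topic + ")";
        };
    }

    public static void main(String[] args) {
        CachingClient client = new CachingClient(slowFetcher(4_669), 2_000);
        client.warmup("test_topic", 10_000); // run before the service is marked as ready
        System.out.println(client.send("test_topic", "hello"));
    }
}
```

Without the warmup call, the first send in this model fails because the simulated 4669 ms fetch does not fit in the 2000 ms send budget. With it, the send path never waits on metadata.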
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
    Labels: performance  (was: )

> proposal to reduce the first message's send time cost and max block time for safety
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1, 3.3.2
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
>              Labels: performance
>
> Hi, Team:
>
> Nice to meet you!
>
> In our business, we found two types of issues that need improvement:
>
> *(1) Sending the first message takes a long time*
> Sometimes we found that a user's functional interaction took a lot of time. We eventually figured out the root cause: after we finish a deployment or restart the servers, the first message delivered by the Kafka client on each application server takes a long time.
> So we tried to find a solution to improve it.
>
> After analyzing the source code of the first send, we found that the time cost is caused by fetching metadata before sending. Later sends don't take as long because the metadata is cached. The logic is right and necessary. Still, we want to improve the experience of the first message's send, which is the user's first interaction.
>
> *(2) The send's block time can't be reduced to the desired value*
> Sometimes our application's thread blocks for max.block.ms when sending a message. When we try to lower max.block.ms to reduce the blocking time, it is sometimes too short for fetching the metadata. The root cause is that the configured max.block.ms is shared between the "get metadata" operation and the "send message" operation. See the following table:
>
> |*where to block*|*when it is blocked*|*how long it will be blocked?*|
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load the metadata from Kafka| |
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|at business peak time, if the network can't send messages in a short time| |
>
> What are the solutions to the above two issues? I thought about the current logic and came up with the following possible solutions:
> (1) Send one "warmup" message. However, we can't send any fake message.
> (2) Provide one extra time configuration dedicated to getting metadata. However, this would break the definition of max.block.ms.
> (3) Change the method below from private to public, or provide a dedicated method for this purpose.
> _private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)_
> After the change, we can call it before the service is marked as ready. Once the service is ready, sends won't block on fetching metadata because it is cached. We can then safely lower max.block.ms to reduce the thread's blocking time.
>
> After adopting solution (3), we solved the above issues. For example, we reduced the first message's send time by about 4 seconds. See the following log:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
> After the change, we also reduced max.block.ms from 10s to 2s without worrying that the metadata can't be fetched.
>
> {*}So what are your thoughts on these two issues and the solution I proposed{*}? If there is no problem with it, I can create a PR to merge. I hope to get your feedback and thoughts on the issues.
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
    Affects Version/s: 3.3.2

> proposal to reduce the first message's send time cost and max block time for safety
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1, 3.3.2
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
    Priority: Major  (was: Minor)

> proposal to reduce the first message's send time cost and max block time for safety
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
Description:
Hi, Team:

Nice to meet you!

In our business, we found two types of issues that need improvement:

*(1) Sending the first message takes a long time*
Sometimes we found that users' functional interactions took a long time. Eventually we figured out the root cause: after we finish a deployment or restart the servers, the first message delivered by the Kafka client on each application server takes a long time. So we tried to find a solution to improve it.

After analyzing the source code for the first send, we found that the time cost comes from fetching metadata before sending. Later sends don't take as long because the metadata is cached. The logic is correct and necessary, but we still want to improve the experience for the first message's send, that is, the user's first interaction.

*(2) Can't reduce the send's block time to the desired value*
Sometimes our application's threads block for up to max.block.ms when sending a message. When we lower max.block.ms to reduce the blocking time, the lower value is sometimes too short for fetching metadata. The root cause is that the configured max.block.ms is shared by the "get metadata" operation and the "send message" operation. Refer to the following table:
|*where it blocks*|*when it blocks*|*how long it blocks*|
|org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load metadata from Kafka|

https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage], Kafka introduced the tiered storage feature. Also, [KAFKA-9555] Topic-based implementation for the RemoteLogMetadataManager - ASF JIRA (apache.org) implements the default RLMM, 'TopicBased-RLMM'.

{*}Problem{*}:
TopicBased-RLMM only subscribes to the partitions for which the current broker is leader or follower. For any other partition, RLMM skips the related metadata records.
When user-partitions are reassigned, RLMM subscribes to the new user-partitions. Suppose the metadata-partition that a new user-partition maps to is 'metadata-partition0', and RLMM had already consumed 'metadata-partition0' *up to offset = 100* before the reassignment. Then {*}after the reassignment{*}, RLMM will *not* consume 'metadata-partition0' {*}from the beginning{*}, so the metadata records for the new user-partition *with offset < 100 are lost.*

*Solution*
Let RLMM subscribe to all user-partitions, instead of only the partitions for which the current broker is leader or follower. That way, when a partition is reassigned, RLMM already has the new partition's metadata records.

> proposal to reduce the first message's send time cost and max block time for safety
>
> Key: KAFKA-14768
> URL: https://issues.apache.org/jira/browse/KAFKA-14768
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 3.3.1
> Reporter: fujian
> Assignee: hzh0425
> Priority: Blocker
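To make point (2) of the description concrete, here is a small sketch of the shared budget. It is a simplified model, not Kafka source: it assumes the producer deducts the time already spent waiting on metadata from the same max.block.ms budget before it waits for buffer space. The class and method names are made up for illustration; the 4669 ms figure is the cold metadata fetch time reported in this issue.

```java
// Simplified model, not Kafka source: one max.block.ms budget covers both
// the metadata wait and the buffer-append wait inside a single send().
final class SharedBlockBudget {

    /** Time left for the buffer append after the metadata wait, never negative. */
    static long remainingForAppend(long maxBlockMs, long waitedOnMetadataMs) {
        return Math.max(0L, maxBlockMs - waitedOnMetadataMs);
    }

    /** True if the metadata fetch alone fits inside the budget. */
    static boolean metadataFits(long maxBlockMs, long metadataFetchMs) {
        return metadataFetchMs <= maxBlockMs;
    }

    public static void main(String[] args) {
        long coldFetchMs = 4669L; // cold metadata fetch time reported in this issue
        for (long maxBlockMs : new long[] {10_000L, 2_000L}) {
            System.out.printf("max.block.ms=%d metadataFits=%b remainingForAppend=%d%n",
                    maxBlockMs,
                    metadataFits(maxBlockMs, coldFetchMs),
                    remainingForAppend(maxBlockMs, coldFetchMs));
        }
    }
}
```

Under this model a 10s budget absorbs the cold fetch and still leaves time for the append, while a 2s budget is used up before the metadata arrives. That is why lowering max.block.ms is only safe once the metadata is already cached.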
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
Component/s: clients
(was: core)

> proposal to reduce the first message's send time cost and max block time for safety
>
> Key: KAFKA-14768
> URL: https://issues.apache.org/jira/browse/KAFKA-14768
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Affects Versions: 3.3.1
> Reporter: fujian
> Assignee: hzh0425
> Priority: Minor
>
> Hi, Team:
>
> Nice to meet you!
>
> In our business, we found two types of issues that need improvement:
>
> *(1) Sending the first message takes a long time*
> Sometimes we found that users' functional interactions took a long time. Eventually we figured out the root cause: after we finish a deployment or restart the servers, the first message delivered by the Kafka client on each application server takes a long time. So we tried to find a solution to improve it.
>
> After analyzing the source code for the first send, we found that the time cost comes from fetching metadata before sending. Later sends don't take as long because the metadata is cached. The logic is correct and necessary, but we still want to improve the experience for the first message's send, that is, the user's first interaction.
>
> *(2) Can't reduce the send's block time to the desired value*
> Sometimes our application's threads block for up to max.block.ms when sending a message. When we lower max.block.ms to reduce the blocking time, the lower value is sometimes too short for fetching metadata. The root cause is that the configured max.block.ms is shared by the "get metadata" operation and the "send message" operation. Refer to the following table:
> |*where it blocks*|*when it blocks*|*how long it blocks*|
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first request, which needs to load metadata from Kafka|
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|at business peak time, if the network can't send messages quickly enough|
>
> What's the solution for the above two issues?
> I thought about the current logic and came up with the following possible solutions:
> (1) Send one "warmup" message. But we can't send any fake message.
> (2) Provide one extra time configuration dedicated to getting metadata. But that would break the definition of max.block.ms.
> (3) Change the method from private to public, or provide a dedicated method for this purpose:
> _private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)_
>
> so that we can call it before the service is marked as ready. Once the service is ready, sends won't block on metadata because it is cached. We can then lower max.block.ms to reduce the time threads spend blocked.
>
> After adopting solution 3, we solved the above issues. For example, we reduced the first message's send time by about 4 seconds. See the following log:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
> After the change, we also reduced max.block.ms from 10s to 2s without worrying about failing to get metadata.
>
> {*}So what are your thoughts on these two issues and the solution I proposed{*}? If there is no problem with it, I can create a PR. I hope to get your feedback and thoughts on the issues.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
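The effect of solution (3) can be sketched without the Kafka client. The class below is a hypothetical stand-in for the producer's metadata cache, not Kafka code: the slow fetch is a placeholder function, and `WarmupSketch`, `partitionCount` and `warmup` are illustrative names only. With the real client, a public call that needs metadata, such as `KafkaProducer#partitionsFor(String)`, might play the role of the slow fetch, but that is not verified here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for the producer's metadata cache; not Kafka code.
final class WarmupSketch {
    private final Map<String, Integer> partitionCountByTopic = new ConcurrentHashMap<>();
    private final Function<String, Integer> slowFetch;
    final AtomicInteger fetchCalls = new AtomicInteger();

    WarmupSketch(Function<String, Integer> slowFetch) {
        this.slowFetch = slowFetch;
    }

    /** Returns cached metadata, invoking the slow fetch only on a cache miss. */
    int partitionCount(String topic) {
        return partitionCountByTopic.computeIfAbsent(topic, t -> {
            fetchCalls.incrementAndGet();
            return slowFetch.apply(t);
        });
    }

    /** Call before the service is marked ready so user traffic never pays for the fetch. */
    void warmup(String... topics) {
        for (String topic : topics) {
            partitionCount(topic);
        }
    }

    public static void main(String[] args) {
        // Placeholder fetch: pretend every topic has 3 partitions.
        WarmupSketch cache = new WarmupSketch(topic -> 3);
        cache.warmup("test_topic");          // pays the one slow fetch up front
        cache.partitionCount("test_topic");  // served from the cache
        System.out.println("fetch calls: " + cache.fetchCalls.get()); // prints "fetch calls: 1"
    }
}
```

The point of the design is only the ordering: the one slow lookup happens during startup, so the first real send sees a warm cache and a short max.block.ms no longer risks timing out on metadata.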
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
Priority: Minor
(was: Blocker)

> proposal to reduce the first message's send time cost and max block time for safety
>
> Key: KAFKA-14768
> URL: https://issues.apache.org/jira/browse/KAFKA-14768
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 3.3.1
> Reporter: fujian
> Assignee: hzh0425
> Priority: Minor
[jira] [Updated] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
[ https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-14768:
---
Reviewer:
(was: Satish Duggana)

> proposal to reduce the first message's send time cost and max block time for safety
>
> Key: KAFKA-14768
> URL: https://issues.apache.org/jira/browse/KAFKA-14768
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Affects Versions: 3.3.1
> Reporter: fujian
> Assignee: hzh0425
> Priority: Minor
[jira] [Created] (KAFKA-14768) proposal to reduce the first message's send time cost and max block time for safety
fujian created KAFKA-14768:
--
Summary: proposal to reduce the first message's send time cost and max block time for safety
Key: KAFKA-14768
URL: https://issues.apache.org/jira/browse/KAFKA-14768
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 3.3.1
Reporter: fujian
Assignee: hzh0425

{*}Background{*}:
In [KIP-405: Kafka Tiered Storage - Apache Kafka - Apache Software Foundation|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage], Kafka introduced the tiered storage feature. Also, [KAFKA-9555] Topic-based implementation for the RemoteLogMetadataManager - ASF JIRA (apache.org) implements the default RLMM, 'TopicBased-RLMM'.

{*}Problem{*}:
TopicBased-RLMM only subscribes to the partitions for which the current broker is leader or follower. For any other partition, RLMM skips the related metadata records.
When user-partitions are reassigned, RLMM subscribes to the new user-partitions. Suppose the metadata-partition that a new user-partition maps to is 'metadata-partition0', and RLMM had already consumed 'metadata-partition0' *up to offset = 100* before the reassignment. Then {*}after the reassignment{*}, RLMM will *not* consume 'metadata-partition0' {*}from the beginning{*}, so the metadata records for the new user-partition *with offset < 100 are lost.*

*Solution*
Let RLMM subscribe to all user-partitions, instead of only the partitions for which the current broker is leader or follower. That way, when a partition is reassigned, RLMM already has the new partition's metadata records.
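The record loss described in this message can be reproduced with a toy model. This is not the TopicBased-RLMM implementation: `RlmmSkipSketch`, `MetaRecord` and `sampleLog` are hypothetical names, and the log is an invented sequence in which even offsets belong to user-partition "p0" and odd offsets to "p1". The consumer only moves forward and keeps a record only if its user-partition is subscribed at the time it is read.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the problem, not the TopicBased-RLMM implementation.
final class RlmmSkipSketch {

    /** One record in the metadata partition: its offset and the user-partition it describes. */
    static final class MetaRecord {
        final long offset;
        final String userPartition;

        MetaRecord(long offset, String userPartition) {
            this.offset = offset;
            this.userPartition = userPartition;
        }
    }

    private final Set<String> subscribed = new HashSet<>();
    private final List<MetaRecord> kept = new ArrayList<>();
    private long nextOffset = 0L;

    void subscribe(String userPartition) {
        subscribed.add(userPartition);
    }

    /** Consumes forward from nextOffset, keeping only records for subscribed partitions. */
    void consume(List<MetaRecord> log, long upToExclusive) {
        for (MetaRecord r : log) {
            if (r.offset < nextOffset || r.offset >= upToExclusive) {
                continue; // already passed, or not reached yet
            }
            if (subscribed.contains(r.userPartition)) {
                kept.add(r);
            }
        }
        nextOffset = Math.max(nextOffset, upToExclusive);
    }

    long keptFor(String userPartition) {
        return kept.stream().filter(r -> r.userPartition.equals(userPartition)).count();
    }

    /** Invented sample log: even offsets describe "p0", odd offsets describe "p1". */
    static List<MetaRecord> sampleLog(int size) {
        List<MetaRecord> log = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            log.add(new MetaRecord(i, i % 2 == 0 ? "p0" : "p1"));
        }
        return log;
    }

    public static void main(String[] args) {
        List<MetaRecord> log = sampleLog(200);

        RlmmSkipSketch current = new RlmmSkipSketch();
        current.subscribe("p0");
        current.consume(log, 100);   // p1 records below offset 100 are skipped
        current.subscribe("p1");     // reassignment: p1 now lives on this broker
        current.consume(log, 200);

        RlmmSkipSketch proposed = new RlmmSkipSketch();
        proposed.subscribe("p0");
        proposed.subscribe("p1");    // subscribe to all user-partitions up front
        proposed.consume(log, 200);

        System.out.println("current keeps p1 records: " + current.keptFor("p1"));
        System.out.println("proposed keeps p1 records: " + proposed.keptFor("p1"));
    }
}
```

In this model the late subscriber ends up with only the "p1" records at offset 100 and above, while the subscribe-to-all variant keeps every "p1" record, which matches the problem and solution stated in the message.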