Re: Use one producer for both coordinator stream and users system?
Hi Tao,

First, one Kafka producer has one I/O thread (correct me if I am wrong). Second, starting with Samza 0.10.0 we have a coordinator stream, which stores the checkpoints, config, and other locality information used for auto-scaling, dynamic configuration, and similar purposes (see SAMZA-348: https://issues.apache.org/jira/browse/SAMZA-348). So we have a producer for this coordinator stream. Therefore, each container has at least two producers: one for the coordinator stream and one for the user's system.

My question is: can we use only one producer for both the coordinator stream and the user's system to get better performance? (According to the doc, this may give better performance.)

Thanks,
Fang, Yan
yanfang...@gmail.com

On Mon, Aug 17, 2015 at 9:49 PM, Tao Feng fengta...@gmail.com wrote:
Hi Yan, Naive question: what do we need producer thread of coordinator stream for? Thanks, -Tao
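A minimal, stdlib-only Java sketch of the shared-producer idea, assuming a producer can be modeled as one object that owns a single background I/O thread (as described above). `SharedSenderSketch`, `SharedSender`, and the topic names are hypothetical; this is not the Kafka or Samza API. Two caller threads, standing in for the coordinator-stream writer and the user job, hand records to the same sender, so every send goes through one I/O thread instead of two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SharedSenderSketch {
    // Hypothetical stand-in for a thread-safe producer that owns exactly one I/O thread.
    static final class SharedSender implements AutoCloseable {
        // A single-threaded executor models the producer's one background I/O thread.
        private final ExecutorService ioThread = Executors.newSingleThreadExecutor();
        // Only ever touched from the I/O thread, so no extra locking is needed.
        private final List<String> sent = new ArrayList<>();

        // Callable from any thread; the executor serializes all sends on the I/O thread.
        Future<?> send(String topic, String message) {
            return ioThread.submit(() -> {
                sent.add(topic + ":" + message);
            });
        }

        // Copies the log on the I/O thread, after all previously submitted sends.
        List<String> snapshot() throws Exception {
            return ioThread.submit(() -> new ArrayList<>(sent)).get();
        }

        @Override
        public void close() throws InterruptedException {
            ioThread.shutdown();
            ioThread.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    // Two caller threads (coordinator-stream writer and user-job writer) share one sender.
    static List<String> runSharedExample() {
        try (SharedSender sender = new SharedSender()) {
            Thread coordinator = new Thread(() -> sender.send("coordinator-stream", "checkpoint"));
            Thread task = new Thread(() -> sender.send("user-output", "record"));
            coordinator.start();
            task.start();
            coordinator.join();
            task.join();
            return sender.snapshot();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Order may vary between runs; both records went through the same I/O thread.
        System.out.println(runSharedExample());
    }
}
```

With two separate `SharedSender` instances the container would run two I/O threads; whether merging them actually improves throughput is the open question in this thread.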
Re: Use one producer for both coordinator stream and users system?
Thanks, Yan. I guess I was not very clear on the coordinator stream concept before.
-Tao

On Tue, Aug 18, 2015 at 12:26 AM, Yan Fang yanfang...@gmail.com wrote:
My question is, can we use only one producer for both coordinator stream and the users system to have better performance?
Re: Use one producer for both coordinator stream and users system?
Hi Yan,

My (uneducated) guess is that the performance gains come from batching. I don't know whether the new producer ever batches by destination broker. If it does not, and it only batches by (broker, topic, partition), then I doubt that using one producer vs. two will affect performance, since the two producers send to different topics.

Cheers,
Roger

On Tue, Aug 18, 2015 at 12:26 AM, Yan Fang yanfang...@gmail.com wrote:
My question is, can we use only one producer for both coordinator stream and the users system to have better performance?
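Roger's argument can be checked with a toy Java model under simplifying assumptions: all pending records for one (topic, partition) fit in a single batch, and timing and size limits are ignored. `BatchingSketch`, `Rec`, the topic names, and the broker IDs are hypothetical, not Kafka internals. The model counts batches if a producer only batches per (topic, partition), and requests if it also groups its batches per destination broker, for one shared producer versus two separate ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchingSketch {
    // A pending record: the topic partition it targets and the broker leading that partition.
    static final class Rec {
        final String topic;
        final int partition;
        final int broker;

        Rec(String topic, int partition, int broker) {
            this.topic = topic;
            this.partition = partition;
            this.broker = broker;
        }
    }

    // Batches if a producer only batches per (topic, partition): one per distinct pair.
    static int batchCount(List<Rec> recs) {
        Set<String> keys = new HashSet<>();
        for (Rec r : recs) keys.add(r.topic + "/" + r.partition);
        return keys.size();
    }

    // Requests if a producer also groups its batches per destination broker: one per broker.
    static int requestCount(List<Rec> recs) {
        Set<Integer> brokers = new HashSet<>();
        for (Rec r : recs) brokers.add(r.broker);
        return brokers.size();
    }

    // Returns {separate batches, shared batches, separate requests, shared requests}.
    static int[] compare(List<Rec> coordinator, List<Rec> user) {
        List<Rec> all = new ArrayList<>(coordinator);
        all.addAll(user);
        return new int[] {
            batchCount(coordinator) + batchCount(user), batchCount(all),
            requestCount(coordinator) + requestCount(user), requestCount(all)
        };
    }

    public static void main(String[] args) {
        List<Rec> coordinator = List.of(new Rec("coordinator", 0, 1), new Rec("coordinator", 0, 1));
        List<Rec> user = List.of(new Rec("output", 0, 1), new Rec("output", 1, 2));
        int[] r = compare(coordinator, user);
        System.out.printf("batches: %d vs %d, requests: %d vs %d%n", r[0], r[1], r[2], r[3]);
    }
}
```

Because the coordinator stream and the user's system write to different topics, the batch count is the same either way (3 vs 3 in `main`); only per-broker grouping (3 vs 2 requests here) would let a shared producer send fewer requests, which is the case Roger says he is unsure about.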
Re: Use one producer for both coordinator stream and users system?
Hi Yan,

Naive question: what do we need the coordinator stream's producer thread for?

Thanks,
-Tao

On Mon, Aug 17, 2015 at 2:09 PM, Yan Fang yanfang...@gmail.com wrote:
Hi guys,

I have this question because Kafka's doc (http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html) seems to recommend sharing one producer among all threads (*The producer is thread safe and should generally be shared among all threads for best performance.*), while currently the coordinator stream uses a separate producer. Usually there are two producers (two producer threads) in each container: one for the coordinator stream and one for the real job.

1. Will sharing one producer among all threads really improve performance? (I haven't done a perf test myself; I guess Kafka has some proof.)
2. If yes, should we go this way?

Thanks,
Fang, Yan
yanfang...@gmail.com