Re: Measuring Samza Job Throughput
Hi, Milinda, Yi,

Sure. I will be happy to help on this.

Thanks,
-Tao

On Wed, Jun 17, 2015 at 11:35 AM, Yi Pan <nickpa...@gmail.com> wrote:
> [...]
Re: Measuring Samza Job Throughput
Hi, Milinda,

Tao @LinkedIn has done some Samza benchmark tests using a standard word-count task. You may want to reach out to him for some detailed ideas on how to set up the perf tests.

Best!
-Yi

On Wed, Jun 17, 2015 at 11:25 AM, Milinda Pathirage <mpath...@umail.iu.edu> wrote:
> [...]
Re: Measuring Samza Job Throughput
Thank you all for the ideas. I'll have a look at KafkaSystem metrics and SamzaContainerMetrics.

Milinda

On Wed, Jun 17, 2015 at 2:38 AM, Tao Feng <fengta...@gmail.com> wrote:
> [...]
Re: Measuring Samza Job Throughput
Hi,

One metric I could think of related to Samza job throughput is the process-envelopes metric listed in SamzaContainerMetrics. This counter gets incremented whenever the container processes a meaningful (non-null) message envelope:

https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala

But this metric is more of a QPS-type metric: it counts messages, not bytes.

Thanks,
-Tao

On Tue, Jun 16, 2015 at 9:11 PM, Milinda Pathirage <mpath...@umail.iu.edu> wrote:
> [...]
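Since process-envelopes is a cumulative counter, each metrics report carries a running total rather than a rate. As a minimal sketch, assuming you already have two snapshots of the counter value together with their timestamps (for example, from two consecutive metrics-reporter messages), messages per second is the difference in counts divided by the elapsed time. CounterRate and perSecond are hypothetical names, and treating a decreasing count as a container restart is an assumption, not documented Samza behavior.

```java
/**
 * Hypothetical helper: converts two snapshots of a cumulative counter
 * (e.g. process-envelopes taken from two consecutive metrics reports)
 * into an average rate per second.
 */
final class CounterRate {
    private CounterRate() {}

    static double perSecond(long prevCount, long prevMillis, long currCount, long currMillis) {
        long elapsedMillis = currMillis - prevMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("snapshots must be in increasing time order");
        }
        // Assumption: a smaller count means the container restarted and the
        // counter started again from zero, so the whole current value is new work.
        long delta = currCount >= prevCount ? currCount - prevCount : currCount;
        return delta * 1000.0 / elapsedMillis;
    }
}
```

For example, CounterRate.perSecond(10_000, 0, 70_000, 60_000) returns 1000.0 (60,000 envelopes over 60 seconds). The same arithmetic applies to any cumulative byte counter, which is what a bytes-per-second figure ultimately needs.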
Re: Measuring Samza Job Throughput
Hey Milinda,

Specifically, for bytes/sec, you might want to look at the serde metrics. I believe the serde manager tracks bytes serialized and deserialized per second. The consumers and producers also do this for Kafka, but on a more granular basis. If you want container-level throughput, the serde manager is worth looking at.

Cheers,
Chris

On Tuesday, June 16, 2015, Milinda Pathirage <mpath...@umail.iu.edu> wrote:
> [...]
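Chris's follow-up below notes that the serde manager turned out not to expose byte-throughput metrics. If serde-level byte counts are what you want, one option is to wrap your own serde so it counts bytes as they pass through. This is a self-contained, hypothetical sketch: SimpleSerde is a stand-in mirroring the toBytes/fromBytes pair of Samza's org.apache.samza.serializers.Serde, and registering a real wrapper through a SerdeFactory is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in mirroring the two methods of Samza's Serde<T>. */
interface SimpleSerde<T> {
    byte[] toBytes(T object);
    T fromBytes(byte[] bytes);
}

/** Example delegate: UTF-8 encoded strings. */
final class Utf8StringSerde implements SimpleSerde<String> {
    @Override public byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    @Override public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
}

/**
 * Hypothetical decorator: delegates the actual (de)serialization and keeps
 * running totals of bytes produced and consumed, from which a reporter
 * could derive bytes/sec.
 */
final class ByteCountingSerde<T> implements SimpleSerde<T> {
    private final SimpleSerde<T> delegate;
    private final AtomicLong serializedTotal = new AtomicLong();
    private final AtomicLong deserializedTotal = new AtomicLong();

    ByteCountingSerde(SimpleSerde<T> delegate) { this.delegate = delegate; }

    @Override public byte[] toBytes(T object) {
        byte[] bytes = delegate.toBytes(object);
        if (bytes != null) serializedTotal.addAndGet(bytes.length); // count output size
        return bytes;
    }

    @Override public T fromBytes(byte[] bytes) {
        if (bytes != null) deserializedTotal.addAndGet(bytes.length); // count input size
        return delegate.fromBytes(bytes);
    }

    long bytesSerialized() { return serializedTotal.get(); }
    long bytesDeserialized() { return deserializedTotal.get(); }
}
```

A reporter could sample bytesSerialized() periodically and turn the totals into a rate. Note that the wrapper only sees messages that actually pass through this serde; anything a system produces or consumes without it is invisible here.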
Re: Measuring Samza Job Throughput
Hmm, correction. I think this has to be done at the KafkaSystem level. We allow consumers and producers to return non-byte messages, which means nothing in the container can safely assume that a message is a byte array except the serde manager. I took a look there, but didn't see any byte throughput metrics after all.

On Tuesday, June 16, 2015, Chris Riccomini <criccom...@apache.org> wrote:
> [...]
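Chris's point is that only code which knows a payload is a byte[] can size it. A minimal sketch of what a byte meter on a system producer's send path might look like, assuming it is called with each outgoing message object (in Samza, OutgoingMessageEnvelope.getMessage() returns an Object): byte arrays are sized, and everything else is counted separately rather than guessed at. ProducerByteMeter is a hypothetical name, and only the payload is measured; keys and wire-protocol overhead are ignored.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical meter for a system producer. Only payloads that are already
 * byte arrays can be sized; everything else is tallied separately so the
 * byte total is never silently understated.
 */
final class ProducerByteMeter {
    private final AtomicLong messages = new AtomicLong();
    private final AtomicLong payloadBytes = new AtomicLong();
    private final AtomicLong unsizedMessages = new AtomicLong();

    /** Record one outgoing message payload. */
    void record(Object message) {
        messages.incrementAndGet();
        if (message instanceof byte[]) {
            payloadBytes.addAndGet(((byte[]) message).length);
        } else {
            // Null, or not a byte[] (e.g. not yet serialized): size unknown here.
            unsizedMessages.incrementAndGet();
        }
    }

    long messages() { return messages.get(); }
    long payloadBytes() { return payloadBytes.get(); }
    long unsizedMessages() { return unsizedMessages.get(); }
}
```

Reporting unsizedMessages next to payloadBytes makes it visible when the byte total covers only part of the traffic.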
Measuring Samza Job Throughput
Hi Devs,

I was looking for a way to measure Samza job throughput and found that it's possible to do it via Samza's metrics reporter. But there are several types of metrics reported via this method. For example, TaskInstanceMetrics reports the number of messages sent. But if I wanted to get a measurement like bytes per second produced, is there a way to do that? It looks like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide the number of messages sent.

If any of you have experience measuring Samza job throughput, can you please share? I'd really appreciate any ideas on measuring job throughput.

Thanks
Milinda

--
Milinda Pathirage
PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University
twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org