Re: Measuring Samza Job Throughput

2015-06-18 Thread Tao Feng
Hi, Milinda, Yi,

Sure. I will be happy to help with this.

Thanks,
-Tao


Re: Measuring Samza Job Throughput

2015-06-17 Thread Yi Pan
Hi, Milinda,

Tao @LinkedIn has run some Samza benchmark tests using a standard
word-count task. You may want to reach out to him for detailed ideas
on how to set up the perf tests.

Best!

-Yi


Re: Measuring Samza Job Throughput

2015-06-17 Thread Milinda Pathirage
Thank you all for the ideas. I'll have a look at KafkaSystem metrics and
SamzaContainerMetrics.

Milinda

-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org


Re: Measuring Samza Job Throughput

2015-06-17 Thread Tao Feng
Hi,

One metric I can think of that relates to Samza job throughput is the
process-envelopes counter listed in SamzaContainerMetrics. This counter
is incremented whenever the container processes a meaningful message (see
https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
and
https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
).

But this is more of a QPS-type metric: it counts messages rather than bytes.

Thanks,
-Tao
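
A minimal sketch of turning a message counter like the one described above into a messages-per-second figure, assuming you can read the counter value at two points in time (for example from two metrics snapshots). The class and method names here are hypothetical and are not part of Samza; only the arithmetic is the point.

```java
public final class CounterRate {
    private CounterRate() {}

    /**
     * Average events per second between two readings of a monotonically
     * increasing counter. All parameters are hypothetical inputs: the two
     * counter values and the wall-clock times (in milliseconds) at which
     * they were read.
     */
    public static double perSecond(long earlierCount, long laterCount,
                                   long earlierMillis, long laterMillis) {
        long elapsedMillis = laterMillis - earlierMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("readings must be in increasing time order");
        }
        long delta = laterCount - earlierCount;
        if (delta < 0) {
            // A smaller later value usually means the counter was reset,
            // e.g. because the container restarted between readings.
            throw new IllegalArgumentException("counter went backwards between readings");
        }
        return delta * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Made-up readings taken 60 seconds apart.
        System.out.println(perSecond(120_000L, 180_000L, 0L, 60_000L));
    }
}
```

Because this works on counter deltas, it gives an average over the sampling interval, so shorter intervals show bursts that longer ones smooth out.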


Re: Measuring Samza Job Throughput

2015-06-17 Thread Chris Riccomini
Hey Milinda,

Specifically, for bytes/sec, you might want to look at serde metrics. I
believe the serde manager tracks bytes serialized and deserialized per
second. The consumers and producers also do this for Kafka, but on a more
granular basis. If you want container-level throughput, serde manager is
worth looking at.

Cheers,
Chris


Re: Measuring Samza Job Throughput

2015-06-17 Thread Chris Riccomini
Hmm, correction. I think this has to be done at the KafkaSystem level. We
allow consumers and producers to return non-byte messages, which means
nothing in the container can safely assume that a message is a byte array
except the serde manager. I took a look there but didn't see any byte
throughput metrics after all.
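
A rough sketch of the kind of byte counting discussed above, assuming it sits at a point where the message may or may not be a byte array. Everything here (the class name, `record`, `bytesPerSecond`) is hypothetical plain Java and not an existing Samza or Kafka API; it only illustrates counting bytes when the payload really is a `byte[]` and tracking the messages whose size cannot be known.

```java
import java.util.concurrent.atomic.LongAdder;

public final class ByteThroughputTracker {
    // Total payload bytes seen so far (only for byte[] messages).
    private final LongAdder bytes = new LongAdder();
    // Messages whose size is unknown because they are not byte arrays.
    private final LongAdder unsized = new LongAdder();

    /** Records one message; only byte[] payloads contribute to the byte total. */
    public void record(Object message) {
        if (message instanceof byte[]) {
            bytes.add(((byte[]) message).length);
        } else {
            unsized.increment();
        }
    }

    public long totalBytes() {
        return bytes.sum();
    }

    public long unsizedMessages() {
        return unsized.sum();
    }

    /** Average bytes per second over a caller-supplied elapsed time. */
    public double bytesPerSecond(long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return bytes.sum() * 1000.0 / elapsedMillis;
    }
}
```

Keeping a separate count of unsized messages makes it visible when the byte figure understates real traffic, which is the risk when messages are not guaranteed to be byte arrays.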


Measuring Samza Job Throughput

2015-06-16 Thread Milinda Pathirage
Hi Devs,

I was looking for a way to measure Samza job throughput and found that it's
possible to do it via Samza's metrics reporter. But there are several types of
metrics reported via this method. For example, TaskInstanceMetrics reports the
number of messages sent. But if I wanted a measurement like bytes
per second produced, is there a way to get that? It looks
like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide the number
of messages sent.

If any of you have experience measuring Samza job throughput, could
you please share? I'd really appreciate any ideas on measuring job throughput.

Thanks
Milinda
-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org