Re: How to monitor Spark Streaming from Kafka?

2015-06-02 Thread Ruslan Dautkhanov
Nobody has mentioned CM (Cloudera Manager) yet? Kafka is now supported in CM/CDH 5.4: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-kafka/latest/PDF/cloudera-kafka.pdf -- Ruslan Dautkhanov. On Mon, Jun 1, 2015 at 5:19 PM, Dmitry Goldenberg dgoldenberg...@gmail.com wrote: Thank you, Tathagata,

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Cody Koeninger
KafkaCluster.scala in the spark/external/kafka project has a good deal of API code, including code for updating Kafka-managed ZK offsets. Look at setConsumerOffsets. Unfortunately all of that code is private, but you can either write your own, copy it, or do what I do (sed out private[spark] and
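Before writing offsets back to ZooKeeper, each processed batch's offset ranges have to be collapsed into one "next offset to read" per partition. The sketch below shows only that step. `BatchRange` and `PartitionKey` are hypothetical local stand-ins for Spark's `OffsetRange` and Kafka's `TopicAndPartition`, so it compiles without Spark or Kafka on the classpath. It assumes the Spark 1.3-era signature `setConsumerOffsets(groupId, offsets: Map[TopicAndPartition, Long])`, which is the call the resulting map would be handed to once the `private[spark]` modifier is removed.

```scala
// Hypothetical stand-ins (not Spark/Kafka classes) so the sketch is self-contained.
final case class PartitionKey(topic: String, partition: Int)
final case class BatchRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object ZkOffsets {
  // For each topic/partition, keep the highest untilOffset seen, i.e. the
  // next offset a consumer should read. This is the value a ZK-based
  // consumer-offset store conventionally holds.
  def offsetsToCommit(ranges: Seq[BatchRange]): Map[PartitionKey, Long] =
    ranges
      .groupBy(r => PartitionKey(r.topic, r.partition))
      .map { case (key, rs) => key -> rs.map(_.untilOffset).max }
}
```

Committing only after a batch's output has succeeded keeps the ZK offsets a lower bound on what has actually been processed.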

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Tathagata Das
In the receiver-less direct approach, there is no concept of a consumer group, since we don't use the Kafka high-level consumer (which uses ZK). Instead, Spark Streaming manages offsets on its own, giving tighter guarantees. If you want to monitor the progress of the processing of offsets, you will have to
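Since the direct approach does not record a consumer group in ZK, progress has to be computed on the application side. With the direct stream, each batch's ranges are available inside `foreachRDD` via `rdd.asInstanceOf[HasOffsetRanges].offsetRanges`. Lag is then the broker's latest offset minus the last processed `untilOffset`. The sketch below covers only the arithmetic. `PartitionId` is a hypothetical stand-in type, and how the latest broker offsets are fetched (for example through the private `KafkaCluster` helpers) is left as an assumption.

```scala
// Hypothetical stand-in for a topic/partition identifier.
final case class PartitionId(topic: String, partition: Int)

object ConsumerLag {
  // Per-partition lag = latest broker offset - next offset Spark will read.
  // Assumption of this sketch: a partition with no processed offset yet is
  // treated as lagging from offset 0. Negative values are clamped to 0.
  def lag(processed: Map[PartitionId, Long], latest: Map[PartitionId, Long]): Map[PartitionId, Long] =
    latest.map { case (p, end) =>
      p -> math.max(0L, end - processed.getOrElse(p, 0L))
    }

  // Total lag across all partitions, a single number suitable for alerting.
  def totalLag(processed: Map[PartitionId, Long], latest: Map[PartitionId, Long]): Long =
    lag(processed, latest).values.sum
}
```

Publishing `totalLag` per batch to whatever metrics system is already in place gives a simple "is the stream keeping up" signal.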

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
I think you can use SPM - http://sematext.com/spm - it will give you all Spark and all Kafka metrics, including offsets broken down by topic, etc., out of the box. I see more and more people using it to monitor various components in data processing pipelines, à la

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Dmitry Goldenberg
Thank you, Tathagata, Cody, Otis. - Dmitry On Mon, Jun 1, 2015 at 6:57 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I think you can use SPM - http://sematext.com/spm - it will give you all Spark and all Kafka metrics, including offsets broken down by topic, etc. out of the box.