Re: Multiple Kafka topics processing in Spark 2.2

2017-09-12 Thread kant kodali
@Dan shouldn't you be using Datasets/DataFrames? I heard it is recommended to use Datasets and DataFrames rather than DStreams, since DStreams are in maintenance mode. On Mon, Sep 11, 2017 at 7:41 AM, Cody Koeninger wrote: > If you want an "easy" but not particularly performant
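
A minimal sketch of the Structured Streaming route kant mentions, for Spark 2.2; the broker address and the topic names "topicA"/"topicB" are placeholders, and the spark-sql-kafka-0-10 artifact is assumed to be on the classpath. The Kafka source exposes each record's topic as a `topic` column, so no offset bookkeeping is needed:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("MultiTopicStructured").getOrCreate()

// Subscribe to several topics at once (placeholder broker and topic names).
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topicA,topicB")
  .load()

// Each row already carries key, value, topic, partition, offset and timestamp.
val withTopic = df.selectExpr("topic", "CAST(value AS STRING) AS value")

withTopic.writeStream.format("console").start().awaitTermination()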

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-11 Thread Cody Koeninger
If you want an "easy" but not particularly performant way to do it, each org.apache.kafka.clients.consumer.ConsumerRecord has a topic. The topic is going to be the same for the entire partition as long as you haven't shuffled, hence the examples on how to deal with it at a partition level. On
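
For reference, a rough sketch of that per-record approach with the spark-streaming-kafka-0-10 API; the broker address, group id and topic names are illustrative assumptions:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(new SparkConf().setAppName("TopicPerRecord"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "multi-topic-example")

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("topicA", "topicB"), kafkaParams))

// Every ConsumerRecord knows its topic; before any shuffle the topic is the same
// for a whole partition, so partition-level handling avoids per-record overhead.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach(r => println(s"${r.topic()} -> ${r.value()}"))
  }
}

ssc.start()
ssc.awaitTermination()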

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-08 Thread Dan Dong
Hi, Alonso. Thanks! I've read about this but did not quite understand it. Picking out the topic name of a Kafka message seems like a simple task, but the example code looks complicated, with redundant info. Why do we need offsetRanges here, and is there an easier way to achieve this? Cheers, Dan

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Alonso Isidoro Roman
Hi, reading the official doc, I think you can do it this way: import org.apache.spark.streaming.kafka._ val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc,
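
The fuller pattern from that integration guide looks roughly like the sketch below; ssc, kafkaParams and topicsSet are assumed to be defined as in Dan's original snippet. Each OffsetRange carries the topic name for its partition:

import kafka.serializer.StringDecoder
import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka._

val directKafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)

// Capture the offset ranges of each batch on the driver...
var offsetRanges = Array.empty[OffsetRange]

directKafkaStream.transform { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}.foreachRDD { rdd =>
  rdd.foreachPartition { iter =>
    // ...then look up the range for this partition; o.topic is the topic name
    // shared by every (key, value) pair in `iter`.
    val o = offsetRanges(TaskContext.get.partitionId)
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
  }
}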

Multiple Kafka topics processing in Spark 2.2

2017-09-06 Thread Dan Dong
Hi, all, I have a question about how to process multiple Kafka topics in a Spark 2.x program. My question is: how do I get the topic name from a message received from Kafka? E.g.: .. val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc,
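
For context, the truncated call above, when subscribing to several topics with the 0.8 direct API, would look roughly like this sketch (the broker address and topic names are placeholders). The returned DStream holds only (key, value) pairs, which is why the topic name is not immediately available and why the replies in this thread turn to offsetRanges or ConsumerRecord:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("MultiTopicDirect"), Seconds(5))
val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
val topicsSet = Set("topicA", "topicB")

// DStream of (key, value) pairs; the topic is not part of the tuple.
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)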