Hey Nikolay, I know the approach, but it doesn't quite fit the bill for my use case, wherein each topic needs to be logged / persisted to a separate HDFS location. I am looking for something where a streaming context pertains to one topic and that topic only, and was wondering if I could run them all in parallel in one app / jar run.
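To make the ask concrete, here is a rough, untested sketch of the shape I'm after (using the direct Kafka API; the broker address, topic names, and HDFS paths are placeholders, not my actual config). The catch is that a single StreamingContext forces one batch duration on every topic, which is exactly why I was asking about running multiple contexts:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("multi-topic-persist")
    // One context, so one batch duration shared by all topics.
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder
    val topics = Seq("orders_paid", "users")                        // placeholder names

    topics.foreach { topic =>
      // One direct stream per topic, instead of one combined stream.
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set(topic)).map(_._2)
      // Each topic is persisted to its own HDFS location.
      stream.saveAsTextFiles(s"hdfs:///data/streams/$topic/batch")
    }

    ssc.start()
    ssc.awaitTermination()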
Thanks,

On Mon, Aug 1, 2016 at 1:08 PM, Nikolay Zhebet <phpap...@gmail.com> wrote:

> Hi, if you want to read several Kafka topics in a spark-streaming job, you
> can set the topic names separated by commas, and after that you can read
> all messages from all topics in one flow:
>
>     val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>
>     val lines = KafkaUtils.createStream[String, String, StringDecoder,
>       StringDecoder](ssc, kafkaParams, topicMap, StorageLevel.MEMORY_ONLY)
>       .map(_._2)
>
> After that you can use the ".filter" function to split your topics and
> iterate over the messages separately:
>
>     val orders_paid = lines.filter(x => { x("table_name") == "kismia.orders_paid" })
>
>     orders_paid.foreachRDD(rdd => { ....
>
> Or you can use an if..else construction to split your messages by name
> inside foreachRDD:
>
>     lines.foreachRDD((recrdd, time: Time) => {
>       recrdd.foreachPartition(part => {
>         part.foreach(item_row => {
>           if (item_row("table_name") == "kismia.orders_paid") { ... }
>           else if (...) { ... }
>     ....
>
> 2016-08-01 9:39 GMT+03:00 Sumit Khanna <sumit.kha...@askme.in>:
>
>> Any ideas, guys? What are the best practices for processing multiple
>> streams?
>> I could trace a few Stack Overflow comments that recommend a separate
>> jar per stream / use case. But that isn't quite what I want; it would be
>> better if one or more Spark streaming contexts could all be handled
>> within a single jar.
>>
>> Guys, please reply.
>>
>> Awaiting,
>>
>> Thanks,
>> Sumit
>>
>> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna <sumit.kha...@askme.in>
>> wrote:
>>
>>> Any ideas on this one, guys?
>>>
>>> I can do a sample run, but I can't be sure of imminent problems, if
>>> any. How can I ensure a different batchDuration etc. per
>>> StreamingContext here?
>>>
>>> Thanks,
>>>
>>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna <sumit.kha...@askme.in>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> I was wondering if I could create multiple Spark streaming contexts in
>>>> my application (e.g. instantiating a worker actor per topic, where
>>>> each has its own streaming context and its own batch duration).
>>>>
>>>> What are the caveats, if any?
>>>> What are the best practices?
>>>>
>>>> I have googled half-heartedly on this, but the air isn't demystified
>>>> yet. I could skim through something like
>>>>
>>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>>
>>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>>
>>>> Thanks in advance!
>>>> Sumit