Re: multiple spark streaming contexts

2016-08-01 Thread Nikolay Zhebet
You can always save data to HDFS wherever you need, and you can control
parallelism in your app by configuring --driver-cores and --driver-memory. With
this approach the Spark master is maintained for you, and it can handle your
failure issues, data locality, etc. But if you want to control it yourself with
"Executors.newFixedThreadPool(threadNum)" or in other ways, I think you may
run into problems with the YARN/Mesos job recovery and failure mechanisms.
I wish you good luck in your struggle with parallelism )) This is an
interesting question!)
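For example, those resources can be set at submit time; the master, class, and
jar names below are placeholders, not taken from the thread:

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 2 \
  --driver-memory 4g \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 2g \
  --class com.example.StreamingApp \
  streaming-app.jar
```

Note that --driver-cores and --driver-memory size the driver itself; executor
parallelism comes from the --num-executors / --executor-cores settings.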

2016-08-01 10:41 GMT+03:00 Sumit Khanna :

> Hey Nikolay,
>
> I know the approach, but it doesn't quite fit the bill for my
> use case, wherein each topic needs to be logged / persisted to a separate
> HDFS location.
>
> I am looking for something where a streaming context pertains to one topic
> and that topic only, and was wondering if I could run them all in parallel
> in one app / jar.
>
> Thanks,
>
> On Mon, Aug 1, 2016 at 1:08 PM, Nikolay Zhebet  wrote:
>
>> Hi, if you want to read several Kafka topics in one Spark Streaming job, you
>> can set the topic names separated by commas, and then you can read all
>> messages from all topics in one flow:
>>
>> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>>
>> val lines = KafkaUtils.createStream[String, String, StringDecoder, 
>> StringDecoder](ssc, kafkaParams, topicMap, 
>> StorageLevel.MEMORY_ONLY).map(_._2)
>>
>>
>> After that you can use the ".filter" function to split your topics and
>> process the messages separately.
>>
>> val orders_paid = lines.filter(x => { x("table_name") == "kismia.orders_paid" })
>>
>> orders_paid.foreachRDD(rdd => { ... })
>>
>>
>> Or you can use an if..else construct to split your messages by
>> name inside foreachRDD:
>>
>> lines.foreachRDD((recrdd, time: Time) => {
>>
>>recrdd.foreachPartition(part => {
>>
>>   part.foreach(item_row => {
>>
>>  if (item_row("table_name") == "kismia.orders_paid") { ... } else if (...) { ... }
>>
>>   })
>>})
>> })
>>
>>
>> 2016-08-01 9:39 GMT+03:00 Sumit Khanna :
>>
>>> Any ideas, guys? What are the best practices for multiple streams to be
>>> processed?
>>> I could trace a few Stack Overflow comments that recommend a separate
>>> jar for each stream / use case. But that isn't quite what I want; it
>>> would be better if one or more Spark Streaming contexts could all be
>>> handled within a single jar.
>>>
>>> Guys please reply,
>>>
>>> Awaiting,
>>>
>>> Thanks,
>>> Sumit
>>>
>>> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna 
>>> wrote:
>>>
>>>> Any ideas on this one guys ?
>>>>
>>>> I can do a sample run, but can't be sure of imminent problems, if any.
>>>> How can I ensure a different batchDuration, etc., per
>>>> StreamingContext?
>>>>
>>>> Thanks,
>>>>
>>>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna 
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> Was wondering if I could create multiple Spark Streaming contexts in my
>>>>> application (e.g. instantiating a worker actor per topic, each with its own
>>>>> streaming context, its own batch duration, everything).
>>>>>
>>>>> What are the caveats if any?
>>>>> What are the best practices?
>>>>>
>>>>> Have googled half-heartedly on this, but it isn't fully
>>>>> demystified yet. I did skim through something like
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>>>
>>>>> Thanks in Advance!
>>>>> Sumit
>>>>>
>>>>
>>>>
>>>
>>
>


Re: multiple spark streaming contexts

2016-08-01 Thread Sumit Khanna
Hey Nikolay,

I know the approach, but it doesn't quite fit the bill for my
use case, wherein each topic needs to be logged / persisted to a separate
HDFS location.

I am looking for something where a streaming context pertains to one topic
and that topic only, and was wondering if I could run them all in parallel
in one app / jar.

Thanks,

On Mon, Aug 1, 2016 at 1:08 PM, Nikolay Zhebet  wrote:

> Hi, if you want to read several Kafka topics in one Spark Streaming job, you
> can set the topic names separated by commas, and then you can read all
> messages from all topics in one flow:
>
> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>
> val lines = KafkaUtils.createStream[String, String, StringDecoder, 
> StringDecoder](ssc, kafkaParams, topicMap, StorageLevel.MEMORY_ONLY).map(_._2)
>
>
> After that you can use the ".filter" function to split your topics and 
> process the messages separately.
>
> val orders_paid = lines.filter(x => { x("table_name") == "kismia.orders_paid" })
>
> orders_paid.foreachRDD(rdd => { ... })
>
>
> Or you can use an if..else construct to split your messages by
> name inside foreachRDD:
>
> lines.foreachRDD((recrdd, time: Time) => {
>
>recrdd.foreachPartition(part => {
>
>   part.foreach(item_row => {
>
>  if (item_row("table_name") == "kismia.orders_paid") { ... } else if (...) { ... }
>
>   })
>})
> })
>
>
> 2016-08-01 9:39 GMT+03:00 Sumit Khanna :
>
>> Any ideas, guys? What are the best practices for multiple streams to be
>> processed?
>> I could trace a few Stack Overflow comments that recommend a separate jar
>> for each stream / use case. But that isn't quite what I want; it would be
>> better if one or more Spark Streaming contexts could all be handled
>> within a single jar.
>>
>> Guys please reply,
>>
>> Awaiting,
>>
>> Thanks,
>> Sumit
>>
>> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna 
>> wrote:
>>
>>> Any ideas on this one guys ?
>>>
>>> I can do a sample run, but can't be sure of imminent problems, if any. How
>>> can I ensure a different batchDuration, etc., per StreamingContext?
>>>
>>> Thanks,
>>>
>>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna 
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> Was wondering if I could create multiple Spark Streaming contexts in my
>>>> application (e.g. instantiating a worker actor per topic, each with its own
>>>> streaming context, its own batch duration, everything).
>>>>
>>>> What are the caveats if any?
>>>> What are the best practices?
>>>>
>>>> Have googled half-heartedly on this, but it isn't fully
>>>> demystified yet. I did skim through something like
>>>>
>>>>
>>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>>
>>>>
>>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>>
>>>> Thanks in Advance!
>>>> Sumit
>>>>
>>>
>>>
>>
>


Re: multiple spark streaming contexts

2016-08-01 Thread Nikolay Zhebet
Hi, if you want to read several Kafka topics in one Spark Streaming job, you
can set the topic names separated by commas, and then you can read all
messages from all topics in one flow:

val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap

val lines = KafkaUtils.createStream[String, String, StringDecoder,
StringDecoder](ssc, kafkaParams, topicMap,
StorageLevel.MEMORY_ONLY).map(_._2)


After that you can use the ".filter" function to split your topics
and process the messages separately.

val orders_paid = lines.filter(x => { x("table_name") == "kismia.orders_paid" })

orders_paid.foreachRDD(rdd => { ... })


Or you can use an if..else construct to split your messages by
name inside foreachRDD:

lines.foreachRDD((recrdd, time: Time) => {

   recrdd.foreachPartition(part => {

  part.foreach(item_row => {

 if (item_row("table_name") == "kismia.orders_paid") { ... }
else if (...) { ... }

  })
   })
})



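Putting the two snippets above together, a per-topic split that also writes
each table to its own HDFS path might look like the sketch below. It assumes
each Kafka message value has already been parsed into a Map with a
"table_name" key (the parsing step is omitted), and the table names and
output paths are placeholders:

```scala
// Assumes `lines: DStream[Map[String, String]]` from the KafkaUtils stream
// above, with each message value parsed into a map.
val tables = Map(
  "kismia.orders_paid" -> "/data/orders_paid",
  "kismia.users"       -> "/data/users"        // hypothetical second table
)

tables.foreach { case (table, path) =>
  lines
    .filter(row => row("table_name") == table)     // keep one table's rows
    .foreachRDD { (rdd, time: Time) =>
      if (!rdd.isEmpty())                          // skip empty batches
        rdd.saveAsTextFile(s"$path/${time.milliseconds}")
    }
}
```

Each filter produces its own output DStream, so every table lands in its own
directory while a single StreamingContext drives all of them.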

2016-08-01 9:39 GMT+03:00 Sumit Khanna :

> Any ideas, guys? What are the best practices for multiple streams to be
> processed?
> I could trace a few Stack Overflow comments that recommend a separate jar
> for each stream / use case. But that isn't quite what I want; it would be
> better if one or more Spark Streaming contexts could all be handled
> within a single jar.
>
> Guys please reply,
>
> Awaiting,
>
> Thanks,
> Sumit
>
> On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna 
> wrote:
>
>> Any ideas on this one guys ?
>>
>> I can do a sample run, but can't be sure of imminent problems, if any. How
>> can I ensure a different batchDuration, etc., per StreamingContext?
>>
>> Thanks,
>>
>> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna 
>> wrote:
>>
>>> Hey,
>>>
>>> Was wondering if I could create multiple Spark Streaming contexts in my
>>> application (e.g. instantiating a worker actor per topic, each with its own
>>> streaming context, its own batch duration, everything).
>>>
>>> What are the caveats if any?
>>> What are the best practices?
>>>
>>> Have googled half-heartedly on this, but it isn't fully
>>> demystified yet. I did skim through something like
>>>
>>>
>>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>>
>>>
>>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>>
>>> Thanks in Advance!
>>> Sumit
>>>
>>
>>
>


Re: multiple spark streaming contexts

2016-07-31 Thread Sumit Khanna
Any ideas, guys? What are the best practices for multiple streams to be
processed?
I could trace a few Stack Overflow comments that recommend a separate jar
for each stream / use case. But that isn't quite what I want; it would be
better if one or more Spark Streaming contexts could all be handled
within a single jar.

Guys please reply,

Awaiting,

Thanks,
Sumit

On Mon, Aug 1, 2016 at 12:24 AM, Sumit Khanna  wrote:

> Any ideas on this one guys ?
>
> I can do a sample run, but can't be sure of imminent problems, if any. How
> can I ensure a different batchDuration, etc., per StreamingContext?
>
> Thanks,
>
> On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna 
> wrote:
>
>> Hey,
>>
>> Was wondering if I could create multiple Spark Streaming contexts in my
>> application (e.g. instantiating a worker actor per topic, each with its own
>> streaming context, its own batch duration, everything).
>>
>> What are the caveats if any?
>> What are the best practices?
>>
>> Have googled half-heartedly on this, but it isn't fully
>> demystified yet. I did skim through something like
>>
>>
>> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>>
>>
>> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>>
>> Thanks in Advance!
>> Sumit
>>
>
>


Re: multiple spark streaming contexts

2016-07-31 Thread Sumit Khanna
Any ideas on this one guys ?

I can do a sample run, but can't be sure of imminent problems, if any. How
can I ensure a different batchDuration, etc., per StreamingContext?

Thanks,

On Sun, Jul 31, 2016 at 10:50 AM, Sumit Khanna 
wrote:

> Hey,
>
> Was wondering if I could create multiple Spark Streaming contexts in my
> application (e.g. instantiating a worker actor per topic, each with its own
> streaming context, its own batch duration, everything).
>
> What are the caveats if any?
> What are the best practices?
>
> Have googled half-heartedly on this, but it isn't fully
> demystified yet. I did skim through something like
>
>
> http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations
>
>
> http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker
>
> Thanks in Advance!
> Sumit
>


multiple spark streaming contexts

2016-07-30 Thread Sumit Khanna
Hey,

Was wondering if I could create multiple Spark Streaming contexts in my
application (e.g. instantiating a worker actor per topic, each with its own
streaming context, its own batch duration, everything).

What are the caveats if any?
What are the best practices?

Have googled half-heartedly on this, but it isn't fully
demystified yet. I did skim through something like

http://stackoverflow.com/questions/29612726/how-do-you-setup-multiple-spark-streaming-jobs-with-different-batch-durations

http://stackoverflow.com/questions/37006565/multiple-spark-streaming-contexts-on-one-worker

Thanks in Advance!
Sumit
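
[Editor's note: the Spark Streaming programming guide of this era states that
only one StreamingContext can be active per JVM, and all DStreams in a context
share its batch interval. The pattern the linked Stack Overflow threads converge
on is therefore one context with one input DStream per topic; genuinely
different batch durations require separate applications. A sketch of the
single-context pattern, where topic names, paths, and the ZooKeeper quorum are
placeholders:]

```scala
// One StreamingContext, one Kafka receiver per topic, each topic persisted
// to its own HDFS location. All streams share the 10-second batch interval.
val conf = new SparkConf().setAppName("multi-topic-streams")
val ssc = new StreamingContext(conf, Seconds(10))
val zkQuorum = "zk-host:2181"   // placeholder ZooKeeper quorum

val topics = Map("topicA" -> "/data/topicA", "topicB" -> "/data/topicB")

topics.foreach { case (topic, path) =>
  val stream = KafkaUtils.createStream(ssc, zkQuorum, s"group-$topic",
                                       Map(topic -> 1)).map(_._2)
  stream.foreachRDD { (rdd, time) =>
    if (!rdd.isEmpty()) rdd.saveAsTextFile(s"$path/${time.milliseconds}")
  }
}

ssc.start()
ssc.awaitTermination()
```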