Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-21 Thread shyla deshpande
Thanks TD.



Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread Tathagata Das
This setting allows multiple Spark jobs generated through multiple
foreachRDD calls to run concurrently, even across batches. So output
op2 from batch X can run concurrently with op1 of batch X+1.
This is not safe because it breaks the checkpointing logic in subtle ways.
Note that this setting was never documented in the Spark online docs.
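[Editor's note] The scheduling behavior described above can be sketched with a small, self-contained model. This is an illustrative simulation in plain Python, not Spark code: Spark Streaming drains a FIFO queue of output-operation jobs through a fixed-size executor pool sized by spark.streaming.concurrentJobs (default 1). The job names and durations below are invented for the example.

```python
from collections import deque

def run_jobs(jobs, concurrent_jobs):
    """Toy model of Spark Streaming's job executor: a pool of
    `concurrent_jobs` slots draining a FIFO queue of output-operation
    jobs. `jobs` is a list of (name, duration); returns a list of
    (start, end, name) tuples in submission order."""
    queue = deque(jobs)
    slots = [0] * concurrent_jobs  # time at which each slot becomes free
    schedule = []
    while queue:
        name, duration = queue.popleft()
        i = min(range(concurrent_jobs), key=lambda s: slots[s])
        start = slots[i]  # the job waits until a slot frees up
        slots[i] = start + duration
        schedule.append((start, start + duration, name))
    return schedule

# Two batches; batch 1's second output op is slow.
jobs = [("b1-op1", 1), ("b1-op2", 3), ("b2-op1", 1)]

# With the default of 1, everything runs strictly in order (makespan 5).
serial = run_jobs(jobs, concurrent_jobs=1)

# With 2 concurrent jobs, b2-op1 starts at t=1 while b1-op2 is still
# running until t=3 -- an output op of batch X+1 overlaps batch X,
# which is exactly the cross-batch overlap TD warns about.
overlapped = run_jobs(jobs, concurrent_jobs=2)
```

In the same model, 5 output operations with concurrentJobs=4 means the fifth op simply waits for the first free slot; it is not permanently paired with any particular one of the other four.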



Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread shyla deshpande
Thanks TD for the response. Can you please provide more explanation? I have
multiple streams in my Spark Streaming application (Spark 2.0.2 using
DStreams). I know many people use this setting, so your explanation will
help a lot of people.

Thanks



Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread Tathagata Das
That config is not safe. Please do not use it.



spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread shyla deshpande
I have a Spark Streaming application which processes 3 Kafka streams and
has 5 output operations.

I am not sure what the setting for spark.streaming.concurrentJobs should be.

1. If the concurrentJobs setting is 4, does that mean 2 output operations
will be run sequentially?

2. If I had 6 cores, what would be an ideal setting for concurrentJobs in
this situation?

I appreciate your input. Thanks
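[Editor's note] For completeness, if someone did experiment with this setting despite the warning elsewhere in the thread, it is supplied like any other Spark conf. The application class and jar names below are placeholders, not taken from the thread:

```shell
# Placeholder class/jar names; spark.streaming.concurrentJobs defaults
# to 1, and the advice in this thread is to leave it there.
spark-submit \
  --class com.example.StreamingApp \
  --conf spark.streaming.concurrentJobs=1 \
  streaming-app.jar
```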