Thanks TD.

On Tue, Mar 14, 2017 at 4:37 PM, Tathagata Das <t...@databricks.com> wrote:
> This setting allows multiple Spark jobs generated through multiple
> foreachRDD calls to run concurrently, even if they are across batches. So
> output op2 from batch X can run concurrently with op1 of batch X+1.
> This is not safe because it breaks the checkpointing logic in subtle ways.
> Note that this was never documented in the Spark online docs.
>
> On Tue, Mar 14, 2017 at 2:29 PM, shyla deshpande <deshpandesh...@gmail.com>
> wrote:
>
>> Thanks TD for the response. Can you please provide more explanation? I am
>> running multiple streams in my Spark Streaming application (Spark 2.0.2
>> using DStreams). I know many people who use this setting, so your
>> explanation will help a lot of people.
>>
>> Thanks
>>
>> On Fri, Mar 10, 2017 at 6:24 PM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> That config is not safe. Please do not use it.
>>>
>>> On Mar 10, 2017 10:03 AM, "shyla deshpande" <deshpandesh...@gmail.com>
>>> wrote:
>>>
>>>> I have a Spark Streaming application which processes 3 Kafka streams
>>>> and has 5 output operations.
>>>>
>>>> I am not sure what the setting for spark.streaming.concurrentJobs
>>>> should be.
>>>>
>>>> 1. If the concurrentJobs setting is 4, does that mean 2 output
>>>> operations will be run sequentially?
>>>>
>>>> 2. If I had 6 cores, what would be an ideal setting for concurrentJobs
>>>> in this situation?
>>>>
>>>> I appreciate your input. Thanks
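For anyone who finds this thread in the archives later, here is a minimal
Scala sketch of the shape under discussion. It is not shyla's actual app:
the local master, socket source, checkpoint path, and the two foreachRDD
output operations are illustrative placeholders. The only real piece is
the undocumented spark.streaming.concurrentJobs key, left at its default
of 1 per TD's advice.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("concurrent-jobs-sketch")
      .setMaster("local[6]") // placeholder; a receiver needs >= 2 cores
      // The undocumented setting this thread is about. The default is 1.
      // Values > 1 let the jobs produced by the output operations below
      // overlap, including across batch boundaries.
      .set("spark.streaming.concurrentJobs", "1")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoint-sketch") // hypothetical path

    // Stand-in source; shyla's app reads 3 Kafka streams instead.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Two independent output operations, each generating one job per batch.
    lines.foreachRDD { rdd => rdd.foreach(r => println("op1: " + r)) }
    lines.foreachRDD { rdd => rdd.foreach(r => println("op2: " + r)) }

    ssc.start()
    ssc.awaitTermination()
  }
}

With the default of 1, the two output operations run one after the other
within each batch. Raising the value lets them overlap, including op2 of
batch X running alongside op1 of batch X+1, which is exactly the behavior
TD flags above as breaking the checkpointing logic.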