Re: Spark streaming multi-tasking during I/O

2015-08-23 Thread Akhil Das
If you set the concurrentJobs flag to 2, it lets you run two jobs in parallel. With this flag it will be a bit hard for you to predict the application's behavior, so debugging could be a headache. Thanks, Best Regards. On Sun, Aug 23, 2015 at 10:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi
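The following minimal Python sketch (not Spark code) shows why behavior gets harder to predict. It assumes concurrentJobs can be modeled as the size of a thread pool that runs batch jobs. The names `completion_order` and `batch_job` and the sleep durations are hypothetical. With one worker, batches always finish in submission order. With two, a short batch 1 can finish before a slow batch 0, so any side effects, such as database writes, may land out of order.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def completion_order(concurrent_jobs: int, durations: list[float]) -> list[int]:
    """Run one simulated job per batch on a pool of `concurrent_jobs` threads
    and return the batch ids in the order the jobs finished."""
    finished: list[int] = []
    lock = threading.Lock()

    def batch_job(batch_id: int, io_seconds: float) -> None:
        time.sleep(io_seconds)  # simulated I/O for this (hypothetical) batch
        with lock:
            finished.append(batch_id)

    # Leaving the with-block waits for every submitted job to complete.
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        for batch_id, secs in enumerate(durations):
            pool.submit(batch_job, batch_id, secs)
    return finished


if __name__ == "__main__":
    print(completion_order(1, [0.3, 0.05]))  # [0, 1]
    print(completion_order(2, [0.3, 0.05]))  # order depends on timing
```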

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Akhil Das
Hmm, for a single-core VM you will have to run it in local mode (specifying master=local[4]). The flag is available in all versions of Spark, I guess. On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Thanks Akhil. Does this mean that the executor running in the VM can

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Rishitesh, We are not using any RDDs to parallelize the processing, and all of the algorithm runs on a single core (and in a single thread). The parallelism is done at the user level. The disk I/O can be started in a separate thread, but then the executor will not be able to take up more jobs, since

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Akhil, Think of the scenario as running a piece of code in plain Java with multiple threads. Let's say there are 4 threads spawned by a Java process to handle reading from the database, some processing, and storing to the database. In this process, while a thread is performing database I/O, the CPU
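The multi-threaded scenario described above can be sketched as follows. The sketch uses Python rather than Java. `read_record`, `process`, `store` and `IO_SECONDS` are hypothetical stand-ins, with `time.sleep` simulating database latency. Eight items are handled first by one thread and then by four. A thread blocked on I/O does not occupy the CPU, so the four-thread run overlaps the waits and finishes in roughly a quarter of the time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_SECONDS = 0.05  # assumed latency of one simulated database call


def read_record(key: int) -> int:
    time.sleep(IO_SECONDS)  # simulated DB read: this thread waits, the CPU is free
    return key


def process(value: int) -> int:
    return value * value  # stand-in for the CPU part of the algorithm


def store(value: int) -> int:
    time.sleep(IO_SECONDS)  # simulated DB write
    return value


def handle(key: int) -> int:
    # One unit of work: read -> process -> store.
    return store(process(read_record(key)))


def run(keys: list[int], threads: int) -> tuple[list[int], float]:
    """Process all keys with `threads` worker threads; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(handle, keys))  # map() keeps input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    keys = list(range(8))
    for n in (1, 4):
        _, secs = run(keys, n)
        print(f"{n} thread(s): {secs:.2f}s")
```

In CPython only the waiting overlaps, because the GIL serializes the `process` step. In Java, the CPU work could also run on several cores at once.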

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Thanks, Akhil. Does this mean that the executor running in the VM can spawn two concurrent jobs on the same core? If so, this is what we are looking for. Also, which version of Spark is this flag in? Thanks, Sateesh On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das

Spark streaming multi-tasking during I/O

2015-08-21 Thread Sateesh Kavuri
Hi, My scenario goes like this: I have an algorithm running in Spark Streaming mode on a 4-core virtual machine. The majority of the time, the algorithm does disk I/O and database I/O. The question is: during the I/O, when the CPU is not heavily loaded, is it possible to run any other task/thread

Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Rishitesh Mishra
Hi Sateesh, It would be interesting to know how you determined that the DStream runs on a single core. Did you mean receivers? Coming back to your question, could you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks? On 21 Aug 2015 16:06, Sateesh
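Starting disk I/O in a separate thread might look roughly like this Python sketch. The names are hypothetical, and `time.sleep` stands in for a disk write. The processing loop only puts results on a `queue.Queue` and moves on, while a single background writer thread performs the slow writes. A sentinel object stops the writer, and `join()` waits for the last write to finish.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling the writer thread to exit


def writer(q: queue.Queue, sink: list, io_seconds: float) -> None:
    """Background thread: drain the queue and perform the slow writes in order."""
    while True:
        item = q.get()
        if item is _STOP:
            break
        time.sleep(io_seconds)  # simulated disk write
        sink.append(item)


def process_stream(records, io_seconds: float = 0.05) -> tuple[list, float]:
    """Process records, offloading writes; return (written items, seconds spent handing off)."""
    q: queue.Queue = queue.Queue()
    written: list = []
    t = threading.Thread(target=writer, args=(q, written, io_seconds), daemon=True)
    t.start()

    start = time.perf_counter()
    for r in records:
        q.put(r * 2)  # hand the result to the writer instead of writing inline
    handoff = time.perf_counter() - start

    q.put(_STOP)
    t.join()  # block until every queued write has finished
    return written, handoff


if __name__ == "__main__":
    written, handoff = process_stream(range(5))
    print(written)  # [0, 2, 4, 6, 8]
    print(f"loop spent {handoff:.4f}s handing off writes")
```

The final `join()` means the caller still waits for the remaining writes. Without it, writes could still be in flight after the caller returns, where nothing in Spark would be tracking them.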

Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Akhil Das
You can look at spark.streaming.concurrentJobs; by default it runs a single job. If you set it to 2, it can run 2 jobs in parallel. It's an experimental flag, but go ahead and give it a try. On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, My scenario goes like this:
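Here is a rough model of the flag, assuming it simply caps how many batch jobs run at the same time. The Python sketch below is not Spark code, and `batch_job` and `run_batches` are hypothetical. It runs four I/O-bound batch jobs on a thread pool whose size plays the role of spark.streaming.concurrentJobs. With a value of 1 the jobs run back to back. With 2 the waits overlap and the total time roughly halves.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def batch_job(batch_id: int, io_seconds: float = 0.2) -> int:
    # Hypothetical batch job that spends its time waiting on I/O.
    time.sleep(io_seconds)  # simulated disk/database I/O; the CPU is idle here
    return batch_id


def run_batches(num_batches: int, concurrent_jobs: int) -> float:
    """Run batch jobs on a fixed pool whose size models
    spark.streaming.concurrentJobs; return the wall-clock seconds taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        list(pool.map(batch_job, range(num_batches)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2):
        print(f"concurrentJobs={n}: {run_batches(4, n):.2f}s")
```

This only pays off when jobs spend their time waiting. If each job keeps a core busy, raising the value just makes the jobs compete for the same cores.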