Re: Spark streaming multi-tasking during I/O

2015-08-23 Thread Akhil Das
If you set the concurrentJobs flag to 2, it lets you run two jobs in
parallel. It will be a bit hard for you to predict the application's behavior
with this flag, so debugging could be a headache.
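For reference, a minimal sketch of how the flag might be set from a Java Spark Streaming application (the app name, batch interval, and master here are placeholders; the flag itself is undocumented and experimental, so behavior may differ across versions):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Sketch only: spark.streaming.concurrentJobs is an undocumented,
// experimental setting; test carefully before relying on it.
SparkConf conf = new SparkConf()
        .setAppName("StreamingApp")
        .setMaster("local[4]")                       // 4 threads in local mode
        .set("spark.streaming.concurrentJobs", "2"); // allow 2 concurrent jobs
JavaStreamingContext jssc =
        new JavaStreamingContext(conf, Durations.seconds(1));
```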

Thanks
Best Regards

On Sun, Aug 23, 2015 at 10:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
wrote:

 Hi Akhil,

 Think of the scenario as running a piece of code in normal Java with
 multiple threads. Let's say there are 4 threads spawned by a Java process to
 handle reading from a database, some processing, and storing to the database.
 In this process, while one thread is performing database I/O, the CPU can
 allow another thread to perform the processing, thus efficiently using the
 resources.

 In the case of Spark, while a node executor is running the same read from DB
 => process data => store to DB, during the read from DB and store to
 DB phases, the CPU is not given to other requests in the queue, since the
 executor will allocate the resources completely to the current ongoing
 request.

 Does the spark.streaming.concurrentJobs flag not enable this kind of
 scenario, or is there any other way to achieve what I am looking for?

 Thanks,
 Sateesh

 On Sat, Aug 22, 2015 at 7:26 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Hmm, for a single-core VM you will have to run it in local mode (specifying
 master=local[4]). The flag is available in all versions of Spark, I
 guess.
 On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Thanks Akhil. Does this mean that the executor running in the VM can
 spawn two concurrent jobs on the same core? If this is the case, this is
 what we are looking for. Also, which version of Spark is this flag in?

 Thanks,
 Sateesh

 On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 You can look at spark.streaming.concurrentJobs; by default it runs a
 single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
 experimental flag, but go ahead and give it a try.
 On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core
 virtual machine. The majority of the time, the algorithm does disk I/O and
 database I/O. The question is: during the I/O, when the CPU is not
 considerably loaded, is it possible to run any other task/thread so as to
 efficiently utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh






Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Akhil Das
Hmm, for a single-core VM you will have to run it in local mode (specifying
master=local[4]). The flag is available in all versions of Spark, I
guess.
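The same settings can also be passed at submit time; a sketch (the jar name is a placeholder, and the flag is experimental):

```shell
# Run in local mode with 4 threads and allow 2 concurrent streaming jobs.
spark-submit \
  --master "local[4]" \
  --conf spark.streaming.concurrentJobs=2 \
  your-streaming-app.jar
```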
On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote:

 Thanks Akhil. Does this mean that the executor running in the VM can spawn
 two concurrent jobs on the same core? If this is the case, this is what we
 are looking for. Also, which version of Spark is this flag in?

 Thanks,
 Sateesh

 On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 You can look at spark.streaming.concurrentJobs; by default it runs a
 single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
 experimental flag, but go ahead and give it a try.
 On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh





Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Rishitesh,

We are not using any RDDs to parallelize the processing, and all of the
algorithm runs on a single core (and in a single thread). The parallelism
is done at the user level.

The disk I/O can be started in a separate thread, but then the executor will
not be able to take up more jobs, since that's how I believe Spark is
designed by default.

On Sat, Aug 22, 2015 at 12:51 AM, Rishitesh Mishra rishi80.mis...@gmail.com
 wrote:

 Hi Sateesh,
 It is interesting to know how you determined that the DStream runs on
 a single core. Did you mean receivers?

 Coming back to your question, could you not start the disk I/O in a separate
 thread, so that the scheduler can go ahead and assign other tasks?
 On 21 Aug 2015 16:06, Sateesh Kavuri sateesh.kav...@gmail.com wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh




Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Akhil,

Think of the scenario as running a piece of code in normal Java with
multiple threads. Let's say there are 4 threads spawned by a Java process to
handle reading from a database, some processing, and storing to the database.
In this process, while one thread is performing database I/O, the CPU can
allow another thread to perform the processing, thus efficiently using the
resources.
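The analogy above can be sketched in plain Java (readFromDb and process are hypothetical stand-ins, not Spark API): while one thread sleeps on simulated I/O, the thread pool runs the others, so the four tasks overlap instead of serializing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OverlapDemo {
    // Stand-in for a blocking database read: I/O-bound, CPU mostly idle.
    static String readFromDb(int id) throws InterruptedException {
        Thread.sleep(100); // simulated network/disk latency
        return "row-" + id;
    }

    // Stand-in for CPU-bound processing of a row.
    static String process(String row) {
        return row.toUpperCase();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();

        // While one thread blocks on I/O, the CPU is given to another
        // thread, so the four reads take ~100 ms total, not ~400 ms.
        for (int i = 0; i < 4; i++) {
            final int id = i;
            results.add(pool.submit(() -> process(readFromDb(id))));
        }
        for (Future<String> f : results) {
            System.out.println(f.get()); // ROW-0 .. ROW-3, in order
        }
        pool.shutdown();
    }
}
```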

In the case of Spark, while a node executor is running the same read from DB =>
process data => store to DB, during the read from DB and store to DB
phases, the CPU is not given to other requests in the queue, since the
executor will allocate the resources completely to the current ongoing request.

Does the spark.streaming.concurrentJobs flag not enable this kind of scenario,
or is there any other way to achieve what I am looking for?

Thanks,
Sateesh

On Sat, Aug 22, 2015 at 7:26 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Hmm, for a single-core VM you will have to run it in local mode (specifying
 master=local[4]). The flag is available in all versions of Spark, I
 guess.
 On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Thanks Akhil. Does this mean that the executor running in the VM can
 spawn two concurrent jobs on the same core? If this is the case, this is
 what we are looking for. Also, which version of Spark is this flag in?

 Thanks,
 Sateesh

 On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 You can look at spark.streaming.concurrentJobs; by default it runs a
 single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
 experimental flag, but go ahead and give it a try.
 On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh





Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Thanks Akhil. Does this mean that the executor running in the VM can spawn
two concurrent jobs on the same core? If this is the case, this is what we
are looking for. Also, which version of Spark is this flag in?

Thanks,
Sateesh

On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You can look at spark.streaming.concurrentJobs; by default it runs a
 single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
 experimental flag, but go ahead and give it a try.
 On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com
 wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh




Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Rishitesh Mishra
Hi Sateesh,
It is interesting to know how you determined that the DStream runs on
a single core. Did you mean receivers?

Coming back to your question, could you not start the disk I/O in a separate
thread, so that the scheduler can go ahead and assign other tasks?
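The suggestion above could be sketched like this in plain Java (computeResult and storeToDb are hypothetical stand-ins for the algorithm's work, not Spark API): the blocking write is handed to a dedicated I/O pool so the task thread returns and can be assigned other work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncIoSketch {
    // Dedicated pool for blocking disk/DB I/O, so the task thread
    // (the one the scheduler cares about) is never the one that blocks.
    static final ExecutorService ioPool = Executors.newFixedThreadPool(2);

    // CPU-bound part of the task.
    static String computeResult(String input) {
        return input.trim().toLowerCase();
    }

    // Blocking store; runs on ioPool instead of the task thread.
    static void storeToDb(String result) {
        System.out.println("stored: " + result);
    }

    public static void main(String[] args) {
        String result = computeResult("  RECORD-42  ");

        // Hand the blocking write off; the calling thread returns
        // immediately and is free to pick up other work.
        CompletableFuture<Void> pending =
                CompletableFuture.runAsync(() -> storeToDb(result), ioPool);

        pending.join(); // a real task could batch these and join at the end
        ioPool.shutdown();
    }
}
```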
On 21 Aug 2015 16:06, Sateesh Kavuri sateesh.kav...@gmail.com wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh



Re: Spark streaming multi-tasking during I/O

2015-08-21 Thread Akhil Das
You can look at spark.streaming.concurrentJobs; by default it runs a
single job. If you set it to 2, then it can run 2 jobs in parallel. It's an
experimental flag, but go ahead and give it a try.
On Aug 21, 2015 3:36 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote:

 Hi,

 My scenario goes like this:
 I have an algorithm running in Spark Streaming mode on a 4-core virtual
 machine. The majority of the time, the algorithm does disk I/O and database
 I/O. The question is: during the I/O, when the CPU is not considerably
 loaded, is it possible to run any other task/thread so as to efficiently
 utilize the CPU?

 Note that one DStream of the algorithm runs completely on a single CPU

 Thank you,
 Sateesh