Hi Akhil,

Think of the scenario as running a piece of code in plain Java with
multiple threads. Let's say a Java process spawns 4 threads, each of which
reads from a database, does some processing and stores the result back to
the database. While one thread is blocked on database I/O, the CPU can
run the processing step of another thread, so the resources are used
efficiently.
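
To make the plain-Java case concrete, here is a rough sketch of what I
mean. The class name, the sleep-based stand-ins for the database calls and
the timings are all made up for illustration; this is not our actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoOverlapDemo {

    // Hypothetical stand-in for a blocking database call:
    // the thread waits here without using the CPU.
    static void simulatedDbIo(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    // Hypothetical stand-in for the CPU-bound processing step.
    static long process(int seed) {
        long acc = seed;
        for (int i = 0; i < 1_000_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // Runs `tasks` read/process/store pipelines on a pool of `threads`
    // threads and returns the elapsed wall-clock time in milliseconds.
    static long runBatch(int threads, int tasks, long ioMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int id = t;
                Callable<Long> pipeline = () -> {
                    simulatedDbIo(ioMillis);   // read from DB
                    long out = process(id);    // process data
                    simulatedDbIo(ioMillis);   // store to DB
                    return out;
                };
                results.add(pool.submit(pipeline));
            }
            for (Future<Long> f : results) {
                f.get(); // wait for every pipeline to finish
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread : " + runBatch(1, 4, 100) + " ms");
        System.out.println("4 threads: " + runBatch(4, 4, 100) + " ms");
    }
}
```

With 4 threads the waits of the different pipelines overlap, so the batch
should finish in roughly the time of one pipeline rather than four.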

In the case of Spark, while an executor on a node is running the same
"read from DB => process data => store to DB" sequence, the CPU is not
given to other queued requests during the "read from DB" and "store to DB"
phases, since the executor allocates its resources completely to the
request currently in progress.

Doesn't the spark.streaming.concurrentJobs flag enable this kind of
scenario, or is there any other way to achieve what I am looking for?

Thanks,
Sateesh

On Sat, Aug 22, 2015 at 7:26 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Hmm, for a single-core VM you will have to run it in local mode (specifying
> master=local[4]). The flag is available in all versions of Spark, I
> guess.
> On Aug 22, 2015 5:04 AM, "Sateesh Kavuri" <sateesh.kav...@gmail.com>
> wrote:
>
>> Thanks Akhil. Does this mean that the executor running in the VM can
>> spawn two concurrent jobs on the same core? If this is the case, this is
>> what we are looking for. Also, which version of Spark is this flag in?
>>
>> Thanks,
>> Sateesh
>>
>> On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> You can look at spark.streaming.concurrentJobs. By default it runs a
>>> single job; if you set it to 2, it can run 2 jobs in parallel. It's an
>>> experimental flag, but go ahead and give it a try.
>>> On Aug 21, 2015 3:36 AM, "Sateesh Kavuri" <sateesh.kav...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> My scenario goes like this:
>>>> I have an algorithm running in Spark Streaming mode on a 4-core virtual
>>>> machine. The majority of the time, the algorithm does disk I/O and database
>>>> I/O. The question is: during the I/O, when the CPU is not heavily loaded,
>>>> is it possible to run any other task/thread so as to efficiently utilize
>>>> the CPU?
>>>>
>>>> Note that one DStream of the algorithm runs completely on a single CPU.
>>>>
>>>> Thank you,
>>>> Sateesh
>>>>
>>>
>>
