Hi twinkle,

To be completely honest, I'm not sure; I had never heard of "spark.task.cpus" before. But I can imagine two different use cases:
a) Instead of relying only on Spark's creation of tasks for parallelism, a user wants to run multiple threads *within* a task. This goes somewhat against Spark's programming model, but I guess this feature gives you the bare minimum support you need in case you really want it. E.g., maybe some existing library you want to use in each task is already multi-threaded, or you pipe to some external program. Or maybe you even do something custom yourself, e.g. some coordination between threads that Spark doesn't give you between tasks.

b) As a simple way to tune resource management. E.g., you could initially have your cluster configured to overcount cores for hyperthreading, but then set "spark.task.cpus" to 2 if you don't want to count hyperthreading. Or perhaps you want to leave some cores open for all the other work going on: GC, network IO, etc. (Then again, this is a strange setting to use for that; you'd probably just want some fixed number of cores to count, not a multiplier.)

On Tue, Apr 7, 2015 at 2:01 AM, twinkle sachdeva <twinkle.sachd...@gmail.com> wrote:

> Hi,
>
> In Spark, there are two settings regarding the number of cores. One is at the task level: spark.task.cpus.
>
> And there is another one, which drives the number of cores per executor: spark.executor.cores.
>
> Apart from using more than one core for a task which has to call some other external API etc., is there any other use case / benefit of assigning more than one core to a task?
>
> As per the code, I can only see this being used while scheduling etc.; RDD partitions etc. remain untouched by this setting. Does this mean that the coder needs to write the application logic to take advantage of this setting? (Which again makes me think this setting over.)
>
> Comments please.
>
> Thanks,
>
> Twinkle
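To make the scheduling interaction concrete: as I understand it, an executor advertises spark.executor.cores slots and each task claims spark.task.cpus of them, so the number of tasks an executor runs concurrently is executor.cores divided by task.cpus (rounded down). A sketch of how you might set this at submit time; the application jar and the particular values here are just placeholders, not a recommendation:

```
spark-submit \
  --conf spark.executor.cores=8 \
  --conf spark.task.cpus=2 \
  your-app.jar
```

With these example values, each executor would schedule at most 8 / 2 = 4 tasks at once, leaving each task "room" for two threads (or for hyperthreaded cores counted double, as in case b).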