Hi twinkle,

To be completely honest, I'm not sure; I had never heard of "spark.task.cpus" before. But I can imagine two different use cases:
a) Instead of relying only on Spark's creation of tasks for parallelism, a user wants to run multiple threads *within* a task. This goes somewhat against Spark's programming model, but I guess this feature gives you the bare minimum support you need in case you really want it. E.g., maybe some existing library you want to use in each task is already multi-threaded, or you pipe to some external program. Or maybe you even do something custom yourself, e.g. some coordination between threads that Spark doesn't give you between tasks.

b) As a simple way to tune resource management. E.g., you could initially have your cluster configured to overcount cores for hyperthreading, but then set "spark.task.cpus" to 2 if you don't want to count hyperthreading. Or perhaps you want to leave some cores open for all the other work going on: GC, network IO, etc. (Then again, this is a strange setting to use for that; you'd probably just want some fixed number of cores to count, not a multiplier.)

On Tue, Apr 7, 2015 at 2:01 AM, twinkle sachdeva <twinkle.sachd...@gmail.com> wrote:

> Hi,
>
> In Spark, there are two settings regarding the number of cores. One is at the task level: spark.task.cpus.
>
> And there is another one, which drives the number of cores per executor: spark.executor.cores.
>
> Apart from using more than one core for a task which has to call some other external API etc., is there any other use case / benefit of assigning more than one core to a task?
>
> As per the code, I can only see this being used while scheduling etc.; RDD partitions etc. remain untouched by this setting. Does this mean that the coder needs to write the application logic to take advantage of this setting? (Which again makes me think this setting over.)
>
> Comments please.
>
> Thanks,
>
> Twinkle
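To make the scheduling interaction concrete: as I understand it, an executor advertises spark.executor.cores slots and each task claims spark.task.cpus of them, so the number of tasks an executor runs concurrently is executor.cores divided by task.cpus (rounded down). A sketch of how you might set this at submit time; the application jar and the particular values here are just placeholders, not a recommendation:

```
spark-submit \
  --conf spark.executor.cores=8 \
  --conf spark.task.cpus=2 \
  your-app.jar
```

With these example values, each executor would schedule at most 8 / 2 = 4 tasks at once, leaving each task "room" for two threads (or for hyperthreaded cores counted double, as in case b).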