Re: Value for SPARK_EXECUTOR_CORES

2015-05-28 Thread Mulugeta Mammo
Thanks for the valuable information. The blog states:

The cores property controls the number of concurrent tasks an executor can
run. --executor-cores 5 means that each executor can run a maximum of five
tasks at the same time. 

So, I guess the max number of executor-cores I can assign is the logical CPU count
(which includes the number of threads per core), not just the number of physical
cores. I just want to be sure which meaning of "cores" Spark is using.

Thanks

On Thu, May 28, 2015 at 11:16 AM, Ruslan Dautkhanov dautkha...@gmail.com
wrote:

 It's not only about cores. Keep in mind spark.executor.cores also affects
 available memory for each task:

 From
 http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

 The memory available to each task is (spark.executor.memory *
 spark.shuffle.memoryFraction * spark.shuffle.safetyFraction) /
 spark.executor.cores. Memory fraction and safety fraction default to 0.2
 and 0.8 respectively.

 I'd test spark.executor.cores with 2, 4, 8, and 16 and see which makes your
 job run fastest.


 --
 Ruslan Dautkhanov

 On Wed, May 27, 2015 at 6:46 PM, Mulugeta Mammo mulugeta.abe...@gmail.com
  wrote:

 My executor has the following spec (lscpu):

 CPU(s): 16
 Core(s) per socket: 4
 Socket(s): 2
 Thread(s) per core: 2

 The logical CPU count is obviously 4 * 2 * 2 = 16. My question is what value Spark
 is expecting in SPARK_EXECUTOR_CORES? The logical CPU count (16) or the total # of
 physical cores (4 * 2 = 8)?

 Thanks





RE: Value for SPARK_EXECUTOR_CORES

2015-05-28 Thread Evo Eftimov
I don’t think the number of CPU cores controls the “number of parallel tasks”.
The number of tasks corresponds first and foremost to the number of (DStream)
RDD partitions.
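As a quick Scala sketch of what I mean (illustrative only, not tied to any
particular job):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("partitions-vs-cores"))

  // An RDD with 8 partitions can never be worked on by more than 8 concurrent
  // tasks, regardless of how many executor cores have been configured.
  val rdd = sc.parallelize(1 to 1000000, 8)
  println(rdd.partitions.length)    // 8

  // More partitions -> more tasks available to fill the executors' task slots.
  val wider = rdd.repartition(64)
  println(wider.partitions.length)  // 64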

 

The Spark documentation doesn’t say what is meant by “Task” in terms of
standard multithreading terminology, i.e. a thread or a process, so your point is
a good one.

 

PS: time and time again, every product, dev team, and company invents its own
terminology, so 50% of the time spent using the product goes to deciphering it and
reinventing the wheel.

 



Re: Value for SPARK_EXECUTOR_CORES

2015-05-28 Thread Ruslan Dautkhanov
It's not only about cores. Keep in mind spark.executor.cores also affects
available memory for each task:

From
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

The memory available to each task is (spark.executor.memory *
spark.shuffle.memoryFraction * spark.shuffle.safetyFraction) /
spark.executor.cores. Memory fraction and safety fraction default to 0.2
and 0.8 respectively.
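For instance, a quick back-of-the-envelope sketch in Scala with a made-up 8 GB
executor heap (illustrative numbers only, defaults for the two fractions):

  // Hypothetical settings, just to show how the per-task share shrinks as cores go up.
  val executorMemoryBytes = 8L * 1024 * 1024 * 1024  // spark.executor.memory = 8g
  val memoryFraction      = 0.2                      // spark.shuffle.memoryFraction (default)
  val safetyFraction      = 0.8                      // spark.shuffle.safetyFraction (default)

  for (cores <- Seq(2, 4, 8, 16)) {
    val perTaskMb = executorMemoryBytes * memoryFraction * safetyFraction / cores / (1024 * 1024)
    println(f"spark.executor.cores = $cores%2d  ->  ~$perTaskMb%.0f MB of shuffle memory per task")
  }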

I'd test spark.executor.cores with 2, 4, 8, and 16 and see which makes your job
run fastest.


-- 
Ruslan Dautkhanov

On Wed, May 27, 2015 at 6:46 PM, Mulugeta Mammo mulugeta.abe...@gmail.com
wrote:

 My executor has the following spec (lscpu):

 CPU(s): 16
 Core(s) per socket: 4
 Socket(s): 2
 Thread(s) per core: 2

 The logical CPU count is obviously 4 * 2 * 2 = 16. My question is what value Spark
 is expecting in SPARK_EXECUTOR_CORES? The logical CPU count (16) or the total # of
 physical cores (4 * 2 = 8)?

 Thanks