Re: Has anyone tried overcommitting CPU cores?

2015-01-09 Thread Xuelin Cao
Thanks, but how do I increase the number of tasks per core?

For example, if the application claims 10 cores, is it possible to launch
100 tasks concurrently?




Re: Has anyone tried overcommitting CPU cores?

2015-01-09 Thread gen tang
Hi,

As you said, the --executor-cores setting defines the maximum number of tasks
that an executor can run simultaneously. So if you claim 10 cores, it is not
possible to launch more than 10 tasks in an executor at the same time. In my
experience, setting more cores than there are physical CPU cores will overload
the CPU at some point during the execution of a Spark application, especially
when you are using algorithms from the mllib package. In addition,
executor-cores affects Spark's default level of parallelism. Therefore, I
recommend setting cores = physical cores by default.
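
For illustration, here is a rough spark-submit sketch (assuming a YARN
deployment; the application class and jar names are placeholders):

    # Each executor runs at most 10 tasks concurrently (--executor-cores).
    # spark.default.parallelism only controls how many tasks a stage creates
    # in total; the extra tasks queue until a core slot frees up.
    spark-submit \
      --master yarn \
      --num-executors 2 \
      --executor-cores 10 \
      --conf spark.default.parallelism=100 \
      --class com.example.MyApp \
      my-app.jar

With two executors of 10 cores each, a 100-task stage still runs at most 20
tasks at a time.
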
Moreover, I don't think overcommitting the CPU will increase CPU utilization;
in my opinion, it just lengthens the CPU's run queue.

If you observe that the CPU load is very low (through Ganglia, for example)
and there is too much I/O, then increasing the level of parallelism or
serializing your objects more efficiently may be a better choice.
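
As a sketch of that tuning (illustrative values, assuming the defaults are
overridden in conf/spark-defaults.conf):

    # More partitions per stage, so a task blocked on I/O does not idle a core
    spark.default.parallelism   96
    # Kryo is usually faster and more compact than Java serialization
    spark.serializer            org.apache.spark.serializer.KryoSerializer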

Hope this helps

Cheers
Gen





Re: Has anyone tried overcommitting CPU cores?

2015-01-08 Thread Jörn Franke
Hello,

Based on experience with other software in virtualized environments, I cannot
really recommend this. However, I am not sure how Spark reacts. You may face
unpredictable task failures depending on utilization, and tasks connecting to
external systems (databases, etc.) may fail unexpectedly, which might be a
problem for them (transactions not finishing, etc.).

Why not increase the number of tasks per core instead?

Best regards




Has anyone tried overcommitting CPU cores?

2015-01-08 Thread Xuelin Cao
Hi,

  I'm wondering whether it is a good idea to overcommit CPU cores on a Spark
cluster.

  For example, in our testing cluster, each worker machine has 24 physical
CPU cores. However, we are allowed to set the CPU core count to 48 or more in
the Spark configuration file. As a result, we can launch more tasks than there
are physical CPU cores.
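
  For concreteness, in a standalone deployment (an assumption; other cluster
managers have their own knobs) this overcommit would be a one-line change in
conf/spark-env.sh:

    # Advertise 48 cores on a machine with 24 physical cores (a 2x overcommit)
    SPARK_WORKER_CORES=48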

  The motivation for overcommitting CPU cores is that, much of the time, a
task cannot consume 100% of a single CPU core (due to I/O, shuffle, etc.).

  So overcommitting the CPU cores allows more tasks to run at the same time
and makes more economical use of the resource.

  But is there any reason we should not do this? Has anyone tried it?
