Hi,

As you said, --executor-cores defines the maximum number of tasks that an
executor can run simultaneously. So, if you claim 10 cores, it is not
possible to launch more than 10 tasks in an executor at the same time.
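
For example, on a YARN deployment it would look something like this (just a
sketch; the class and jar names are placeholders):

    # class and jar names are placeholders
    spark-submit --class com.example.MyApp \
      --executor-cores 10 \
      my-app.jar

With --executor-cores 10, each executor runs at most 10 tasks concurrently,
no matter how many tasks are pending.
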
In my experience, setting more cores than the number of physical CPU cores
will overload the CPU at some point during the execution of a Spark
application, especially when you are using algorithms from the MLlib
package. In addition, executor-cores affects Spark's default level of
parallelism. Therefore, I recommend setting cores = physical cores by
default.
Moreover, I don't think overcommitting CPU will increase CPU utilization;
in my opinion, it just lengthens the queue of tasks waiting for the CPU.
If you observe that the CPU load is very low (through Ganglia, for example)
and there is a lot of I/O, then increasing the level of parallelism or
serializing your objects may be a good choice.
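
For instance, you could add something like the following to the
spark-submit command above (the numbers are only illustrative, so tune them
for your cluster; a common rule of thumb is 2-3 tasks per CPU core):

    # same placeholder app as above; values are illustrative
    spark-submit --class com.example.MyApp \
      --conf spark.default.parallelism=96 \
      --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
      my-app.jar

Kryo serialization is usually faster and more compact than the default Java
serialization, which helps when a lot of time goes into shuffling objects.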

Hoping this helps

Cheers
Gen


On Fri, Jan 9, 2015 at 10:12 AM, Xuelin Cao <xuelincao2...@gmail.com> wrote:

>
> Thanks, but how can I increase the number of tasks per core?
>
> For example, if the application claims 10 cores, is it possible to launch
> 100 tasks concurrently?
>
>
>
> On Fri, Jan 9, 2015 at 2:57 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Hello,
>>
>> Based on experience with other software in virtualized environments, I
>> cannot really recommend this. However, I am not sure how Spark reacts. You
>> may face unpredictable task failures depending on utilization; tasks
>> connecting to external systems (databases, etc.) may fail unexpectedly, and
>> this might be a problem for those systems (transactions not finishing,
>> etc.).
>>
>> Why not increase the tasks per core?
>>
>> Best regards
>> On Jan 9, 2015, at 06:46, "Xuelin Cao" <xuelincao2...@gmail.com> wrote:
>>
>>
>>> Hi,
>>>
>>>       I'm wondering whether it is a good idea to overcommit CPU cores on
>>> the Spark cluster.
>>>
>>>       For example, in our testing cluster, each worker machine has 24
>>> physical CPU cores. However, we are allowed to set the CPU core number to
>>> 48 or more in the Spark configuration file. As a result, we are allowed to
>>> launch more tasks than the number of physical CPU cores.
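>>>
>>>       For example, assuming a standalone deployment (just an illustration;
>>> the exact file and setting depend on how the cluster is deployed), that
>>> would be a line like this in conf/spark-env.sh:
>>>
>>>           # assumes Spark standalone mode
>>>           export SPARK_WORKER_CORES=48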
>>>
>>>       The motivation for overcommitting CPU cores is that, quite often, a
>>> task cannot consume 100% of a single CPU core (due to I/O, shuffle,
>>> etc.).
>>>
>>>       So, overcommitting the CPU cores allows more tasks to run at the
>>> same time and makes more economical use of the resources.
>>>
>>>       But is there any reason we should not do this? Has anyone tried
>>> it?
>>>
>
