Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread yncxcw
Hi all,

I also noticed this problem. The reason is that YARN accounts each executor
as using only 1 vcore, no matter how many cores you configured,
because YARN uses memory as the primary metric for resource
allocation. This means that YARN will pack as many executors onto each node
as the node's free memory allows.

If you want vcores to be accounted for in resource allocation, you can
configure the resource calculator to be the DominantResourceCalculator, as
follows:

Property: yarn.scheduler.capacity.resource-calculator
Description: The ResourceCalculator implementation to be used to compare
Resources in the scheduler. The default,
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses
Memory, while DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU, etc. A Java
ResourceCalculator class name is expected.
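
For example, a minimal capacity-scheduler.xml entry enabling it could look
like the sketch below (the file's location on EMR and the need for a
ResourceManager restart afterwards are assumptions on my part, so check your
distribution's docs):

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>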


Please also refer to this article:
https://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/


Thanks!

Wei Chen






Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Patrick Alwell
+1

AFAIK,

vCores are not the same as Cores in AWS. 
https://samrueby.com/2015/01/12/what-are-amazon-aws-vcpus/

I’ve always understood it as cores = num concurrent threads

These posts might help with your research and explain why exceeding 5 cores
per executor doesn't make sense.

https://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/

AWS/EMR was always a challenge for me. I never understood why it didn't seem
to be using all my resources, as you noted.

I would set this as --num-executors 15 --executor-cores 5 --executor-memory
10g and then test my application from there.
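
Spelled out as a full command, that starting point might look like the sketch
below (the master/deploy-mode flags and the application jar name are
placeholders I'm assuming, not something from this thread):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 15 \
      --executor-cores 5 \
      --executor-memory 10g \
      your-application.jar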

I only got better performance out of a different class of nodes, e.g. R-series
instance types. They cost more than the M class, but I wound up using fewer of
them and my jobs ran faster. I was in 10+ TB job territory with TPC data.  ☺
The links I provided have a few use cases and trials.

Hope that helps,

-Pat


From: Selvam Raman <sel...@gmail.com>
Date: Monday, February 26, 2018 at 1:52 PM
To: Vadim Semenov <va...@datadoghq.com>
Cc: user <user@spark.apache.org>
Subject: Re: Spark EMR executor-core vs Vcores

Thanks. That makes sense.

I want to know one more thing: the available vcores per machine is 16, but
the threads per node is 8. Am I missing how to relate the two?

What I'm thinking now is: number of vcores = number of threads.



On Mon, 26 Feb 2018 at 18:45, Vadim Semenov <va...@datadoghq.com> wrote:
Used cores aren't reported correctly in EMR, and YARN itself has no control
over it, so whatever you put in `spark.executor.cores` will be used,
but in the ResourceManager you will only see 1 vcore used per NodeManager.

On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman <sel...@gmail.com> wrote:
Hi,

spark version - 2.0.0
spark distribution - EMR 5.0.0

Spark Cluster - one master, 5 slaves
Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage



Cluster Metrics
Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
Containers Running: 5
Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
Active Nodes: 5 | Decommissioning Nodes: 0 | Decommissioned Nodes: 0 |
Lost Nodes: 5 | Unhealthy Nodes: 0 | Rebooted Nodes: 0


I have submitted the job with the below configuration:
--num-executors 5 --executor-cores 10 --executor-memory 20g

spark.task.cpus - by default, 1


My understanding is that there will be 5 executors, each able to run 10 tasks
at a time, with the tasks sharing the executor's total memory of 20g. Here, I
could see only 5 vcores used, which means 1 executor instance uses 20g + 10%
overhead RAM (22 GB), 10 cores (the number of threads), and 1 vcore (CPU).
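
(Checking that arithmetic against the metrics above: 20g x 1.10 ~ 22 GB per
executor container, and 4 x 22 GB = 88 GB, which roughly matches the 88.88 GB
Memory Used, while the 22 GB Memory Reserved corresponds to a fifth executor
that does not fit.)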

Please correct me if my understanding is wrong.


How can I utilize the number of vcores in EMR effectively? Will more vcores
boost performance?



--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"



Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Vadim Semenov
Yeah, for some reason (unknown to me, but you can find discussions on the AWS
forums), they double the actual number of cores for NodeManagers.

I assume that's done to maximize utilization, but it doesn't really matter to
me, at least, since I only run Spark; so I, personally, set `(total number
of cores)/2 - 1`, saving one core for the OS/DataNode/NodeManager, because
Spark itself can create a significant load.
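
(On the m3.2xlarge slaves in this thread, which report 16 vcores for 8
hardware threads, that heuristic works out to 16/2 - 1 = 7 cores per node
for Spark.)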

On Mon, Feb 26, 2018 at 4:51 PM, Selvam Raman  wrote:

> Thanks. That makes sense.
>
> I want to know one more thing: the available vcores per machine is 16, but
> the threads per node is 8. Am I missing how to relate the two?
>
> What I'm thinking now is: number of vcores = number of threads.
>
>
>
> On Mon, 26 Feb 2018 at 18:45, Vadim Semenov  wrote:
>
>> Used cores aren't reported correctly in EMR, and YARN itself has no control
>> over it, so whatever you put in `spark.executor.cores` will be used,
>> but in the ResourceManager you will only see 1 vcore used per NodeManager.
>>
>> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman  wrote:
>>
>>> Hi,
>>>
>>> spark version - 2.0.0
>>> spark distribution - EMR 5.0.0
>>>
>>> Spark Cluster - one master, 5 slaves
>>>
>>> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
>>> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>>>
>>>
>>> Cluster Metrics
>>> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
>>> Containers Running: 5
>>> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
>>> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
>>> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
>>> Unhealthy: 0 | Rebooted: 0
>>>
>>> I have submitted the job with the below configuration:
>>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>>
>>>
>>>
>>> spark.task.cpus - by default, 1
>>>
>>>
>>> My understanding is that there will be 5 executors, each able to run 10
>>> tasks at a time, with the tasks sharing the executor's total memory of 20g.
>>> Here, I could see only 5 vcores used, which means 1 executor instance uses
>>> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
>>> 1 vcore (CPU).
>>>
>>> Please correct me if my understanding is wrong.
>>>
>>> How can I utilize the number of vcores in EMR effectively? Will more
>>> vcores boost performance?
>>>
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>
>> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Selvam Raman
Thanks. That makes sense.

I want to know one more thing: the available vcores per machine is 16, but
the threads per node is 8. Am I missing how to relate the two?

What I'm thinking now is: number of vcores = number of threads.



On Mon, 26 Feb 2018 at 18:45, Vadim Semenov  wrote:

> Used cores aren't reported correctly in EMR, and YARN itself has no control
> over it, so whatever you put in `spark.executor.cores` will be used,
> but in the ResourceManager you will only see 1 vcore used per NodeManager.
>
> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman  wrote:
>
>> Hi,
>>
>> spark version - 2.0.0
>> spark distribution - EMR 5.0.0
>>
>> Spark Cluster - one master, 5 slaves
>>
>> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
>> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>>
>>
>> Cluster Metrics
>> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
>> Containers Running: 5
>> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
>> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
>> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
>> Unhealthy: 0 | Rebooted: 0
>>
>> I have submitted the job with the below configuration:
>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>
>>
>>
>> spark.task.cpus - by default, 1
>>
>>
>> My understanding is that there will be 5 executors, each able to run 10
>> tasks at a time, with the tasks sharing the executor's total memory of 20g.
>> Here, I could see only 5 vcores used, which means 1 executor instance uses
>> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
>> 1 vcore (CPU).
>>
>> Please correct me if my understanding is wrong.
>>
>> How can I utilize the number of vcores in EMR effectively? Will more
>> vcores boost performance?
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
> --
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread akshay naidu
Putting in all the cores alone won't serve the purpose; you'll also have to
set the number of executors and the executor memory accordingly.

On Tue 27 Feb, 2018, 12:15 AM Vadim Semenov,  wrote:

> Used cores aren't reported correctly in EMR, and YARN itself has no control
> over it, so whatever you put in `spark.executor.cores` will be used,
> but in the ResourceManager you will only see 1 vcore used per NodeManager.
>
> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman  wrote:
>
>> Hi,
>>
>> spark version - 2.0.0
>> spark distribution - EMR 5.0.0
>>
>> Spark Cluster - one master, 5 slaves
>>
>> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
>> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>>
>>
>> Cluster Metrics
>> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
>> Containers Running: 5
>> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
>> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
>> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
>> Unhealthy: 0 | Rebooted: 0
>>
>> I have submitted the job with the below configuration:
>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>
>>
>>
>> spark.task.cpus - by default, 1
>>
>>
>> My understanding is that there will be 5 executors, each able to run 10
>> tasks at a time, with the tasks sharing the executor's total memory of 20g.
>> Here, I could see only 5 vcores used, which means 1 executor instance uses
>> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
>> 1 vcore (CPU).
>>
>> Please correct me if my understanding is wrong.
>>
>> How can I utilize the number of vcores in EMR effectively? Will more
>> vcores boost performance?
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
>


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Vadim Semenov
Used cores aren't reported correctly in EMR, and YARN itself has no control
over it, so whatever you put in `spark.executor.cores` will be used,
but in the ResourceManager you will only see 1 vcore used per NodeManager.

On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman  wrote:

> Hi,
>
> spark version - 2.0.0
> spark distribution - EMR 5.0.0
>
> Spark Cluster - one master, 5 slaves
>
> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>
>
> Cluster Metrics
> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
> Containers Running: 5
> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
> Unhealthy: 0 | Rebooted: 0
>
> I have submitted the job with the below configuration:
> --num-executors 5 --executor-cores 10 --executor-memory 20g
>
>
>
> spark.task.cpus - by default, 1
>
>
> My understanding is that there will be 5 executors, each able to run 10
> tasks at a time, with the tasks sharing the executor's total memory of 20g.
> Here, I could see only 5 vcores used, which means 1 executor instance uses
> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
> 1 vcore (CPU).
>
> Please correct me if my understanding is wrong.
>
> How can I utilize the number of vcores in EMR effectively? Will more
> vcores boost performance?
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Selvam Raman
Hi Fawze,

Yes, it is true that I am running in YARN mode; the 5 containers represent
4 executors and 1 application master.
But I was not expecting these details, as I am already aware of them. What I
want to know is the relationship between vcores (EMR YARN) and
executor-cores (Spark).


From my slave configuration I understand that only 8 threads are available
on my slave machine, which means at most 8 threads can run at a time:

Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             1
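
(That is, 8 threads per core x 1 core per socket x 1 socket = 8 hardware
threads in total, versus the 16 vcores YARN reports for the node.)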


So I don't think it is valid to give --executor-cores 10 in my spark-submit.

On Mon, Feb 26, 2018 at 10:54 AM, Fawze Abujaber  wrote:

> It's recommended to use an executor-cores value of 5.
>
> Each executor here will utilize 20 GB, which means the Spark job will
> utilize 50 CPU cores and 100 GB of memory.
>
> You cannot run more than 4 executors because your cluster doesn't have
> enough memory.
>
> You see 5 executors because 4 are for the job and one is for the
> application master.
>
> See the used memory and the total memory.
>
> On Mon, Feb 26, 2018 at 12:20 PM, Selvam Raman  wrote:
>
>> Hi,
>>
>> spark version - 2.0.0
>> spark distribution - EMR 5.0.0
>>
>> Spark Cluster - one master, 5 slaves
>>
>> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
>> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>>
>>
>> Cluster Metrics
>> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
>> Containers Running: 5
>> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
>> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
>> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
>> Unhealthy: 0 | Rebooted: 0
>>
>> I have submitted the job with the below configuration:
>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>
>>
>>
>> spark.task.cpus - by default, 1
>>
>>
>> My understanding is that there will be 5 executors, each able to run 10
>> tasks at a time, with the tasks sharing the executor's total memory of 20g.
>> Here, I could see only 5 vcores used, which means 1 executor instance uses
>> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
>> 1 vcore (CPU).
>>
>> Please correct me if my understanding is wrong.
>>
>> How can I utilize the number of vcores in EMR effectively? Will more
>> vcores boost performance?
>>
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
>
>


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Fawze Abujaber
It's recommended to use an executor-cores value of 5.

Each executor here will utilize 20 GB, which means the Spark job will utilize
50 CPU cores and 100 GB of memory.

You cannot run more than 4 executors because your cluster doesn't have
enough memory.

You see 5 executors because 4 are for the job and one is for the application
master.

See the used memory and the total memory.
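
To spell out the arithmetic: 5 executors x 10 executor-cores = 50 CPU cores,
and 5 x (20 GB + 10% overhead) = 5 x 22 GB = 110 GB requested against the
90.50 GB of total cluster memory, so only 4 executor containers fit, with the
22 GB shown as Reserved being the executor that cannot be placed.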

On Mon, Feb 26, 2018 at 12:20 PM, Selvam Raman  wrote:

> Hi,
>
> spark version - 2.0.0
> spark distribution - EMR 5.0.0
>
> Spark Cluster - one master, 5 slaves
>
> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>
>
> Cluster Metrics
> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
> Containers Running: 5
> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
> Unhealthy: 0 | Rebooted: 0
>
> I have submitted the job with the below configuration:
> --num-executors 5 --executor-cores 10 --executor-memory 20g
>
>
>
> spark.task.cpus - by default, 1
>
>
> My understanding is that there will be 5 executors, each able to run 10
> tasks at a time, with the tasks sharing the executor's total memory of 20g.
> Here, I could see only 5 vcores used, which means 1 executor instance uses
> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
> 1 vcore (CPU).
>
> Please correct me if my understanding is wrong.
>
> How can I utilize the number of vcores in EMR effectively? Will more
> vcores boost performance?
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>


Re: Spark EMR executor-core vs Vcores

2018-02-26 Thread Selvam Raman
Master Node details:
lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping:              4
CPU MHz:               2494.066
BogoMIPS:              4988.13
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-3




Slave Node Details:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Stepping:              4
CPU MHz:               2500.054
BogoMIPS:              5000.10
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-7
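
(Note that the slave reports CPU(s): 8, while the YARN metrics above show 16
vcores per slave node, in line with Vadim's observation that EMR doubles the
core count for NodeManagers.)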

On Mon, Feb 26, 2018 at 10:20 AM, Selvam Raman  wrote:

> Hi,
>
> spark version - 2.0.0
> spark distribution - EMR 5.0.0
>
> Spark Cluster - one master, 5 slaves
>
> Master node - m3.xlarge - 8 vCore, 15 GiB memory, 80 GB SSD storage
> Slave node - m3.2xlarge - 16 vCore, 30 GiB memory, 160 GB SSD storage
>
>
> Cluster Metrics
> Apps Submitted: 16 | Apps Pending: 0 | Apps Running: 1 | Apps Completed: 15
> Containers Running: 5
> Memory Used: 88.88 GB | Memory Total: 90.50 GB | Memory Reserved: 22 GB
> VCores Used: 5 | VCores Total: 79 | VCores Reserved: 1
> Active Nodes: 5 | Decommissioning: 0 | Decommissioned: 0 | Lost: 5 |
> Unhealthy: 0 | Rebooted: 0
>
> I have submitted the job with the below configuration:
> --num-executors 5 --executor-cores 10 --executor-memory 20g
>
>
>
> spark.task.cpus - by default, 1
>
>
> My understanding is that there will be 5 executors, each able to run 10
> tasks at a time, with the tasks sharing the executor's total memory of 20g.
> Here, I could see only 5 vcores used, which means 1 executor instance uses
> 20g + 10% overhead RAM (22 GB), 10 cores (the number of threads), and
> 1 vcore (CPU).
>
> Please correct me if my understanding is wrong.
>
> How can I utilize the number of vcores in EMR effectively? Will more
> vcores boost performance?
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"