+1

AFAIK,

vCores are not the same as Cores in AWS. 
https://samrueby.com/2015/01/12/what-are-amazon-aws-vcpus/

I’ve always understood it as cores = num concurrent threads

These posts might help with your research, and explain why exceeding 5 cores per 
executor doesn’t make sense.

https://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
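
As a rough sketch of the sizing math those posts walk through (the node specs
come from the cluster described below; the heuristics are general rules of
thumb, not anything EMR-specific):

    # Rule-of-thumb executor sizing (rough sketch, not EMR-specific):
    # leave ~1 vcore and ~1 GB per node for the OS/Hadoop daemons,
    # and cap executors at 5 cores each.
    nodes = 5              # m3.2xlarge slave nodes
    vcores_per_node = 16
    mem_per_node_gb = 30

    cores_per_executor = 5
    executors_per_node = (vcores_per_node - 1) // cores_per_executor  # 3
    num_executors = nodes * executors_per_node                        # 15
    # (the linked posts also reserve one executor for the YARN
    # application master, which would bring this down to 14)

    mem_per_executor_gb = (mem_per_node_gb - 1) // executors_per_node  # 9
    # the container also needs ~10% memory overhead on top of the heap,
    # so 9-10g is the ballpark, subject to YARN's per-node allocation

    print(f"--num-executors {num_executors} "
          f"--executor-cores {cores_per_executor} "
          f"--executor-memory {mem_per_executor_gb}g")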

AWS/EMR was always a challenge for me; I never understood why it didn’t seem to 
be using all my resources, as you noted.

I would set this as --num-executors 15 --executor-cores 5 --executor-memory 10g 
and then test my application from there.
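
Spelled out as a full submit (the class and jar names here are just
placeholders, not anything from your job):

    # class/jar below are placeholders for your application
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 15 \
      --executor-cores 5 \
      --executor-memory 10g \
      --class com.example.YourApp \
      your-app.jar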

I only got better performance out of a different class of nodes, e.g. R-series 
instance types. They cost more than the M class, but I wound up using fewer of 
them and my jobs ran faster. I was in 10+ TB jobs territory with TPC data. ☺ The 
links I provided have a few use cases and trials.

Hope that helps,

-Pat


From: Selvam Raman <sel...@gmail.com>
Date: Monday, February 26, 2018 at 1:52 PM
To: Vadim Semenov <va...@datadoghq.com>
Cc: user <user@spark.apache.org>
Subject: Re: Spark EMR executor-core vs Vcores

Thanks. That makes sense.

I want to know one more thing: the available vcores per machine is 16, but 
threads per node is 8. Am I missing how to relate the two?

What I’m thinking now is number of vcores = number of threads.



On Mon, 26 Feb 2018 at 18:45, Vadim Semenov <va...@datadoghq.com> wrote:
Used cores don't get reported correctly in EMR, and YARN itself has no control 
over it, so whatever you put in `spark.executor.cores` will be used, but in the 
ResourceManager you will only see 1 vcore used per NodeManager.
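
(For reference: the ResourceManager counts vcores that way when the capacity
scheduler, EMR's default, uses the DefaultResourceCalculator, which only
accounts for memory. If you want CPU accounted for as well, the usual switch
in capacity-scheduler.xml is shown below; treat it as an experiment, since it
also changes how containers are placed.)

    <!-- capacity-scheduler.xml: make YARN account for CPU as well as
         memory when scheduling containers -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>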

On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman <sel...@gmail.com> wrote:
Hi,

Spark version - 2.0.0
Spark distribution - EMR 5.0.0

Spark cluster - one master, 5 slaves
Master node - m3.xlarge - 8 vCores, 15 GiB memory, 80 GB SSD storage
Slave node - m3.2xlarge - 16 vCores, 30 GiB memory, 160 GB SSD storage



Cluster Metrics

    Apps Submitted            16
    Apps Pending               0
    Apps Running               1
    Apps Completed            15
    Containers Running         5
    Memory Used            88.88 GB
    Memory Total           90.50 GB
    Memory Reserved           22 GB
    VCores Used                5
    VCores Total              79
    VCores Reserved            1
    Active Nodes               5
    Decommissioning Nodes      0
    Decommissioned Nodes       0
    Lost Nodes                 5
    Unhealthy Nodes            0
    Rebooted Nodes             0


I have submitted the job with the below configuration:
--num-executors 5 --executor-cores 10 --executor-memory 20g

spark.task.cpus - 1 by default


My understanding is that there will be 5 executors, each running 10 tasks at a 
time, with those tasks sharing the executor's total memory of 20g. Here, I can 
see only 5 vcores used, which means 1 executor instance uses 20g + 10% overhead 
RAM (22 GB) and 10 cores (number of threads), but shows up as only 1 vcore (CPU).
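
As a quick back-of-the-envelope check of that reading (assuming Spark 2.0's
default YARN overhead rule, max(384 MB, 10% of executor memory), since
spark.yarn.executor.memoryOverhead isn't set explicitly here):

    # Container size YARN reserves for one executor
    executor_memory_gb = 20
    overhead_gb = max(0.384, 0.10 * executor_memory_gb)
    print(executor_memory_gb + overhead_gb)  # 22.0 -- matches "Memory Reserved 22 GB"

    # Concurrent tasks per executor = executor-cores / spark.task.cpus
    executor_cores, task_cpus = 10, 1
    print(executor_cores // task_cpus)       # 10 tasks at a time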

Please correct me if my understanding is wrong.


How can I utilize the vcores in EMR effectively? Will using more vcores boost 
performance?



--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

