+1. AFAIK, vCores are not the same as cores in AWS:
https://samrueby.com/2015/01/12/what-are-amazon-aws-vcpus/

I've always understood it as cores = number of concurrent threads.

These posts might help with your research, and they explain why exceeding 5 cores per executor doesn't make sense:
https://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/

AWS/EMR was always a challenge for me; like you, I never understood why it didn't seem to be using all my resources. I would set this up as

  --num-executors 15 --executor-cores 5 --executor-memory 10g

and then test my application from there (see the spark-submit sketch below).

I only got better performance out of a different class of nodes, e.g. R-series instance types. They cost more than the M class, but I wound up using fewer of them and my jobs ran faster. I was in 10+ TB job territory with TPC data. ☺

The links I provided have a few use cases and trials.

Hope that helps,
-Pat
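As a concrete starting point, here is a minimal spark-submit sketch with those settings. It is an illustration only: the yarn master/deploy-mode flags and the application jar name are placeholders, not details from this thread.

  # 15 executors x 5 cores = 75 concurrent tasks across the cluster.
  # YARN adds ~10% memory overhead on top of each 10g executor.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 15 \
    --executor-cores 5 \
    --executor-memory 10g \
    your-application.jar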
From: Selvam Raman <sel...@gmail.com>
Date: Monday, February 26, 2018 at 1:52 PM
To: Vadim Semenov <va...@datadoghq.com>
Cc: user <user@spark.apache.org>
Subject: Re: Spark EMR executor-core vs Vcores

Thanks, that makes sense.

I want to know one more thing: the available vCores per machine is 16, but there are 8 threads per node. Am I failing to relate the two correctly? What I'm thinking now is number of vCores = number of threads.

On Mon, 26 Feb 2018 at 18:45, Vadim Semenov <va...@datadoghq.com> wrote:

Used cores aren't reported correctly in EMR, and YARN itself has no control over it, so whatever you put in `spark.executor.cores` will be used, but in the ResourceManager you will only see 1 vCore used per NodeManager.

On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman <sel...@gmail.com> wrote:

Hi,

Spark version - 2.0.0
Spark distribution - EMR 5.0.0
Spark cluster - one master, 5 slaves
Master node - m3.xlarge - 8 vCores, 15 GiB memory, 80 GB SSD storage
Slave node - m3.2xlarge - 16 vCores, 30 GiB memory, 160 GB SSD storage

Cluster metrics (from the YARN ResourceManager UI):

  Apps Submitted:        16
  Apps Pending:          0
  Apps Running:          1
  Apps Completed:        15
  Containers Running:    5
  Memory Used:           88.88 GB
  Memory Total:          90.50 GB
  Memory Reserved:       22 GB
  VCores Used:           5
  VCores Total:          79
  VCores Reserved:       1
  Active Nodes:          5
  Decommissioning Nodes: 0
  Decommissioned Nodes:  0
  Lost Nodes:            5
  Unhealthy Nodes:       0
  Rebooted Nodes:        0

I have submitted a job with the configuration below:

  --num-executors 5 --executor-cores 10 --executor-memory 20g

spark.task.cpus - 1 by default

My understanding is that there will be 5 executors, each able to run 10 tasks at a time, with the tasks sharing the executor's total memory of 20 GB. Here, I can see only 5 vCores used, which means one executor instance uses 20 GB + 10% overhead RAM (22 GB), 10 cores (number of threads), and 1 vCore (CPU).

Please correct me if my understanding is wrong. How can I utilize the vCores in EMR effectively? Will vCores boost performance?

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" (Shun bribery; hold your head high)

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து" (Shun bribery; hold your head high)
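For what it's worth, the back-of-the-envelope arithmetic behind the 15-executors / 5-cores suggestion above, as a hedged sketch following the rules of thumb in the linked posts (the 1-core-per-node reservation and the ~10% overhead figure are common conventions, not numbers taken from this thread):

  # Per worker node (m3.2xlarge): 16 vCores reported, 5 worker nodes.
  #   usable cores per node = 16 - 1 reserved for OS/Hadoop daemons = 15
  #   executors per node    = 15 / 5 cores per executor             = 3
  #   total executors       = 3 per node * 5 worker nodes           = 15
  # Memory check: executors_per_node * (executor-memory + ~10%
  # spark.yarn.executor.memoryOverhead) must stay below the YARN
  # "Memory Total" available per node (here 90.50 GB / 5 active nodes,
  # i.e. roughly 18 GB), so tune --executor-memory down if containers
  # fail to allocate.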