Hi Hiroyuki, thanks for the answer.

I found a solution for the cores-per-executor configuration:
I set maximizeResourceAllocation to true:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#emr-spark-maximizeresourceallocation
It was probably true by default in version 5.16, but I couldn't find when
that changed.
The same page says that dynamic allocation is true by default. I thought
that would do the trick, but reading it again, I think it relates to the
number of executors rather than the number of cores per executor.

But the jobs are still taking longer than before.
Looking at the application history, I see these differences.
For the same job, the same instance types, and the default (AWS-managed)
configuration for executors, cores, and memory:
Instances:
6 x r5.xlarge: 4 vCPUs, 32 GB of memory each (so there are 24 cores
total: 6 instances * 4 cores).

With 5.16:
- 24 executors (4 on each instance, including the one that also hosted
the driver).
- 4 cores each.
- 2.7 GB * 2 (storage + on-heap storage) memory each.
- 1 executor per core, yet at the same time 4 cores per executor (?).
- Total executor memory per instance: 21.6 GB (2.7 * 2 * 4).
- Total elapsed time: 6 minutes.
With 5.20:
- 5 executors (1 on each instance, 0 on the instance with the driver).
- 4 cores each.
- 11.9 GB * 2 (storage + on-heap storage) memory each.
- Total executor memory per instance: 23.8 GB (11.9 * 2 * 1).
- Total elapsed time: 8 minutes.


I don't understand the 5.16 configuration, but it performs better.
It seems that in 5.20 a full instance is wasted on the driver alone, when
it could also host an executor.
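
If I end up setting things manually instead, I imagine something like this
spark-submit sketch would pack the executors explicitly (the memory value
and instance count are untested guesses for r5.xlarge nodes; the real
per-node limit is whatever yarn.nodemanager.resource.memory-mb is set to,
and my_job.py is just a placeholder):

# Sketch: one 4-core executor per r5.xlarge node, sized with headroom for
# YARN overhead and for the driver sharing one of the nodes.
spark-submit \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=16g \
  --conf spark.executor.instances=6 \
  --conf spark.driver.memory=4g \
  my_job.py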


Regards,
Pedro.



On Thu, Jan 31, 2019 at 20:16, Hiroyuki Nagata <idiotpan...@gmail.com>
wrote:

> Hi, Pedro
>
>
> I have also started using AWS EMR, with Spark 2.4.0. I'm looking for
> performance tuning methods.
>
> Do you configure dynamic allocation?
>
> FYI:
>
> https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>
> I haven't tested it yet. I guess spark-submit needs to specify the
> number of executors.
>
> Regards,
> Hiroyuki
>
> On Fri, Feb 1, 2019 at 5:23, Pedro Tuero (tuerope...@gmail.com) wrote:
>
>> Hi guys,
>> I usually run Spark jobs on AWS EMR.
>> Recently I switched from AWS EMR release label 5.16 to 5.20 (which uses
>> Spark 2.4.0).
>> I've noticed that a lot of steps are taking longer than before.
>> I think it is related to the automatic configuration of cores per
>> executor.
>> In version 5.16, some executors took more cores if the instance allowed
>> it.
>> Say an instance had 8 cores and 40 GB of RAM, and the RAM configured
>> per executor was 10 GB; then AWS EMR automatically assigned 2 cores per
>> executor.
>> Now with label 5.20, unless I configure the number of cores manually,
>> only one core is assigned per executor.
>>
>> I don't know if it is related to Spark 2.4.0 or if it is something
>> managed by AWS...
>> Does anyone know if there is a way to automatically use more cores when
>> it is physically possible?
>>
>> Thanks,
>> Peter.
>>
>
