Hi,

I think I don't understand well enough how to launch jobs.

I have a job that takes 60 seconds to finish. I run it with the following
command:

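# 60-second job: YARN cluster mode, dynamic allocation capped at 4 executors of 1 core / 1g each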
spark-submit --executor-cores 1 \
             --executor-memory 1g \
             --driver-memory 1g \
             --master yarn \
             --deploy-mode cluster \
             --conf spark.dynamicAllocation.enabled=true \
             --conf spark.shuffle.service.enabled=true \
             --conf spark.dynamicAllocation.minExecutors=1 \
             --conf spark.dynamicAllocation.maxExecutors=4 \
             --conf spark.dynamicAllocation.initialExecutors=4 \
             --conf spark.executor.instances=4 \

If I increase the number of partitions from the code and the number of
executors, the app finishes faster, which is OK. But if I increase only
executor-cores, the finish time stays the same, and I don't understand
why. I expected the time to be lower than the initial time.
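
For example, the cores-only run is the same command with just that one
flag changed; trimmed down it looks like this (2 is an arbitrary value,
my-app.jar stands in for the real jar, and the dynamic-allocation flags
stay exactly as above):

# Cores-only variant: --executor-cores raised from 1 to 2, everything
# else unchanged (the --conf flags from above are omitted only for
# brevity; my-app.jar is a placeholder for the real application jar).
spark-submit --master yarn \
             --deploy-mode cluster \
             --executor-cores 2 \
             --executor-memory 1g \
             --driver-memory 1g \
             my-app.jar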

My second problem: if I launch the above command twice, I expect both
jobs to finish in 60 seconds, but that doesn't happen. Both jobs finish
after 120 seconds, and I don't understand why.
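
By "launching twice" I mean two identical submissions started back to
back, roughly like this (my-app.jar is again a placeholder for the real
application jar):

#!/usr/bin/env bash
# Submit the same job twice, almost simultaneously, with the exact flags
# from the first command above; my-app.jar is a placeholder.
flags=(
  --executor-cores 1
  --executor-memory 1g
  --driver-memory 1g
  --master yarn
  --deploy-mode cluster
  --conf spark.dynamicAllocation.enabled=true
  --conf spark.shuffle.service.enabled=true
  --conf spark.dynamicAllocation.minExecutors=1
  --conf spark.dynamicAllocation.maxExecutors=4
  --conf spark.dynamicAllocation.initialExecutors=4
  --conf spark.executor.instances=4
)

spark-submit "${flags[@]}" my-app.jar &   # first job
spark-submit "${flags[@]}" my-app.jar &   # second job, started right after
# In YARN cluster mode spark-submit waits for the application to finish,
# so this returns only when both jobs are done (~120 seconds, not ~60).
wait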

I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU
has 2 threads). From what I saw in the default EMR configuration, YARN
uses the CapacityScheduler with its default FIFO ordering.
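
For reference, this is roughly how I looked at the scheduler setup on
the master node (assuming the ResourceManager web UI is on the default
port 8088):

# Ask the ResourceManager which scheduler is active and how its queues
# are configured (YARN REST API, default port 8088).
curl -s http://localhost:8088/ws/v1/cluster/scheduler
# The CapacityScheduler queue settings shipped by EMR live here:
cat /etc/hadoop/conf/capacity-scheduler.xml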

What do you think about these problems?

Thanks,

Cosmin
