Hi Egor,

About the first problem, I think you are right; it makes sense.
About the second problem: I checked the available resources on port 8088 and it shows 16 available cores. I start my job with 4 executors, 1 core each, and 1 GB per executor. My job uses at most 50 MB of memory (just for testing). From my point of view the resources are sufficient, and I think the problem lies in the YARN configuration files, but I don't know what is missing.

Thank you

2017-02-13 21:14 GMT+02:00 Egor Pahomov <pahomov.e...@gmail.com>:

> About the second problem: as I understand it, this can happen in two cases: (1) one job prevents the other from getting resources for its executors, or (2) the bottleneck is reading from disk, so that part cannot really be parallelized. I have no experience with the second case, but it is easy to verify the first one: just look at your Hadoop UI and check that both jobs get enough resources.
>
> 2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:
>
>> "But if i increase only executor-cores the finish time is the same".
>> More experienced people can correct me if I'm wrong, but as far as I
>> understand it: one partition is processed by one Spark task, and a task
>> always runs on a single core; it is not parallelized across cores. So if
>> you have 5 partitions and you increase the total number of cores in the
>> cluster from 7 to 10, for example, you have not gained anything. But if
>> you repartition, you give Spark the opportunity to process the data in
>> more threads, so more tasks can execute in parallel.
>>
>> 2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:
>>
>>> Hi,
>>>
>>> I think I don't understand well enough how to launch jobs.
>>>
>>> I have one job which takes 60 seconds to finish.
>>> I run it with the following command:
>>>
>>> spark-submit --executor-cores 1 \
>>>   --executor-memory 1g \
>>>   --driver-memory 1g \
>>>   --master yarn \
>>>   --deploy-mode cluster \
>>>   --conf spark.dynamicAllocation.enabled=true \
>>>   --conf spark.shuffle.service.enabled=true \
>>>   --conf spark.dynamicAllocation.minExecutors=1 \
>>>   --conf spark.dynamicAllocation.maxExecutors=4 \
>>>   --conf spark.dynamicAllocation.initialExecutors=4 \
>>>   --conf spark.executor.instances=4
>>>
>>> If I increase the number of partitions in the code and the number of
>>> executors, the app finishes faster, which is fine. But if I increase only
>>> executor-cores, the finish time stays the same, and I don't understand
>>> why. I expect the time to be lower than the initial time.
>>>
>>> My second problem: if I launch the above job twice, I expect both jobs
>>> to finish in 60 seconds, but this doesn't happen. Both jobs finish after
>>> 120 seconds and I don't understand why.
>>>
>>> I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU
>>> has 2 threads). From what I saw in the default EMR configuration, YARN is
>>> set to FIFO (default) mode with the CapacityScheduler.
>>>
>>> What do you think about these problems?
>>>
>>> Thanks,
>>>
>>> Cosmin
>>
>> --
>> *Sincerely yours, Egor Pakhomov*
>
> --
> *Sincerely yours, Egor Pakhomov*
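P.S. For anyone following along, the partition/task/core reasoning Egor gave above can be sketched as a quick back-of-the-envelope model. This is plain Python, not PySpark; the function name and the numbers are illustrative, not taken from the thread, and it assumes every task takes the same amount of time:

```python
import math

def estimated_runtime(num_partitions, total_cores, seconds_per_task):
    # One partition is processed by exactly one task, and each task
    # occupies a single core, so the job runs in "waves" of at most
    # `total_cores` tasks at a time.
    waves = math.ceil(num_partitions / total_cores)
    return waves * seconds_per_task

# 5 partitions: growing the cluster from 7 to 10 cores changes nothing,
# because only 5 tasks ever exist and both clusters fit them in one wave.
print(estimated_runtime(5, 7, 12))   # 12
print(estimated_runtime(5, 10, 12))  # 12

# Repartitioning to 10 partitions lets all 10 cores work at once
# (assuming the same total work now splits into 10 smaller tasks).
print(estimated_runtime(10, 10, 6))  # 6
```

It also matches the symptom in the original question: adding cores (or executor-cores) beyond the partition count cannot shorten the single wave, while repartitioning creates more tasks that can actually run in parallel.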