Hi Egor,

About the first problem, I think you are right, it makes sense.

About the second problem, I checked the available resources on port 8088 and it
shows 16 available cores. I start my job with 4 executors with 1 core each,
and 1 GB per executor. My job uses at most 50 MB of memory (just for testing).
From my point of view the resources are enough, so I think the problem is in
the YARN configuration files, but I don't know what is missing.
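
As an extra check, I also print from inside the job which executors the
application actually received (a minimal sketch, assuming sc is the job's
SparkContext; the driver shows up in this map too, so for 4 executors I
expect 5 entries):

// minimal sketch, run inside the job just to confirm the allocation;
// the driver's block manager also appears in this map
val blockManagers = sc.getExecutorMemoryStatus
println(s"driver + executors: ${blockManagers.size}")
blockManagers.foreach { case (host, (maxMem, freeMem)) =>
  println(s"$host -> max ${maxMem / 1024 / 1024} MB, free ${freeMem / 1024 / 1024} MB")
}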

Thank you

2017-02-13 21:14 GMT+02:00 Egor Pahomov <pahomov.e...@gmail.com>:

> About the second problem: as I understand it, this can happen in two cases: (1) one job
> prevents the other one from getting resources for its executors, or (2) the
> bottleneck is reading from disk, so you cannot really parallelize that. I
> have no experience with the second case, but it's easy to verify the first one:
> just look at your Hadoop UI and verify that both jobs get enough resources.
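>
> For example, you can pull the same numbers the UI shows from the
> ResourceManager REST API (a rough sketch, assuming the ResourceManager web UI
> is on port 8088; the hostname is a placeholder):
>
> // list RUNNING applications and eyeball allocatedVCores / allocatedMB for both jobs
> import scala.io.Source
> val json = Source.fromURL("http://resourcemanager-host:8088/ws/v1/cluster/apps?states=RUNNING").mkString
> println(json)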
>
> 2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:
>
>> "But if I increase only executor-cores, the finish time is the same".
>> More experienced people can correct me if I'm wrong, but as far as I
>> understand it: one partition is processed by one Spark task, and a task always
>> runs on 1 core and is not parallelized across cores. So if you have 5
>> partitions and you increase the total number of cores in the cluster from 7 to
>> 10, for example, you have not gained anything. But if you repartition, you
>> give Spark the opportunity to process the data in more threads, so more tasks can
>> execute in parallel.
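>>
>> A rough illustration with the same numbers (a minimal sketch, assuming a
>> spark-shell where sc is already defined, not your actual job):
>>
>> // 5 partitions -> at most 5 tasks, so at most 5 cores are ever busy at once
>> val rdd = sc.parallelize(1 to 1000000, numSlices = 5)
>> println(rdd.getNumPartitions)      // 5
>>
>> // repartitioning creates more tasks, so the extra cores can actually be used
>> val wider = rdd.repartition(10)
>> println(wider.getNumPartitions)    // 10
>> wider.map(_ * 2).count()           // now up to 10 tasks can run in parallel (cluster permitting)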
>>
>> 2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:
>>
>>> Hi,
>>>
>>> I think I don't understand well enough how to launch jobs.
>>>
>>> I have one job which takes 60 seconds to finish. I run it with the following
>>> command:
>>>
>>> spark-submit --executor-cores 1 \
>>>              --executor-memory 1g \
>>>              --driver-memory 1g \
>>>              --master yarn \
>>>              --deploy-mode cluster \
>>>              --conf spark.dynamicAllocation.enabled=true \
>>>              --conf spark.shuffle.service.enabled=true \
>>>              --conf spark.dynamicAllocation.minExecutors=1 \
>>>              --conf spark.dynamicAllocation.maxExecutors=4 \
>>>              --conf spark.dynamicAllocation.initialExecutors=4 \
>>>              --conf spark.executor.instances=4 \
>>>
>>> If I increase the number of partitions from code and the number of executors, the
>>> app finishes faster, which is OK. But if I increase only
>>> executor-cores, the finish time is the same, and I don't understand why. I
>>> expect the time to be lower than the initial time.
>>>
>>> My second problem is that if I launch the above code twice, I expect both jobs
>>> to finish in 60 seconds, but this doesn't happen. Both jobs finish after 120
>>> seconds and I don't understand why.
>>>
>>> I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU has 2
>>> threads). From what I saw in the default EMR configuration, YARN is set to
>>> FIFO (default) mode with the CapacityScheduler.
>>>
>>> What do you think about these problems?
>>>
>>> Thanks,
>>>
>>> Cosmin
>>>
>>>
>>
>>
>> --
>>
>>
>> *Sincerely yours, Egor Pakhomov*
>>
>
>
>
> --
>
>
> *Sincerely yours, Egor Pakhomov*
>
