About the second problem: as I understand it, this can happen in two cases:
(1) one job prevents the other from getting resources for its executors, or
(2) the bottleneck is reading from disk, which you cannot really parallelize.
I have no experience with the second case, but the first one is easy to
verify: just look at your Hadoop UI and check that both jobs get enough
resources.
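
It is also easy to check from the command line; a rough sketch using the
standard YARN CLI:

yarn application -list -appStates RUNNING,ACCEPTED
# both jobs should be RUNNING at the same time; if one of them sits
# in ACCEPTED, it is waiting for resources the other job is holding

If that turns out to be the problem, spark-submit's --queue option lets you
put the two jobs into separate YARN queues (assuming such queues are
configured in your scheduler).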

2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:

> "But if i increase only executor-cores the finish time is the same". More
> experienced ones can correct me, if I'm wrong, but as far as I understand
> that: one partition processed by one spark task. Task is always running on
> 1 core and not parallelized among cores. So if you have 5 partitions and
> you increased totall number of cores among cluster from 7 to 10 for example
> - you have not gained anything. But if you repartition you give an
> opportunity to process thing in more threads, so now more tasks can execute
> in parallel.
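>
> For example, a minimal sketch (the numbers and the jar name are
> placeholders, not your exact job): with 4 executors x 2 cores = 8 cores
> granted, 16 partitions keep every core busy, while 5 partitions would
> occupy at most 5 of them:
>
> spark-submit --master yarn \
>              --num-executors 4 \
>              --executor-cores 2 \
>              --conf spark.default.parallelism=16 \
>              your-app.jar
>
> (spark.default.parallelism only sets the default partition count for RDD
> operations; in code you can repartition explicitly with rdd.repartition(16),
> or df.repartition(16) / spark.sql.shuffle.partitions for DataFrames.)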
>
> 2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:
>
>> Hi,
>>
>> I think I don't understand well enough how to launch jobs.
>>
>> I have one job which takes 60 seconds to finish. I run it with the
>> following command:
>>
>> spark-submit --executor-cores 1 \
>>              --executor-memory 1g \
>>              --driver-memory 1g \
>>              --master yarn \
>>              --deploy-mode cluster \
>>              --conf spark.dynamicAllocation.enabled=true \
>>              --conf spark.shuffle.service.enabled=true \
>>              --conf spark.dynamicAllocation.minExecutors=1 \
>>              --conf spark.dynamicAllocation.maxExecutors=4 \
>>              --conf spark.dynamicAllocation.initialExecutors=4 \
>>              --conf spark.executor.instances=4 \
>>
>> If I increase the number of partitions in code and the number of executors,
>> the app finishes faster, which is OK. But if I increase only executor-cores,
>> the finish time is the same, and I don't understand why. I expect the time
>> to be lower than the initial time.
>>
>> My second problem is: if I launch the above job twice, I expect both jobs
>> to finish in 60 seconds, but this doesn't happen. Both jobs finish after
>> 120 seconds, and I don't understand why.
>>
>> I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU has
>> 2 threads). From what I saw in the default EMR configuration, YARN is set
>> to FIFO (default) mode with the CapacityScheduler.
>>
>> What do you think about these problems?
>>
>> Thanks,
>>
>> Cosmin
>>
>>
>
>
> --
>
>
> *Sincerely yours,*
> *Egor Pakhomov*
>



-- 


*Sincerely yours,*
*Egor Pakhomov*
