[ https://issues.apache.org/jira/browse/SPARK-18769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892318#comment-15892318 ]

Thomas Graves commented on SPARK-18769:
---------------------------------------

I definitely understand there is an actual problem here, but I think the 
problem is more with Spark and its event processing/synchronization than with 
the fact that we are asking for more containers. Like I mentioned, I agree with 
doing the jira; I just want to clarify why we are doing it and make sure we do 
it in a way that doesn't hurt our container allocation. It's always good to 
play nice in the YARN environment and not ask for more containers than the 
entire cluster can handle, for instance, but at the same time, if we limit the 
container requests early on, YARN could easily free up resources that would 
otherwise be available to us; if you don't have your request in, YARN could 
give those resources to someone else. There are a lot of configs in the YARN 
schedulers and many different situations. If you look at some other apps on 
YARN (MR and Tez), both immediately ask for all of their resources. MR is 
definitely different since it doesn't reuse containers; Tez does. By asking for 
everything immediately you can definitely hit issues where, if your tasks run 
really fast, you don't need all of those containers, but the exponential 
ramp-up in our allocation now gets you there really quickly anyway, and I think 
you can hit the same issue.
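
To make the ramp-up point concrete, here is a minimal sketch in plain Scala 
(not Spark's actual ExecutorAllocationManager code; the target of 2000 is a 
made-up number) of how doubling the request each scheduling round reaches a 
large target in very few rounds:

    // Sketch: the executor request doubles each scheduling round while
    // tasks stay backlogged, so a large target is reached very quickly.
    object RampUpSketch {
      def main(args: Array[String]): Unit = {
        val target = 2000    // hypothetical total executors the job could use
        var requested = 1    // start with a small request
        var rounds = 0
        while (requested < target) {
          requested = math.min(requested * 2, target) // double per round, capped at target
          rounds += 1
        }
        // Prints: Reached 2000 executors in 11 rounds
        println(s"Reached $requested executors in $rounds rounds")
      }
    }

So even with a conservative initial request, doubling gets you to thousands of 
containers in roughly a dozen rounds, which is why a fast ramp-up can hit the 
same over-request issue as asking for everything up front.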

Note that in our clusters we set the upper limit by default to something 
reasonable (a couple thousand), and if someone has a really large job they can 
reconfigure it.
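
For reference, a cap like that can be expressed with 
spark.dynamicAllocation.maxExecutors (a real Spark config; the value 2000 here 
just stands in for "a couple thousand"). Set programmatically, e.g. in a 
spark-shell session, it would look like this; in practice the cap would 
typically live in spark-defaults.conf so it applies cluster-wide:

    import org.apache.spark.SparkConf

    // Caps dynamic allocation at an illustrative 2000 executors.
    // Users with very large jobs can override this per job.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")        // required for dynamic allocation on YARN
      .set("spark.dynamicAllocation.maxExecutors", "2000") // illustrative "couple thousand" cap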
 

>  Spark to be smarter about what the upper bound is and to restrict the number 
> of executors when dynamic allocation is enabled
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18769
>                 URL: https://issues.apache.org/jira/browse/SPARK-18769
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Neerja Khattar
>
> Currently, when dynamic allocation is enabled, the max executor count is 
> effectively infinite, and Spark creates so many executors that it can even 
> exceed the YARN NodeManager memory and vcore limits.
> There should be a check so that requests do not exceed the YARN resource limits.


