[ https://issues.apache.org/jira/browse/SPARK-18769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888430#comment-15888430 ]

Thomas Graves commented on SPARK-18769:
---------------------------------------

{quote}
A little clarification in case the summary is not clear: Spark's dynamic 
allocation will keep growing the number of requested executors until it reaches 
the upper limit, even when the cluster manager hasn't really been allocating 
new executors. This is sub-optimal, especially since, I believe, this increases 
the memory usage in Spark unnecessarily, and might also put unnecessary burden 
on the cluster manager.
{quote}

What do you mean by this?  Spark can change its asks to YARN at any time; that 
doesn't affect its actual usage of resources until the YARN ResourceManager 
actually allocates something.  As far as putting more pressure on the cluster 
manager, that should only happen if we increase our heartbeats or the interval 
at which we ask for resources.  That does happen more frequently when we ask 
for more containers, but it backs off pretty quickly.  It does leave one more 
application in the YARN scheduler's list to look at, but if Spark could 
actually use more containers I don't see a problem with this.  If we are purely 
asking for more resources than we can use, then we definitely shouldn't do it.  
I know there were a few issues with older versions of Hadoop; I thought they 
were more around releasing containers, but I'll have to double check on that.
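
For reference, these are the knobs I'm thinking of on the YARN side.  The 
values below are just the defaults as far as I remember, so double check them 
before relying on this:

{code}
import org.apache.spark.SparkConf

// Sketch only: while container requests are pending, the AM heartbeats at the
// initial-allocation interval and roughly doubles it each round until it hits
// the regular heartbeat interval, so the extra pressure on the RM backs off
// quickly. Values shown are the defaults as far as I remember, not tuning advice.
val conf = new SparkConf()
  .set("spark.yarn.scheduler.heartbeat.interval-ms", "3000")        // steady-state AM heartbeat
  .set("spark.yarn.scheduler.initial-allocation.interval", "200ms") // used while asks are pending
{code}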

I think it would be good to be a little smarter here, but at the same time, if 
you don't request things fast enough you just make everything slower.  YARN can 
handle you requesting more resources than it can give you, since it enforces 
all the queue limits and such, and if space does free up you would get more 
resources. 
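
If you want to see the limits YARN is actually enforcing for a queue, something 
along these lines works against the RM (assuming the Hadoop YARN client API is 
on the classpath; the queue name is just a placeholder):

{code}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object QueueLimits {
  def main(args: Array[String]): Unit = {
    // Placeholder queue name -- substitute the queue your Spark app runs in.
    val queueName = if (args.nonEmpty) args(0) else "default"

    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // YARN enforces these limits no matter how many containers Spark asks for;
    // requests beyond the queue's maximum capacity simply stay pending.
    val queue = yarnClient.getQueueInfo(queueName)
    println(s"queue=$queueName capacity=${queue.getCapacity} " +
      s"max=${queue.getMaximumCapacity} current=${queue.getCurrentCapacity}")

    yarnClient.stop()
  }
}
{code}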

What is the real problem here that we are trying to solve, or what are we 
trying to enhance?   I definitely do see issues with YARN giving Spark many 
containers and Spark taking a long time to bring them up and utilize them.  
That definitely wastes resources.  Spark obviously has an issue with event 
processing and synchronization, which I think causes some of that.    I haven't 
had time to investigate it further, but if you have, it would be great to hear 
what you have found.   

I also think it's a bit weird the way we ramp up containers.  I'm not sure why 
we don't just ask for the number we need immediately.  That would reduce the 
number of changes to our asks and the number of messages flying around.  I know 
we ramp up pretty quickly, but again, that might just be adding overhead.
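
Roughly what I mean by the ramp-up, as a sketch -- this is my simplified 
reading of the allocation manager, not the actual code, and the names are made 
up:

{code}
object RampUpSketch {
  // Simplified sketch of the current ramp-up behavior as I understand it;
  // names and starting values are illustrative, not the real implementation.
  var target: Int = 0   // executors we have asked the cluster manager for
  var toAdd: Int = 1    // doubles every backlog interval

  def onBacklogTimeout(neededForPendingTasks: Int, maxExecutors: Int): Unit = {
    // Each backlog interval we add 1, 2, 4, 8, ... executors to the ask instead
    // of jumping straight to the number we need. Every step is another change
    // to the ask and another round of messages to the cluster manager.
    target = math.min(math.min(target + toAdd, neededForPendingTasks), maxExecutors)
    toAdd *= 2
  }
}
{code}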

>  Spark to be smarter about what the upper bound is and to restrict number of 
> executor when dynamic allocation is enabled
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18769
>                 URL: https://issues.apache.org/jira/browse/SPARK-18769
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Neerja Khattar
>
> Currently, when dynamic allocation is enabled, the maximum number of executors 
> is effectively infinite, so Spark creates so many executors that it can even 
> exceed the YARN NodeManager memory and vcore limits.
> There should be a check so that it does not exceed the YARN resource limits.
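
For anyone hitting the unbounded default in the meantime, the upper bound can 
already be pinned per application; a minimal example (the numbers are 
placeholders and should be sized to your queue and node limits):

{code}
import org.apache.spark.SparkConf

// Placeholder values -- size these to the queue and node limits in your cluster.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")        // required for dynamic allocation on YARN
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "100")  // explicit cap instead of the infinite default
{code}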


