[ https://issues.apache.org/jira/browse/SPARK-18769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888430#comment-15888430 ]
Thomas Graves commented on SPARK-18769:
---------------------------------------

{quote}
A little clarification in case the summary is not clear: Spark's dynamic allocation will keep growing the number of requested executors until it reaches the upper limit, even when the cluster manager hasn't really been allocating new executors. This is sub-optimal, especially since, I believe, this increases the memory usage in Spark unnecessarily, and might also put unnecessary burden on the cluster manager.
{quote}

What do you mean by this? Spark can change its asks to YARN at any time; this doesn't affect its actual usage of resources until things get allocated by the YARN ResourceManager. As far as more pressure on the cluster manager goes, that should only happen if we increase our heartbeats or the interval at which we ask for resources, which does happen more frequently as we ask for more containers, but it backs off pretty quickly. It does leave one more application in the YARN scheduler's list to look at, but if Spark could actually use more containers I don't see a problem with this. If we are purely asking for more resources than we can use, then we definitely shouldn't do it. I know there were a few issues with older versions of Hadoop; I thought they were more around releasing containers, but I'll have to double-check that.

I think it would be good to be a little smarter here, but at the same time, if you don't request things fast enough you just make things slower. YARN can handle you requesting more resources than it can give you, since it handles all the queue limits and such, and if space does free up you would get more resources. What is the real problem here that we are trying to solve, or what are we trying to enhance?

I do definitely see issues with YARN giving Spark many containers and Spark taking a long time to bring them up and utilize them. This definitely wastes resources. Spark obviously has an issue with event processing and synchronization, which I think causes some of that. I haven't had time to investigate it further, but if you have, it would be great to hear what you have found.

I also think it's a bit weird the way we ramp up containers. I'm not sure why we don't just ask for the number we need immediately. This would reduce the number of changes to our asks and the number of messages flying around. I know we ramp up pretty quickly, but again that might just be adding overhead.

> Spark to be smarter about what the upper bound is and to restrict number of
> executors when dynamic allocation is enabled
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18769
>                 URL: https://issues.apache.org/jira/browse/SPARK-18769
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Neerja Khattar
>
> Currently when dynamic allocation is enabled the max number of executors is
> infinite, and Spark creates so many executors that it can even exceed the
> YARN NodeManager memory limits and vcores.
> It should have a check to not exceed the YARN resource limits.
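For context on the ramp-up behavior discussed in the comment above: Spark's dynamic allocation grows its executor requests exponentially while the scheduler backlog persists, doubling the number of executors added each round, up to spark.dynamicAllocation.maxExecutors (which defaults to an effectively unbounded value, hence the reporter's concern). Below is a minimal sketch of that ramp-up pattern, not the actual ExecutorAllocationManager code; the cap of 200 is an arbitrary example value chosen for illustration.

{code:scala}
// Minimal sketch (under the assumptions stated above) of the exponential
// ramp-up of executor requests. Each round, the number of executors added
// doubles, and the total request target is capped at a configured maximum
// standing in for spark.dynamicAllocation.maxExecutors.
object RampUpSketch {
  def main(args: Array[String]): Unit = {
    val maxExecutors = 200   // example cap; stands in for spark.dynamicAllocation.maxExecutors
    var target = 0           // current total executor request target
    var toAdd = 1            // executors added this round; doubles every round
    var round = 0
    while (target < maxExecutors) {
      target = math.min(target + toAdd, maxExecutors)
      round += 1
      println(s"round $round: asking the cluster manager for $target executors total")
      toAdd *= 2
    }
  }
}
{code}

As the sketch suggests, setting spark.dynamicAllocation.maxExecutors (for example via --conf spark.dynamicAllocation.maxExecutors=200 on spark-submit) bounds how far the asks can grow, regardless of what YARN would ultimately grant.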