[ https://issues.apache.org/jira/browse/SPARK-18769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892318#comment-15892318 ]
Thomas Graves commented on SPARK-18769:
---------------------------------------

I definitely understand there is an actual problem here, but I think the problem is more with Spark and its event processing/synchronization than with the fact that we are asking for more containers. As I mentioned, I agree with doing the jira; I just want to clarify why we are doing it and make sure we do it in a way that doesn't hurt our container allocation.

It's always good to play nice in the YARN environment and not ask for more containers than the entire cluster can handle, for instance. But at the same time, if we limit the container requests early on, YARN could otherwise have freed up resources and made them available to you; if you don't have your request in, YARN can give those resources to someone else. There are a lot of configs in the YARN schedulers and a lot of different situations.

If you look at some other apps on YARN (MR and Tez), both immediately ask for all of their resources. MR is definitely different since it doesn't reuse containers; Tez does. By asking for everything immediately you can definitely hit issues where, if your tasks run really fast, you don't need all of those containers, but the exponential ramp-up in our allocation now gets you there really quickly anyway, and I think you can hit the same issue.

Note that in our clusters we set the upper limit by default to something reasonable (a couple thousand), and if someone has a really large job they can reconfigure it.
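To make the "gets you there really quickly" point concrete, here is a minimal sketch (hypothetical code, not Spark's actual allocation logic) of an exponential ramp-up: each scheduling round doubles the increment of requested executors until the backlog demand or a configured cap (in the spirit of spark.dynamicAllocation.maxExecutors) is reached.

```python
# Hypothetical illustration of exponential executor ramp-up.
# Assumptions: requests start at 1 and the per-round increment
# doubles; neither value is taken from Spark's source.

def ramp_up_rounds(target, cap, start=1):
    """Return the cumulative executor request after each round.

    target -- executors the task backlog calls for
    cap    -- configured upper bound on executors
    start  -- initial request increment (assumed to be 1 here)
    """
    limit = min(target, cap)
    requested = 0
    step = start
    rounds = []
    while requested < limit:
        requested = min(requested + step, limit)
        rounds.append(requested)
        step *= 2  # increment doubles between rounds
    return rounds

# A demand of 2000 executors is met in about 11 doubling rounds.
print(ramp_up_rounds(2000, 2000))
```

Under these assumptions the request sequence is 1, 3, 7, 15, ..., so even a "couple thousand" cap is reached in roughly a dozen rounds, which is why a fast-running job can over-request much like an app that asks for everything up front.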
> Spark to be smarter about what the upper bound is and to restrict number of
> executors when dynamic allocation is enabled
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18769
>                 URL: https://issues.apache.org/jira/browse/SPARK-18769
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Neerja Khattar
>
> Currently when dynamic allocation is enabled, max.executor is infinite, and
> Spark creates so many executors that it can even exceed the YARN NodeManager
> memory limit and vcores.
> It should have a check to not exceed the YARN resource limit.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)