[ https://issues.apache.org/jira/browse/SPARK-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-23974.
------------------------------------
    Resolution: Not A Problem

Closing based on the above comment.

> Do not allocate more containers as expected in dynamic allocation
> -----------------------------------------------------------------
>
>                 Key: SPARK-23974
>                 URL: https://issues.apache.org/jira/browse/SPARK-23974
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: Darcy Shen
>            Priority: Major
>
> Using YARN with dynamic allocation enabled, Spark does not allocate more
> containers when the current number of containers (executors) is less than
> the maximum number of executors.
> For example, we only have 7 executors working while our cluster is not busy,
> I have set
> {{spark.dynamicAllocation.maxExecutors = 600}}
> and the current jobs of the context are executing slowly.
>
> A live case with online logs:
> ```
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | tail
> [2018-04-12 16:07:19,070] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:20,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:21,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:22,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:23,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:24,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:25,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:26,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:27,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | head
> [2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:19,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:20,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:21,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:22,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:23,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:24,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:25,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:26,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:27,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | wc -l
> 8111
> ```
> The logs mean that `numExecutorsTarget == maxNumExecutors == 600` is being
> held without any new executors being requested. At that time, only 7
> executors were available to our users.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
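
For anyone who wants to repeat the check from the quoted report on their own logs, the three `grep` calls (head, tail, `wc -l`) can be folded into one pass. The following is only a minimal Python sketch, not part of Spark or of the report: the function name `summarize`, the regular expression, and its group names are illustrative assumptions based solely on the DEBUG message format shown in the excerpt. The two sample lines are the first and last lines quoted in the report.

```python
import re
from datetime import datetime

# Hypothetical pattern derived from the DEBUG line quoted in the report;
# the group names (ts, target, limit) are illustrative, not Spark identifiers.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\].*"
    r"Not adding executors because our current target total "
    r"is already (?P<target>\d+) \(limit (?P<limit>\d+)\)"
)


def summarize(lines):
    """Return (count, first_ts, last_ts, pairs) for the matching log lines.

    pairs is the set of distinct (target, limit) values seen, so a single
    entry means the target never moved during the observed window.
    """
    count, first, last, pairs = 0, None, None, set()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # unrelated log line
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
        pairs.add((int(m.group("target")), int(m.group("limit"))))
        count += 1
    return count, first, last, pairs


# First and last lines from the excerpt in the report, plus one unrelated line.
sample = [
    "[2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] "
    "[akka://JobServer/user/jobManager] - Not adding executors because our "
    "current target total is already 600 (limit 600)",
    "some unrelated log line",
    "[2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] "
    "[akka://JobServer/user/jobManager] - Not adding executors because our "
    "current target total is already 600 (limit 600)",
]

count, first, last, pairs = summarize(sample)
print(count, last - first, pairs)
```

To run it against a real file, pass an open file handle instead of `sample`, for example `summarize(open("spark-job-server.log.9"))`. A single `(target, limit)` pair in the result would indicate that the target stayed at the limit for the whole window between `first` and `last`.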