GitHub user lianhuiwang commented on the pull request:

    https://github.com/apache/spark/pull/4051#issuecomment-71362007
  
    @sryza @andrewor14 I find that setting initialExecutors equal to minExecutors is
best in the following situation: the DAGScheduler submits the missing tasks of
the first stage very late because it has to scan a large HDFS path in order to
compute the RDD's partitions. In my tests, some jobs spent more than two minutes
scanning HDFS. In this situation, if we set initialExecutors to a large number,
those executors are wasted until the DAGScheduler submits the missing tasks. If
the scan takes longer than executorIdleTimeout, the initial executors are
removed anyway because they have been idle for longer than executorIdleTimeout.
    We can also reduce schedulerBacklogTimeout so that numExecutors reaches
maxExecutors quickly.
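
    To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not Spark code, only a simplified model under stated assumptions: all initial executors are up and idle from the moment the scan starts, executors above minExecutors are released once they have been idle for executorIdleTimeout, and each request round adds 1, 2, 4, ... executors (the first round after schedulerBacklogTimeout, later rounds after the sustained backlog timeout). The function names and the numbers in the usage lines are hypothetical.

```python
def idle_executor_seconds(initial_executors, min_executors,
                          scan_seconds, idle_timeout_seconds):
    """Executor-seconds spent idle before the first tasks are submitted.

    Simplified model (assumption): every executor is idle for the whole scan,
    and only executors above min_executors can be released on idle timeout.
    """
    extra = max(initial_executors - min_executors, 0)
    # Extra executors idle until the scan ends or the idle timeout fires.
    extra_idle = min(scan_seconds, idle_timeout_seconds)
    # The minimum set stays allocated for the whole scan.
    return min_executors * scan_seconds + extra * extra_idle


def rounds_to_reach(start, target):
    """Request rounds needed when each round adds 1, 2, 4, ... executors."""
    rounds, total, step = 0, start, 1
    while total < target:
        total += step
        step *= 2
        rounds += 1
    return rounds


def ramp_up_seconds(start, target, backlog_timeout, sustained_timeout=None):
    """Approximate time from the first backlogged task to `target` executors."""
    if sustained_timeout is None:
        # Assumption: the sustained timeout equals the backlog timeout.
        sustained_timeout = backlog_timeout
    rounds = rounds_to_reach(start, target)
    if rounds == 0:
        return 0
    return backlog_timeout + (rounds - 1) * sustained_timeout


# Hypothetical numbers: 120 s scan, 60 s idle timeout, min 2, max 50.
print(idle_executor_seconds(50, 2, 120, 60))  # large initialExecutors
print(idle_executor_seconds(2, 2, 120, 60))   # initialExecutors == minExecutors
print(ramp_up_seconds(2, 50, 5))              # 5 s backlog timeout
print(ramp_up_seconds(2, 50, 1))              # 1 s backlog timeout
```

    Under this model, starting at minExecutors avoids the idle cost during the scan, and a shorter schedulerBacklogTimeout shrinks the time needed to ramp up to maxExecutors afterwards. Executor startup latency and the cap imposed by the number of pending tasks are ignored here.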

