Github user lirui-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/892#discussion_r13473499

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---

    @@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
         clock: Clock = SystemClock)
       extends Schedulable with Logging {

    +  // Remember when this TaskSetManager is created
    +  val creationTime = clock.getTime()
       val conf = sched.sc.conf
    +  // The period we wait for new executors to come up
    +  // After this period, tasks in pendingTasksWithNoPrefs will be considered as PROCESS_LOCAL
    +  private val WAIT_NEW_EXEC_TIMEOUT = conf.getLong("spark.scheduler.waitNewExecutorTime", 3000L)

--- End diff --

This waiting period is only intended for pendingTasksWithNoPrefs. Suppose pendingTasksWithNoPrefs contains tasks whose preferred locations are currently unavailable. Within the waiting period, we want to try pendingTasksForExecutor, pendingTasksForHost and pendingTasksForRack first, because tasks in those lists do have some locality. When an executor is added, we remove the tasks that have newly gained locality from pendingTasksWithNoPrefs. After the waiting period, we assume no executor will come up for the tasks that still remain in pendingTasksWithNoPrefs, so they can be scheduled as PROCESS_LOCAL. Note that tasks in pendingTasksForHost can still get scheduled even within the period; we're only holding back pendingTasksWithNoPrefs. I think that's better than holding back the whole application and scheduling nothing.
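
For readers following along, here is a minimal, self-contained Scala sketch of the gating described above. The field names mirror the diff (creationTime, WAIT_NEW_EXEC_TIMEOUT); the object and method names are hypothetical illustration names, and this is not the PR's actual code.

    // A minimal sketch of the no-prefs gating described above, under the
    // assumption that only pendingTasksWithNoPrefs is held back. The field
    // names mirror the diff; NoPrefGateSketch and noPrefTasksSchedulable
    // are hypothetical names introduced for illustration, not part of the PR.
    object NoPrefGateSketch {

      // Mirrors the diff: remember when the TaskSetManager was created.
      val creationTime: Long = System.currentTimeMillis()

      // Mirrors spark.scheduler.waitNewExecutorTime (default 3000 ms).
      val WAIT_NEW_EXEC_TIMEOUT: Long = 3000L

      // Tasks in pendingTasksForExecutor/Host/Rack are never gated; only
      // the no-prefs list waits until the period has elapsed, after which
      // its tasks may be treated as PROCESS_LOCAL.
      def noPrefTasksSchedulable(now: Long): Boolean =
        now - creationTime >= WAIT_NEW_EXEC_TIMEOUT

      def main(args: Array[String]): Unit = {
        // false right after creation: we are still waiting for executors
        println(noPrefTasksSchedulable(System.currentTimeMillis()))
        // true once the waiting period has elapsed
        println(noPrefTasksSchedulable(creationTime + 5000L))
      }
    }

The key design point, as the comment argues, is that the gate applies only to the no-prefs list: tasks with executor-, host-, or rack-level locality keep getting scheduled throughout the waiting period.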