GitHub user lirui-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/892#discussion_r13473499
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
         clock: Clock = SystemClock)
       extends Schedulable with Logging
     {
    +  // Remember when this TaskSetManager is created
    +  val creationTime = clock.getTime()
       val conf = sched.sc.conf
     
    +  // The period we wait for new executors to come up
    +  // After this period, tasks in pendingTasksWithNoPrefs will be considered as PROCESS_LOCAL
    +  private val WAIT_NEW_EXEC_TIMEOUT = conf.getLong("spark.scheduler.waitNewExecutorTime", 3000L)
    --- End diff ---
    
    This waiting period only applies to pendingTasksWithNoPrefs, which holds
tasks whose preferred locations are currently unavailable. Within the waiting
period, we want to try pendingTasksForExecutor, pendingTasksForHost and
pendingTasksForRack first, because tasks in those lists do have some locality.
Whenever an executor is added, the tasks that thereby gain locality are removed
from pendingTasksWithNoPrefs. After the waiting period has elapsed, we assume
no executor will come up for the tasks still remaining in
pendingTasksWithNoPrefs, so they can be scheduled as PROCESS_LOCAL.
    Note that tasks in pendingTasksForHost can still get scheduled even within
the period; we are only holding back pendingTasksWithNoPrefs. I think that is
better than holding back the whole application and scheduling nothing.
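
    To make the ordering concrete, here is a minimal, self-contained Scala
sketch of the gating described above. It is not the actual TaskSetManager
code: the names pendingTasksWithNoPrefs, creationTime and WAIT_NEW_EXEC_TIMEOUT
mirror the diff, but the Int task ids, the flat queues (the real lists are
keyed by executor/host/rack), the pop helper, the nowLocal predicate and the
use of System.currentTimeMillis() in place of the pluggable Clock are all
simplifying assumptions for illustration.

    object NoPrefsGatingSketch {
      import scala.collection.mutable.ArrayBuffer

      // Mirrors spark.scheduler.waitNewExecutorTime from the diff (default 3000 ms).
      val WAIT_NEW_EXEC_TIMEOUT = 3000L
      // Remember when this "TaskSetManager" is created.
      val creationTime = System.currentTimeMillis()

      // Queues with real locality; these are always tried first.
      val pendingTasksForExecutor = ArrayBuffer[Int]()  // executor-local tasks
      val pendingTasksForHost     = ArrayBuffer[Int]()  // host-local tasks
      val pendingTasksForRack     = ArrayBuffer[Int]()  // rack-local tasks
      // Tasks whose preferred locations are currently unavailable.
      val pendingTasksWithNoPrefs = ArrayBuffer[Int]()

      // When an executor comes up, the tasks it makes local leave the no-prefs
      // queue (nowLocal is a hypothetical predicate standing in for the real
      // preferred-location lookup).
      def executorAdded(nowLocal: Int => Boolean): Unit = {
        val gained = pendingTasksWithNoPrefs.filter(nowLocal)
        pendingTasksWithNoPrefs --= gained
        pendingTasksForExecutor ++= gained
      }

      // Pick a task, preferring real locality. The no-prefs queue is only
      // treated as PROCESS_LOCAL once the waiting period has elapsed, by which
      // point we assume no matching executor will appear.
      def dequeueTask(): Option[Int] = {
        def pop(q: ArrayBuffer[Int]): Option[Int] =
          if (q.nonEmpty) Some(q.remove(q.length - 1)) else None

        pop(pendingTasksForExecutor)
          .orElse(pop(pendingTasksForHost))
          .orElse(pop(pendingTasksForRack))
          .orElse {
            val waited = System.currentTimeMillis() - creationTime
            if (waited >= WAIT_NEW_EXEC_TIMEOUT) pop(pendingTasksWithNoPrefs)
            else None  // hold back only this queue; the others still schedule
          }
      }
    }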

