Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21758#discussion_r205318258
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
         // of locality levels so that it gets a chance to launch local tasks on all of them.
         // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
         for (taskSet <- sortedTaskSets) {
    -      var launchedAnyTask = false
    -      var launchedTaskAtCurrentMaxLocality = false
    -      for (currentMaxLocality <- taskSet.myLocalityLevels) {
    -        do {
    -          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
    -            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
    -          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
    -        } while (launchedTaskAtCurrentMaxLocality)
    -      }
    -      if (!launchedAnyTask) {
    -        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
    +      // Skip the barrier taskSet if the available slots are less than the number of pending tasks.
    +      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
    --- End diff --
    
    We plan to fail the job on submit if it requires more slots than are currently available. Are there other scenarios in which we should fail fast under dynamic allocation? IIUC, barrier tasks that have not been launched yet are still counted toward the number of pending tasks, so dynamic resource allocation should still be able to compute a correct expected number of executors.
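    For context, here is a minimal, self-contained sketch of the slot check being discussed (assumed names such as `Offer` and `canLaunchBarrierStage`, not Spark's actual `TaskSchedulerImpl` code): a barrier taskSet is only offered resources when every one of its tasks can be launched in the same scheduling round, otherwise the whole taskSet is skipped for that round.
    
    ```scala
    // Illustrative sketch only; types and names are made up for this example.
    object BarrierSlotCheckSketch {
    
      // Hypothetical stand-in for a WorkerOffer: free cores on one executor.
      final case class Offer(executorId: String, host: String, cores: Int)
    
      def canLaunchBarrierStage(
          offers: Seq[Offer],
          cpusPerTask: Int,
          numBarrierTasks: Int): Boolean = {
        // Total task slots available across all current offers.
        val availableSlots = offers.map(_.cores / cpusPerTask).sum
        // Launch only if the whole barrier stage fits; otherwise skip it this round.
        availableSlots >= numBarrierTasks
      }
    
      def main(args: Array[String]): Unit = {
        // Three executors with 4 free cores each and 1 cpu per task => 12 slots.
        val offers = Seq(
          Offer("exec-1", "host-1", 4),
          Offer("exec-2", "host-2", 4),
          Offer("exec-3", "host-3", 4))
        println(canLaunchBarrierStage(offers, cpusPerTask = 1, numBarrierTasks = 10)) // true
        println(canLaunchBarrierStage(offers, cpusPerTask = 1, numBarrierTasks = 16)) // false
      }
    }
    ```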


---
