Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21758#discussion_r205660568
  
    --- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
         // of locality levels so that it gets a chance to launch local tasks 
on all of them.
         // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, 
NO_PREF, RACK_LOCAL, ANY
         for (taskSet <- sortedTaskSets) {
    -      var launchedAnyTask = false
    -      var launchedTaskAtCurrentMaxLocality = false
    -      for (currentMaxLocality <- taskSet.myLocalityLevels) {
    -        do {
    -          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
    -            taskSet, currentMaxLocality, shuffledOffers, availableCpus, 
tasks)
    -          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
    -        } while (launchedTaskAtCurrentMaxLocality)
    -      }
    -      if (!launchedAnyTask) {
    -        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
    +      // Skip the barrier taskSet if the available slots are less than the 
number of pending tasks.
    +      if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
    --- End diff --
    
    Yea you made really good point here, I've opened 
https://issues.apache.org/jira/browse/SPARK-24942 to track the cluster resource 
management issue.
    
    > what exactly do you mean by "available"? Its not so well defined for 
dynamic allocation. The resources you have right when the job is submitted? 
Also can you point me to where that is being done? I didn't see it here -- is 
it another jira?
    
    This is tracked by https://issues.apache.org/jira/browse/SPARK-24819, we 
shall check all the barrier stages on job submitted, to see whether the barrier 
stages require more slots (to be able to launch all the barrier tasks in the 
same stage together) than currently active slots in the cluster. If the job 
requires more slots than available (both busy and free slots), fail the job on 
submit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to