tgravescs commented on a change in pull request #26696: 
[WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#discussion_r356302175
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/Pool.scala
 ##########
 @@ -119,4 +119,28 @@ private[spark] class Pool(
       parent.decreaseRunningTasks(taskNum)
     }
   }
+
+  override def updateAvailableSlots(numSlots: Float): Unit = {
+    schedulingMode match {
+      case SchedulingMode.FAIR =>
+        val usableWeights = schedulableQueue.asScala
+          .map(s => if (s.getSortedTaskSetQueue.nonEmpty) (s, s.weight) else (s, 0))
+        val totalWeights = usableWeights.map(_._2).sum
+        usableWeights.foreach({case (schedulable, usableWeight) =>
+          schedulable.updateAvailableSlots(
+            Math.max(numSlots * usableWeight / totalWeights, schedulable.minShare))
+        })
+      case SchedulingMode.FIFO =>
+        val sortedSchedulableQueue =
+          schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
+        var isFirst = true
+        for (schedulable <- sortedSchedulableQueue) {
+          schedulable.updateAvailableSlots(if (isFirst) numSlots else 0)
 
 Review comment:
   So something like that is definitely better. I think you have to make sure that n tasks is the number of tasks running or pending, because a taskSet could have 10000 tasks, and if 9999 of them are finished it really only needs 1 slot and the rest should go to the other tasksets. I'm not sure that count is easily accessible from this location right now.
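
   Roughly, the cap I have in mind looks like the sketch below. `Sched` and `remaining` (a running + pending count) are made-up stand-ins; `Schedulable` doesn't expose that count here today.

```scala
// Sketch only: cap a schedulable's share by the tasks it can still run.
// `Sched` and `remaining` are hypothetical, not the real Schedulable API.
case class Sched(weight: Int, minShare: Int, remaining: Int)

def cappedShare(numSlots: Float, s: Sched, totalWeights: Int): Float = {
  val share = numSlots * s.weight / totalWeights
  // A 10000-task set with 9999 finished should only claim 1 slot here,
  // so the rest can flow to the other tasksets.
  math.min(math.max(share, s.minShare.toFloat), s.remaining.toFloat)
}
```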
   
   It kind of feels to me that FIFO should just have a boolean that is passed to the TSM to tell it whether there are remaining slots. With FIFO, if I have multiple tasksets, they do go in priority order, but if the higher-priority one passes slots up there is nothing keeping the other tasksets from using them. So I don't know why we should make the locality preference skewed. It's definitely not any worse than what we have now, and I'm not sure how you would really get away from this without going to something more like the node-delay vs. task-delay approach.
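
   To be concrete about the boolean idea, I mean something like this sketch, where `updateHasFreeSlots` and `remaining` are hypothetical names and not the current API:

```scala
// Sketch of the FIFO boolean-flag idea; the names here are made up.
class Tsm(val remaining: Int) {
  // In the real code this would feed the TSM's locality-wait bookkeeping.
  def updateHasFreeSlots(free: Boolean): Unit = ()
}

def propagateFifo(numSlots: Float, sortedQueue: Seq[Tsm]): Unit = {
  var slotsLeft = numSlots
  for (tsm <- sortedQueue) {
    tsm.updateHasFreeSlots(slotsLeft > 0) // a flag, not a skewed slot count
    slotsLeft -= tsm.remaining            // budget consumed in priority order
  }
}
```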
   
   The fair scheduler side I need to think about more. I think it has a similar issue with totalSlots vs. how many tasks a taskset actually has left to run. For instance, say you have 2 pools with 1 taskset each, both with equal fair shares, and the first one has run long enough that it only has 2 tasks left: do you send the rest of the free slots to the second TSM so it can use more of the cluster? I would generally say yes, and I think that is the way it acts now without this patch. The shares for the fair scheduler just appear to sort the tasksets, and they are visited in that order, but there doesn't appear to be any real hard enforcement of the limits.
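
   For illustration, the behavior I'd want to preserve looks roughly like this one-pass redistribution (a sketch with hypothetical names, not code from this patch):

```scala
// Sketch: weight-based shares capped by remaining tasks, with the unused
// portion flowing to pools that can still run more. Single pass only;
// `FairPool` and `remaining` are hypothetical.
case class FairPool(weight: Int, remaining: Int)

def fairShares(numSlots: Float, pools: Seq[FairPool]): Seq[Float] = {
  val totalWeight = pools.map(_.weight).sum.toFloat
  val capped = pools.map { p =>
    math.min(numSlots * p.weight / totalWeight, p.remaining.toFloat)
  }
  val surplus = numSlots - capped.sum
  val growable = pools.zip(capped).count { case (p, c) => c < p.remaining }
  pools.zip(capped).map { case (p, c) =>
    if (c < p.remaining && growable > 0) c + surplus / growable else c
  }
}

// The example above: two equal-share pools over 10 slots, the first with
// only 2 tasks left:
// fairShares(10f, Seq(FairPool(1, 2), FairPool(1, 100)))  // Seq(2.0, 8.0)
```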
