tgravescs commented on a change in pull request #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#discussion_r356302175

########## File path: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ##########
@@ -119,4 +119,28 @@ private[spark] class Pool(
       parent.decreaseRunningTasks(taskNum)
     }
   }
+
+  override def updateAvailableSlots(numSlots: Float): Unit = {
+    schedulingMode match {
+      case SchedulingMode.FAIR =>
+        val usableWeights = schedulableQueue.asScala
+          .map(s => if (s.getSortedTaskSetQueue.nonEmpty) (s, s.weight) else (s, 0))
+        val totalWeights = usableWeights.map(_._2).sum
+        usableWeights.foreach({ case (schedulable, usableWeight) =>
+          schedulable.updateAvailableSlots(
+            Math.max(numSlots * usableWeight / totalWeights, schedulable.minShare))
+        })
+      case SchedulingMode.FIFO =>
+        val sortedSchedulableQueue =
+          schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
+        var isFirst = true
+        for (schedulable <- sortedSchedulableQueue) {
+          schedulable.updateAvailableSlots(if (isFirst) numSlots else 0)

Review comment:
   So something like that is definitely better. I think you have to make sure that "n tasks" is the number of tasks running or pending, because a taskset could have 10000 tasks, and if 9999 of them are finished it really only needs 1 slot; the rest should go to the other tasksets. I'm not sure that is easily accessible right now from this location. It kind of feels to me that FIFO should just have some boolean that is passed to the TSM to tell it whether there are remaining slots.
   
   With FIFO, if I have multiple tasksets, they do go in priority order, but if the higher-priority one passes slots up, there is nothing keeping the other tasksets from using them. So I don't know why we should make the locality preference skewed. It's definitely not any worse than what we have now, and I'm not sure how you would really get away from this without going to something more like node delay vs. task delay.
   
   The fair scheduler side I need to think about more. I think it has a similar issue with totalSlots vs. how many tasks it actually has left to run.
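   To make the FIFO concern above concrete, here is a minimal sketch (not the actual Spark code) of capping each taskset's slot grant at its remaining tasks so the surplus flows down to lower-priority tasksets. `remainingTasks` is a hypothetical stand-in for something like `numTasks - tasksSuccessful` on a TSM:

```scala
// Hedged sketch of FIFO slot distribution: each taskset in priority order
// takes at most what it still needs, and leftover slots pass to the next.
object FifoSlotSketch {
  // Hypothetical stand-in for a TSM: remainingTasks = running + pending tasks.
  final case class TaskSet(name: String, remainingTasks: Int)

  /** Walk tasksets in priority order, granting each at most what it needs. */
  def distribute(numSlots: Int, sortedByPriority: Seq[TaskSet]): Seq[(String, Int)] = {
    var left = numSlots
    sortedByPriority.map { ts =>
      val grant = math.min(left, ts.remainingTasks)
      left -= grant
      ts.name -> grant
    }
  }

  def main(args: Array[String]): Unit = {
    // A taskset with 10000 tasks but only 1 unfinished takes 1 slot,
    // leaving the other 7 for the next taskset in priority order.
    println(distribute(8, Seq(TaskSet("tsm-0", 1), TaskSet("tsm-1", 100))))
  }
}
```

   The point of the sketch is that the cap has to come from remaining tasks, not total tasks, or the 9999-finished taskset would hold slots it can never use.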
   For instance, suppose you have 2 pools with 1 taskset each, both with equal fair shares, and the first one has run long enough that it only has 2 tasks left. Do you send the rest of the free slots to the second TSM so it can use more of the cluster? I would generally say yes, and I think that is the way it acts now without this patch. The shares for the fair scheduler just appear to sort the tasksets, which are then visited in that order; there doesn't appear to be any real hard enforcement of the limits.
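   The fair-share scenario above can be sketched the same way (again, illustrative only, not the patch's code): compute each pool's weighted share, cap it at the pool's actual demand, and hand the surplus to pools that still have work. `demand` is a hypothetical per-pool count of tasks left to run:

```scala
// Hedged sketch of fair-share distribution with surplus redistribution.
object FairSurplusSketch {
  // Hypothetical pool: weight is its fair-share weight, demand its tasks left.
  final case class Pool(name: String, weight: Int, demand: Int)

  def distribute(numSlots: Int, pools: Seq[Pool]): Map[String, Int] = {
    val totalWeight = pools.map(_.weight).sum.max(1)
    // Initial weighted shares, capped at each pool's actual demand.
    val capped = pools.map { p =>
      p -> math.min(numSlots * p.weight / totalWeight, p.demand)
    }
    var surplus = numSlots - capped.map(_._2).sum
    // Hand the surplus to pools that still have unmet demand, in order.
    capped.map { case (p, share) =>
      val extra = math.min(surplus, p.demand - share)
      surplus -= extra
      p.name -> (share + extra)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Equal weights over 10 slots, but pool A only has 2 tasks left:
    // A keeps 2 and B picks up the remaining 8.
    println(distribute(10, Seq(Pool("A", 1, 2), Pool("B", 1, 100))))
  }
}
```

   Under this reading the shares act as a sort order plus a soft floor rather than a hard limit, which matches the observed behavior without the patch.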