Josh Rosen created SPARK-19100:
----------------------------------

             Summary: Schedule tasks in descending order of estimated input 
size / estimated task duration
                 Key: SPARK-19100
                 URL: https://issues.apache.org/jira/browse/SPARK-19100
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
            Reporter: Josh Rosen


Say that you're scheduling a reduce phase and, based on the map output sizes, you 
have identified that some reducers will be skewed because they process much more 
input. In this case it's preferable to schedule those larger tasks first: the 
large reduce tasks will dominate the stage's completion time, so it's important 
to launch them as soon as possible.

Spark's current task scheduling technique is naive and simply launches tasks in 
ascending order of their indices: 
https://github.com/apache/spark/blob/903bb8e8a2b84b9ea82acbb8ae9d58754862be3a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L158

If we implemented an interface for a Task to expose a relative size estimate 
(comparable only within the same TaskSet) and used that instead of the index in 
the initial scheduling, then I think we could realize decent stage completion 
time improvements for highly skewed stages.
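To make the idea concrete, here is a rough sketch of what such an interface and ordering could look like. None of these names exist in Spark today (`SizeEstimation`, `relativeSizeEstimate`, and `scheduleOrder` are all hypothetical); this is only an illustration of sorting task indices by descending estimated size instead of ascending index:

```scala
// Hypothetical interface: a task exposes a relative size estimate,
// comparable only to estimates of other tasks in the same TaskSet.
trait SizeEstimation {
  def relativeSizeEstimate: Long
}

// Instead of enqueueing pending tasks in ascending index order, the
// scheduler could sort task indices by descending estimated size.
def scheduleOrder(estimates: Array[Long]): Array[Int] =
  estimates.indices.sortBy(i => -estimates(i)).toArray
```

Ties would fall back to index order, so a TaskSet with no estimates behaves exactly as today.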

This is also beneficial for initial map stages where you have input size 
estimates: if you're generating tasks / splits from HDFS then you have an 
estimate of each task's total input size, which may be a decent proxy for its 
duration.

Basically, my feeling is that scheduling in descending order of size estimates, 
even if they're slightly inaccurate, must be a strict improvement over 
scheduling in ascending order of task id.
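A toy simulation illustrates the intuition (my own illustration, not Spark code): greedily assign each task, in the given launch order, to the least-loaded of a fixed number of executor slots, and measure the resulting stage completion time. With one big skewed task among many small ones, launching the big task first shortens the makespan:

```scala
import scala.collection.mutable

// Greedy list scheduling: assign each task in order to the least-loaded
// of `slots` executors; return the stage completion time (makespan).
def makespan(durations: Seq[Long], slots: Int): Long = {
  val load = mutable.ArrayBuffer.fill(slots)(0L)
  for (d <- durations) {
    val i = load.indexOf(load.min)
    load(i) += d
  }
  load.max
}

// A skewed stage: seven unit-duration tasks plus one 9x task.
val tasks = Seq(1L, 1L, 1L, 1L, 1L, 1L, 1L, 9L)
val ascendingIndex = makespan(tasks, 2)             // big task launched last: 12
val descendingSize = makespan(tasks.sortBy(-_), 2)  // big task launched first: 9
```

This is the classic longest-processing-time-first heuristic: size-ordered launch can never make the greedy makespan worse for a fixed task set, which is the "strict improvement" intuition above.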



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
