jin xing created SPARK-20219:
--------------------------------

             Summary: Schedule tasks based on size of input from ShuffledRDD
                 Key: SPARK-20219
                 URL: https://issues.apache.org/jira/browse/SPARK-20219
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: jin xing


When data on a ShuffledRDD is highly skewed, it makes sense to launch the tasks 
that process much more input than the others as early as possible. The current 
scheduling mechanism in *TaskSetManager* is quite simple:
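{code}
  // Pending lists are consumed from the tail, so adding tasks in reverse
  // index order means tasks with low indices are launched first,
  // regardless of how much input each task reads.
  for (i <- (0 until numTasks).reverse) {
    addPendingTask(i)
  }
{code}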
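A minimal, self-contained simulation of the loop above follows. It assumes the pending list behaves like a stack (tasks appended, then dequeued from the tail), which is how the reverse-order insertion yields ascending launch order. `CurrentOrder` and `launchOrder` are hypothetical names, not Spark APIs, and locality preferences are ignored.

{code}
import scala.collection.mutable.ArrayBuffer

object CurrentOrder {
  // Simulates index-based scheduling: returns task indices in launch order.
  def launchOrder(numTasks: Int): List[Int] = {
    val pending = ArrayBuffer.empty[Int]
    // Same loop shape as TaskSetManager: add indices in reverse order.
    for (i <- (0 until numTasks).reverse) {
      pending += i
    }
    val launched = ArrayBuffer.empty[Int]
    while (pending.nonEmpty) {
      // Dequeue from the tail, so the lowest index comes out first.
      launched += pending.remove(pending.length - 1)
    }
    launched.toList
  }

  def main(args: Array[String]): Unit =
    println(launchOrder(5).mkString(", ")) // prints "0, 1, 2, 3, 4"
}
{code}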
When the "large tasks" sit in the bottom half of the task array (i.e. they have 
high indices), they are launched late. Launching the tasks with much more input 
early instead can significantly reduce the total running time and save resources 
when *"dynamic allocation"* is disabled.
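One way to realize the proposal is sketched below, under stated assumptions: each task's input size is already known as a hypothetical `inputBytes` array (in practice it might come from shuffle map output statistics), and the same stack-like pending list is kept. Task indices are sorted by size, largest first, and then added in reverse, so the largest task is dequeued first. `SizeBasedOrder` and `launchOrder` are illustrative names only; this is not how Spark actually schedules tasks.

{code}
import scala.collection.mutable.ArrayBuffer

object SizeBasedOrder {
  // inputBytes(i) is an assumed (hypothetical) input-size estimate for task i.
  def launchOrder(inputBytes: IndexedSeq[Long]): List[Int] = {
    // Indices sorted by input size, largest first. sortBy is stable,
    // so equal-sized tasks keep ascending index order.
    val bySizeDesc = inputBytes.indices.sortBy(i => -inputBytes(i))
    val pending = ArrayBuffer.empty[Int]
    // Add in reverse so the largest task ends up at the tail of the list.
    for (i <- bySizeDesc.reverse) {
      pending += i
    }
    val launched = ArrayBuffer.empty[Int]
    while (pending.nonEmpty) {
      // Dequeue from the tail, exactly as in the index-based sketch.
      launched += pending.remove(pending.length - 1)
    }
    launched.toList
  }

  def main(args: Array[String]): Unit = {
    // Skewed input: the two large tasks sit at high indices (2 and 4).
    val sizes = Vector(10L, 20L, 5000L, 30L, 8000L)
    println(launchOrder(sizes).mkString(", ")) // prints "4, 2, 3, 1, 0"
  }
}
{code}

Keeping the reverse insertion and tail dequeue unchanged means only the order in which tasks are added differs, so the rest of the dequeue logic would not need to change.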