jin xing created SPARK-20219:
--------------------------------

Summary: Schedule tasks based on size of input from ScheduledRDD
Key: SPARK-20219
URL: https://issues.apache.org/jira/browse/SPARK-20219
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.1.0
Reporter: jin xing

When data on a ShuffledRDD is highly skewed, it makes sense to launch the tasks that process much more input as early as possible. The current scheduling mechanism in *TaskSetManager* is quite simple:

{code}
for (i <- (0 until numTasks).reverse) {
  addPendingTask(i)
}
{code}

In a scenario where the "large tasks" are located in the bottom half of the task array, launching the tasks with much more input early can significantly reduce the total time cost. It also saves resources when *"dynamic allocation"* is disabled, since idle executors are not released while the remaining large tasks finish.
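
To illustrate the proposal, here is a minimal sketch of a size-aware launch order. It is not code from *TaskSetManager*: the object `InputSizeOrdering`, the method `launchOrder`, and the `inputSizes` parameter are hypothetical names, and how the per-task input sizes would be obtained is left out.

```scala
// Hypothetical helper, not part of Spark: orders task indices so that
// tasks with the most input come first.
object InputSizeOrdering {

  /**
   * @param inputSizes assumed input size in bytes for each task, indexed by task index
   * @return task indices sorted by descending input size; ties keep the lower index first
   */
  def launchOrder(inputSizes: Seq[Long]): Seq[Int] =
    inputSizes.indices.sortBy(i => (-inputSizes(i), i))
}

object InputSizeOrderingExample {
  def main(args: Array[String]): Unit = {
    // Tasks 1 and 3 carry most of the skewed input.
    val sizes = Seq(10L, 500L, 10L, 9000L)
    println(InputSizeOrdering.launchOrder(sizes).mkString(", ")) // prints "3, 1, 0, 2"
  }
}
```

With an order like this, the loop above would add pending tasks following `launchOrder` rather than the plain index order, so the largest tasks are the first candidates for launch.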