[ https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jin xing updated SPARK-20219:
-----------------------------
    Attachment: screenshot-1.png

> Schedule tasks based on size of input from ShuffledRDD
> ------------------------------------------------------
>
>                 Key: SPARK-20219
>                 URL: https://issues.apache.org/jira/browse/SPARK-20219
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: screenshot-1.png
>
>
> When data is highly skewed across the partitions of a ShuffledRDD, it makes sense to launch the tasks that process the most input as early as possible. The current scheduling mechanism in *TaskSetManager* is quite simple:
> {code}
> for (i <- (0 until numTasks).reverse) {
>   addPendingTask(i)
> }
> {code}
> In the scenario where the "large tasks" sit in the bottom half of the task array, launching the tasks with the most input early can significantly reduce the total time cost and save resources, especially when *"dynamic allocation"* is disabled.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
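The improvement described above could be sketched as follows. This is a minimal, hypothetical illustration, not the actual TaskSetManager change: `taskInputSizes` (a function from task index to estimated shuffle input bytes) and `SizeAwareOrdering` are assumed names introduced here for the example.

{code}
// Sketch: enqueue pending tasks in descending order of estimated input size,
// so the largest ("skewed") tasks are offered to executors first.
object SizeAwareOrdering {
  // taskInputSizes(i) is a hypothetical estimate of task i's shuffle input, in bytes.
  def pendingOrder(numTasks: Int, taskInputSizes: Int => Long): Seq[Int] = {
    // Current behavior enqueues indices (numTasks - 1) down to 0 regardless of size;
    // here we instead sort indices by input size, largest first.
    (0 until numTasks).sortBy(i => -taskInputSizes(i))
  }
}
{code}

For example, with input sizes Map(0 -> 10L, 1 -> 500L, 2 -> 50L, 3 -> 5L), the order becomes Seq(1, 2, 0, 3), so the 500-byte task is scheduled first rather than waiting behind smaller tasks.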