[ 
https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956245#comment-15956245
 ] 

Apache Spark commented on SPARK-20219:
--------------------------------------

User 'jinxing64' has created a pull request for this issue:
https://github.com/apache/spark/pull/17533

> Schedule tasks based on size of input from ScheduledRDD
> -------------------------------------------------------
>
>                 Key: SPARK-20219
>                 URL: https://issues.apache.org/jira/browse/SPARK-20219
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>
> When data is highly skewed on ShuffledRDD, it makes sense to launch the 
> tasks that process much more input as soon as possible. The current 
> scheduling mechanism in *TaskSetManager* is quite simple:
> {code}
>   // Tasks are added in reverse index order, so low-index tasks launch first.
>   for (i <- (0 until numTasks).reverse) {
>     addPendingTask(i)
>   }
> {code}
> When the "large tasks" are located in the bottom half of the task array, they 
> are among the last to be launched. If tasks with much more input are launched 
> early instead, we can significantly reduce the total time cost and save 
> resources when *"dynamic allocation"* is disabled.
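
The effect described above can be illustrated with a small standalone simulation. This is only a sketch and not the code from the pull request: the object SkewAwareOrdering and the helpers launchOrderBySize and makespan are hypothetical names, task duration is assumed to be proportional to input size, and the number of execution slots is assumed to be fixed (as when dynamic allocation is disabled).

```scala
import scala.collection.mutable

object SkewAwareOrdering {
  // Hypothetical helper: task indices ordered by descending input size.
  // sortBy is stable, so tasks with equal input keep their index order.
  def launchOrderBySize(inputSizes: IndexedSeq[Long]): Seq[Int] =
    inputSizes.indices.sortBy(i => -inputSizes(i))

  // Greedy simulation: each task in `order` runs on the slot that frees up
  // first. Assumption: a task's duration equals its input size.
  def makespan(inputSizes: IndexedSeq[Long], order: Seq[Int], slots: Int): Long = {
    // Reversed ordering turns the max-heap into a min-heap of slot free times.
    val freeAt = mutable.PriorityQueue.empty[Long](Ordering[Long].reverse)
    (0 until slots).foreach(_ => freeAt.enqueue(0L))
    var finish = 0L
    for (i <- order) {
      val end = freeAt.dequeue() + inputSizes(i)
      freeAt.enqueue(end)
      finish = math.max(finish, end)
    }
    finish
  }

  def main(args: Array[String]): Unit = {
    // Eight small tasks followed by one large task at the end of the array.
    val sizes = IndexedSeq.fill(8)(1L) :+ 8L
    println(s"index order: ${makespan(sizes, sizes.indices, 2)}")            // 12
    println(s"size order:  ${makespan(sizes, launchOrderBySize(sizes), 2)}") // 8
  }
}
```

With two slots, launching in index order leaves the large task until both slots have finished the small ones, so the simulated stage takes 12 time units. Launching the large task first lets it overlap with the small tasks and the stage finishes in 8.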



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
