[ 
https://issues.apache.org/jira/browse/SPARK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-37528:
------------------------------
    Description: 
In general, a larger input size means a longer running time. So ideally, the 
DAGScheduler should submit the tasks with the largest input first, which can 
reduce the running time of the whole stage. For example, suppose a stage has 4 
tasks, defaultParallelism is 2, and the tasks have running times [1s, 3s, 2s, 
4s], in submission order.
- with the default order, the running time of the stage is: 7s
- with the biggest tasks first, the running time of the stage is: 5s
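
The 7s and 5s figures above can be checked with a small simulation. This is 
only a sketch, not Spark's scheduler: the function name `stage_running_time` 
is hypothetical, running times are assumed to be known in advance, and each of 
the 2 slots is assumed to run one task at a time, picking up the next task in 
list order as soon as it becomes free.

```python
import heapq

def stage_running_time(task_times, parallelism):
    """Return the stage finish time when tasks are started in list order,
    each on whichever slot becomes free first (hypothetical helper)."""
    slots = [0] * parallelism      # time at which each slot becomes free
    heapq.heapify(slots)
    for t in task_times:
        free_at = heapq.heappop(slots)      # earliest available slot
        heapq.heappush(slots, free_at + t)  # slot is busy until free_at + t
    return max(slots)

task_times = [1, 3, 2, 4]  # seconds, in original submission order

default_order = stage_running_time(task_times, 2)
big_first = stage_running_time(sorted(task_times, reverse=True), 2)

print(default_order, big_first)
```

In the default order the 4s task starts only at t=3, so the stage ends at 7s; 
with the biggest tasks first, both slots finish at 5s.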


  was:
Reordering tasks by input size can reduce the execution time of the whole 
stage, assuming that a larger amount of input data takes longer to process. 
Let's say we have one stage with 4 tasks, the `defaultParallelism` is 2, and 
the 4 tasks have different execution times: [1s, 3s, 2s, 4s].
 * with the default order, the execution time of the stage is: 7s
 * after reordering the tasks, the execution time of the stage is: 5s

Add a new config `spark.scheduler.reorderTasks.enabled` to control whether 
tasks may be reordered.

 


> Support reorder tasks during scheduling by shuffle partition size in AQE
> ------------------------------------------------------------------------
>
>                 Key: SPARK-37528
>                 URL: https://issues.apache.org/jira/browse/SPARK-37528
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major
>
> In general, a larger input size means a longer running time. So ideally, the 
> DAGScheduler should submit the tasks with the largest input first, which can 
> reduce the running time of the whole stage. For example, suppose a stage has 
> 4 tasks, defaultParallelism is 2, and the tasks have running times [1s, 3s, 
> 2s, 4s], in submission order.
> - with the default order, the running time of the stage is: 7s
> - with the biggest tasks first, the running time of the stage is: 5s



