[ 
https://issues.apache.org/jira/browse/SPARK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-37528:
------------------------------
    Description: 
In general, a larger input data size means a longer running time. So, ideally, 
we can let DAGScheduler submit tasks with larger input sizes first, which can 
reduce the running time of the whole stage.
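A minimal sketch of the "bigger input first" ordering, assuming each pending task's input size is already known before it is submitted. `PendingTask` and `submission_order` are hypothetical names for illustration only, not part of Spark's DAGScheduler or TaskSetManager API.

```python
from typing import List, NamedTuple


class PendingTask(NamedTuple):
    """Hypothetical stand-in for a pending task: partition index and input size."""
    partition: int
    input_bytes: int


def submission_order(tasks: List[PendingTask]) -> List[int]:
    """Return partition indices ordered with the largest input first.

    Ties are broken by ascending partition index so the order is deterministic.
    """
    ranked = sorted(tasks, key=lambda t: (-t.input_bytes, t.partition))
    return [t.partition for t in ranked]


tasks = [PendingTask(0, 64), PendingTask(1, 512), PendingTask(2, 128), PendingTask(3, 512)]
print(submission_order(tasks))  # [1, 3, 2, 0]
```

Input size is only a proxy for running time here, so this ordering is a heuristic, not a guarantee.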

[design 
doc](https://docs.google.com/document/d/1vPcuEADUokO4XpqBV1rFH90Zi4rKdsgZtZMYX80c2gw/edit?usp=sharing)

  was:
In general, a larger input data size means a longer running time. So, ideally, 
we can let DAGScheduler submit tasks with larger input sizes first, which can 
reduce the running time of the whole stage. For example, suppose one stage has 
4 tasks, defaultParallelism is 2, and the 4 tasks have running times [1s, 3s, 
2s, 4s]:
- in the default order, the 4s task starts last (at 3s), so the stage takes 7s
- if the biggest tasks run first, the stage takes 5s
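The 7s and 5s figures can be reproduced with a small simulation, assuming each of the 2 slots picks up the next task as soon as it becomes free and that task durations are known up front. `stage_time` is a hypothetical helper, not Spark code; sorting durations in descending order is the classic longest-processing-time-first (LPT) heuristic, used here as a stand-in for "bigger input size first".

```python
import heapq
from typing import List


def stage_time(durations: List[float], slots: int) -> float:
    """Greedy list scheduling: each task, in the given order, starts on the
    slot that becomes free earliest. Returns when the last task finishes.
    Assumes slots >= 1."""
    free_at = [0.0] * slots  # min-heap of times at which each slot becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


durations = [1, 3, 2, 4]
print(stage_time(durations, slots=2))                        # 7.0
print(stage_time(sorted(durations, reverse=True), slots=2))  # 5.0
```

Largest-first helps in this example because the 4s task no longer starts last; in general it is a heuristic and does not guarantee the shortest possible stage time.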



> Schedule Tasks By Input Size
> ----------------------------
>
>                 Key: SPARK-37528
>                 URL: https://issues.apache.org/jira/browse/SPARK-37528
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major
>
> In general, a larger input data size means a longer running time. So, ideally, 
> we can let DAGScheduler submit tasks with larger input sizes first, which can 
> reduce the running time of the whole stage.
> [design 
> doc](https://docs.google.com/document/d/1vPcuEADUokO4XpqBV1rFH90Zi4rKdsgZtZMYX80c2gw/edit?usp=sharing)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
