[ https://issues.apache.org/jira/browse/SPARK-37528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiDuo You updated SPARK-37528:
------------------------------
Description:
In general, a larger input size means a longer running time. Ideally, then, the DAGScheduler could submit the tasks with the largest input sizes first, which can reduce the running time of the whole stage.

[design doc](https://docs.google.com/document/d/1vPcuEADUokO4XpqBV1rFH90Zi4rKdsgZtZMYX80c2gw/edit?usp=sharing)

  was:
In general, a larger input size means a longer running time. Ideally, then, the DAGScheduler could submit the tasks with the largest input sizes first, which can reduce the running time of the whole stage.

For example, take a stage with 4 tasks, a defaultParallelism of 2 (so two tasks run at a time), and task running times of [1s, 3s, 2s, 4s] in submission order.
- in the default order, the stage takes 7s: the 4s task cannot start until a slot frees up at 3s, so it finishes at 7s
- if the biggest tasks go first, the stage takes 5s: the 4s and 3s tasks start immediately, and the 2s and 1s tasks fill the slots as they free up at 3s and 4s

> Schedule Tasks By Input Size
> ----------------------------
>
>                 Key: SPARK-37528
>                 URL: https://issues.apache.org/jira/browse/SPARK-37528
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>    Affects Versions: 3.4.0
>            Reporter: XiDuo You
>            Priority: Major
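The stage-time arithmetic in the earlier example (the text under "was:") can be reproduced with a minimal Python sketch. It rests on a simplified model that is an assumption, not Spark's actual scheduler: each task's running time is known in advance (standing in for its input size), there is a fixed number of slots, and each task in the queue goes to whichever slot frees up first (greedy list scheduling, ignoring locality, delay scheduling, and speculation). Ordering by descending size is the classic LPT (longest processing time first) heuristic; it often shortens the stage but is not guaranteed to be optimal. The names `stage_time` and `largest_first` are hypothetical and are not Spark APIs.

```python
import heapq
from typing import List, Sequence


def stage_time(durations: Sequence[float], slots: int) -> float:
    """Return when the last task finishes if tasks are handed out in the given
    order and each one starts on whichever of `slots` slots frees up first."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    # Min-heap of the times at which each slot becomes free; all start free at t=0.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def largest_first(input_sizes: Sequence[float]) -> List[int]:
    """Return task indices sorted by descending input size.
    Python's sort is stable, so equal sizes keep their original order."""
    return sorted(range(len(input_sizes)), key=lambda i: input_sizes[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical stage: running times in seconds, used as a proxy for input size.
    durations = [1, 3, 2, 4]
    print(stage_time(durations, slots=2))  # prints 7.0
    order = largest_first(durations)
    print(stage_time([durations[i] for i in order], slots=2))  # prints 5.0
```

Keeping slot-free times in a min-heap makes each assignment O(log k) for k slots, which mirrors "give the next task to the first idle slot" without scanning every slot.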