[ https://issues.apache.org/jira/browse/SPARK-32112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Noritaka Sekiyama updated SPARK-32112:
--------------------------------------
    Summary: Easier way to repartition/coalesce DataFrames based on the number of parallel tasks that Spark can process at the same time  (was: Add a method to calculate the number of parallel tasks that Spark can process at the same time)

> Easier way to repartition/coalesce DataFrames based on the number of parallel
> tasks that Spark can process at the same time
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32112
>                 URL: https://issues.apache.org/jira/browse/SPARK-32112
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Noritaka Sekiyama
>            Priority: Major
>
> Repartition/coalesce is very important for optimizing a Spark application's
> performance; however, many users struggle to determine the number of
> partitions.
> This issue proposes adding a method to calculate the number of parallel tasks
> that Spark can process at the same time.
> It will help Spark users determine the optimal number of partitions.
> Expected use cases:
> - repartition with the calculated number of parallel tasks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
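The calculation the issue asks for can be sketched roughly as follows. This is not an existing Spark API: `concurrent_task_slots` and `suggested_partitions` are hypothetical helpers, and the sketch assumes each executor can run `cores // spark.task.cpus` tasks at once and that the executor core counts are already known (with dynamic allocation they change over time).

```python
from typing import Iterable


def concurrent_task_slots(executor_cores: Iterable[int], task_cpus: int = 1) -> int:
    """Hypothetical helper: how many tasks the cluster can run at the same time.

    Each executor contributes floor(cores / task_cpus) slots, where task_cpus
    stands in for the spark.task.cpus setting (assumption for illustration).
    """
    if task_cpus < 1:
        raise ValueError("task_cpus must be >= 1")
    return sum(cores // task_cpus for cores in executor_cores)


def suggested_partitions(executor_cores: Iterable[int], task_cpus: int = 1, waves: int = 1) -> int:
    """Hypothetical helper: a partition count that is a multiple of the slot count,
    so each wave of tasks can keep every slot busy. Never returns less than 1."""
    slots = concurrent_task_slots(executor_cores, task_cpus)
    return max(1, slots * waves)


# 4 executors with 4 cores each and spark.task.cpus = 1 give 16 slots.
print(concurrent_task_slots([4, 4, 4, 4]))  # 16
# With spark.task.cpus = 2 there are 8 slots; 3 waves suggest 24 partitions.
print(suggested_partitions([4, 4, 4, 4], task_cpus=2, waves=3))  # 24
```

In PySpark the result would then be passed to `df.repartition(n)` or `df.coalesce(n)`. On most cluster managers `spark.sparkContext.defaultParallelism` already approximates the total executor core count when `spark.default.parallelism` is unset, but it does not account for `spark.task.cpus`, which is one reason a dedicated method could be useful.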