[ https://issues.apache.org/jira/browse/SPARK-31841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118114#comment-17118114 ]
Jungtaek Lim commented on SPARK-31841:
--------------------------------------

It sounds like a question/feature request, which would be better moved to the user@ or dev@ mailing list.

> Dataset.repartition leverage adaptive execution
> -----------------------------------------------
>
>                 Key: SPARK-31841
>                 URL: https://issues.apache.org/jira/browse/SPARK-31841
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: spark branch-3.0 from may 1 this year
>            Reporter: koert kuipers
>            Priority: Minor
>
> hello,
> we are very happy users of adaptive query execution. it's a great feature to not have to think about and tune the number of partitions in a shuffle anymore.
> i noticed that Dataset.groupBy consistently uses adaptive execution when it's enabled (e.g. i don't see the default 200 partitions), but when i do Dataset.repartition it seems i am back to a hardcoded number of partitions.
> Should adaptive execution also be used for repartition? It would be nice to be able to repartition without having to think about the optimal number of partitions.
> An example:
> {code:java}
> $ spark-shell --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.advisoryPartitionSizeInBytes=100000
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_252)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val x = (1 to 1000000).toDF
> x: org.apache.spark.sql.DataFrame = [value: int]
>
> scala> x.rdd.getNumPartitions
> res0: Int = 2
>
> scala> x.repartition($"value").rdd.getNumPartitions
> res1: Int = 200
>
> scala> x.groupBy("value").count.rdd.getNumPartitions
> res2: Int = 67
> {code}
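For anyone wanting to reproduce this outside the shell, a minimal standalone sketch follows. The local[2] master and the RepartitionAqeDemo object/app name are made up for illustration; the two AQE settings are the ones from the spark-shell invocation quoted above.

{code:java}
import org.apache.spark.sql.SparkSession

object RepartitionAqeDemo {
  def main(args: Array[String]): Unit = {
    // Same AQE settings as the spark-shell invocation in the issue.
    // local[2] master is an assumption for a self-contained run.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RepartitionAqeDemo")
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "100000")
      .getOrCreate()
    import spark.implicits._

    val x = (1 to 1000000).toDF

    // The groupBy shuffle is coalesced by AQE, so the partition count
    // follows the advisory partition size rather than a fixed number.
    println(x.groupBy("value").count.rdd.getNumPartitions)

    // repartition pins its shuffle to spark.sql.shuffle.partitions
    // (default 200); AQE does not coalesce it here, which is the
    // behavior this issue asks about.
    println(x.repartition($"value").rdd.getNumPartitions)

    spark.stop()
  }
}
{code}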