Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21718

@bjkonglu @bethunebtj @wguangliang Update: I thought about decoupling execution tasks from data partitions (`spark.sql.shuffle.partitions`), and it turns out this can be achieved by calling `coalesce`. With `coalesce` you can reduce the number of execution tasks while the number of data partitions stays the same. Please note that we still can't change `spark.sql.shuffle.partitions`, since repartitioning the state would be non-trivial depending on the size of the state. One thing to note is that execution tasks will be reduced for downstream operators as well (unless a new stage begins), so you need to call `repartition` to restore parallelism for the downstream operators.