Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21718
  
    @bjkonglu @bethunebtj @wguangliang 
    
    Update: I thought about splitting execution tasks from data partitions 
(`spark.sql.shuffle.partitions`), and it turns out this can be achieved by calling 
`coalesce`. With `coalesce` you can reduce the number of execution tasks while the 
number of data partitions is kept the same. Please note that we still can't change 
`spark.sql.shuffle.partitions`, since repartitioning the state would not be trivial 
given the size of the state.
    
    One thing to note is that the reduced number of execution tasks carries over to 
downstream operators as well (unless a new stage begins), so you need to call 
`repartition` to adjust the number of execution tasks for the downstream operators.
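    
    To illustrate, here is a minimal Scala sketch of the idea. It assumes a rate 
source and a windowed aggregation purely for demonstration; the source, column 
names, and partition counts are all hypothetical:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    
    val spark = SparkSession.builder()
      .appName("coalesce-after-stateful-agg")
      .getOrCreate()
    import spark.implicits._
    
    // Hypothetical streaming source for illustration.
    val input = spark.readStream.format("rate").load()
    
    // The aggregation shuffles into spark.sql.shuffle.partitions data
    // partitions, which is also where the state lives.
    val aggregated = input
      .groupBy(window($"timestamp", "1 minute"))
      .count()
    
    // coalesce reduces the number of execution tasks; the state is still
    // kept in the original number of data partitions.
    val fewerTasks = aggregated.coalesce(8)
    
    // Downstream operators inherit the reduced task count within the same
    // stage, so repartition if more parallelism is needed afterwards.
    val downstream = fewerTasks.repartition(200)
    ```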

