Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21718

@bjkonglu @bethunebtj @wguangliang Update: I thought about decoupling execution tasks from data partitions (`spark.sql.shuffle.partitions`), and it turns out this can be achieved by calling `coalesce`. With `coalesce` you can reduce the number of execution tasks while the number of data partitions stays the same. Please note that we still can't change `spark.sql.shuffle.partitions`, since repartitioning the state would be non-trivial depending on the size of the state. One thing to note is that execution tasks will be reduced for downstream operators as well (unless a new stage begins), so you need to call `repartition` to restore parallelism for the downstream operators.