[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...

jaceklaskowski Mon, 27 Aug 2018 11:07:37 -0700

GitHub user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22238#discussion_r213063267
  
    --- Diff: docs/structured-streaming-programming-guide.md ---
    @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f
     
     # Additional Information
     
    +**Gotchas**
    +
    +- For structured streaming, modifying "spark.sql.shuffle.partitions" is restricted once you run the query.
    +  - This is because state is partitioned via key, hence number of partitions for state should be unchanged.
    +  - If you want to run less tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. Please note that it will also affect downstream operators.
    --- End diff --
    
    An example of how to use the `coalesce` operator with a stateful streaming query would be superb.
    
    I'd also appreciate it if you described which downstream operators are affected and how.
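For reference, a minimal Scala sketch of what such an example might look like. It is not taken from the PR and assumes the Spark 3.x APIs. It fixes `spark.sql.shuffle.partitions` at 8 before the first run, computes a running count per key, and then calls `coalesce(2)`, so the stateful stage should run with 2 tasks (each covering several of the 8 state partitions) without adding a shuffle. The object and method names (`CoalesceAfterStatefulAggregation`, `countsWithFewerTasks`, `runSketch`), the partition counts, and the use of `MemoryStream` (an internal, test-oriented source in `org.apache.spark.sql.execution.streaming`) are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions.col

// Hypothetical object name; an illustrative sketch, not the PR's documentation example.
object CoalesceAfterStatefulAggregation {

  // Running count per key, then coalesce to `numTasks` partitions.
  // coalesce is a narrow transformation, so no extra shuffle is added and the
  // stateful stage itself should run with `numTasks` tasks, each covering
  // several of the state partitions.
  def countsWithFewerTasks(input: DataFrame, numTasks: Int): DataFrame =
    input
      .groupBy((col("value") % 3).as("key")) // state is partitioned by this key
      .count()
      .coalesce(numTasks)

  def runSketch(): Array[(Int, Long)] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("coalesce-stateful-sketch")
      // Number of state partitions (8 is an illustrative value). Once a query has
      // run against a checkpoint, Spark restores this value from the checkpoint on
      // restart instead of using a newly configured one.
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlCtx: SQLContext = spark.sqlContext

    // MemoryStream is an internal, test-oriented source (assumption: Spark 3.x API).
    val input = MemoryStream[Int]
    val query = countsWithFewerTasks(input.toDF(), numTasks = 2)
      .writeStream
      .outputMode("complete")
      .format("memory") // test-only sink; a real sink would also set checkpointLocation
      .queryName("key_counts")
      .start()

    try {
      input.addData(1, 2, 3, 4, 5, 6)
      query.processAllAvailable() // block until the added rows are processed
      spark.table("key_counts").as[(Int, Long)].collect().sortBy(_._1)
    } finally {
      query.stop()
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    runSketch().foreach(println)
}
```

Because `coalesce` adds no shuffle, the operators that follow it in the same stage (here, the write to the sink) would also run with 2 tasks; the number of tasks would only change again at the next shuffle.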



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
