GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/21239

    [SPARK-24040][SS] Support single partition aggregates in continuous 
processing.

    ## What changes were proposed in this pull request?
    
    Support aggregates with exactly 1 partition in continuous processing.
    
    A few small tweaks are needed to make this work:
    
    * Replace currentEpoch tracking with a ThreadLocal. This means that the 
current epoch is scoped to a task rather than a node, but I think that's 
sustainable even once we add shuffle.
    * Add a new testing-only flag to disable the UnsupportedOperationChecker 
whitelist of allowed continuous processing nodes. I think this is preferable to 
writing a pile of custom logic to enforce that there is in fact only 1 
partition; we plan to support multi-partition aggregates before the next Spark 
release, so we'd just have to tear that logic back out.
    * Restart continuous processing queries from the first available 
uncommitted epoch, rather than one that's guaranteed to be unused. This is 
required for stateful operators to overwrite partial state from the previous 
attempt at the epoch, and there was no specific motivation for the original 
strategy. In another PR before stabilizing the StreamWriter API, we'll need to 
narrow down and document more precise semantic guarantees for the epoch IDs.
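
To illustrate the first bullet, here is a minimal sketch of what task-scoped epoch tracking with a ThreadLocal could look like. It is not the code from this PR: the class name `EpochTrackerSketch`, its method names, and the `-1` "uninitialized" sentinel are all assumptions made for illustration. It also assumes each task runs on its own thread, so that a per-thread value behaves as a per-task value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only: the names and the -1 sentinel are assumptions,
// not the API introduced by the pull request.
final class EpochTrackerSketch {
    // Every thread that touches CURRENT_EPOCH lazily gets its own counter,
    // so tasks on the same node no longer share one epoch value.
    private static final ThreadLocal<AtomicLong> CURRENT_EPOCH =
        ThreadLocal.withInitial(() -> new AtomicLong(-1L));

    private EpochTrackerSketch() {}

    // Set the epoch this task starts from, e.g. when a query (re)starts.
    static void initializeCurrentEpoch(long startEpoch) {
        CURRENT_EPOCH.get().set(startEpoch);
    }

    // Advance this task's epoch by one and return the new value.
    static long incrementCurrentEpoch() {
        return CURRENT_EPOCH.get().incrementAndGet();
    }

    // Read this task's epoch; returns -1 if the calling thread never initialized it.
    static long getCurrentEpoch() {
        return CURRENT_EPOCH.get().get();
    }
}
```

With this shape, a task that initializes its epoch to 5 and increments once reads 6, while a different thread that never initialized its epoch still reads the sentinel. That per-thread isolation is what scopes the epoch to a task rather than a node.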
    
    ## How was this patch tested?
    
    New unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark withAggr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21239
    
----
commit c620978f98cda9b178afb08b87041d6154d5edd0
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T21:34:22Z

    rebase on master

commit 4dbc10db1acae62e415c12da2d21dd2428692a7d
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T21:38:26Z

    suite got left out of commit

commit 9b4aecd01951eb9c31671535e2a081a484a39d58
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T22:15:48Z

    move to EpochTracker

----

