GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/21239
    [SPARK-24040][SS] Support single partition aggregates in continuous processing.

    ## What changes were proposed in this pull request?

    Support aggregates with exactly 1 partition in continuous processing.

    A few small tweaks are needed to make this work:

    * Replace currentEpoch tracking with a ThreadLocal. This means that the current epoch is scoped to a task rather than a node, but I think that's sustainable even once we add shuffle.
    * Add a new testing-only flag to disable the UnsupportedOperationChecker whitelist of allowed continuous processing nodes. I think this is preferable to writing a pile of custom logic to enforce that there is in fact only 1 partition; we plan to support multi-partition aggregates before the next Spark release, so we'd just have to tear that logic back out.
    * Restart continuous processing queries from the first available uncommitted epoch, rather than one that's guaranteed to be unused. This is required for stateful operators to overwrite partial state from the previous attempt at the epoch, and there was no specific motivation for the original strategy. In another PR before stabilizing the StreamWriter API, we'll need to narrow down and document more precise semantic guarantees for the epoch IDs.

    ## How was this patch tested?

    new unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark withAggr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21239

----
commit c620978f98cda9b178afb08b87041d6154d5edd0
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T21:34:22Z

    rebase on master

commit 4dbc10db1acae62e415c12da2d21dd2428692a7d
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T21:38:26Z

    suite got left out of commit

commit 9b4aecd01951eb9c31671535e2a081a484a39d58
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-05-04T22:15:48Z

    move to EpochTracker

----
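
For readers unfamiliar with the first change in the pull request description, here is a minimal Java sketch of what scoping the current epoch to a task with a ThreadLocal could look like. It is not the code from this patch. The class name `EpochTrackerSketch`, its method names, and the `-1` "unset" sentinel are all assumptions made for illustration, and the sketch assumes one thread per task.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and the UNSET sentinel are illustrative only
// and are not taken from the Spark patch.
class EpochTrackerSketch {
    // Assumed marker for "no epoch has been initialized on this thread".
    private static final long UNSET = -1L;

    // Each thread (standing in for a task) lazily gets its own counter,
    // so one task advancing its epoch never affects another task.
    private static final ThreadLocal<AtomicLong> CURRENT_EPOCH =
        ThreadLocal.withInitial(() -> new AtomicLong(UNSET));

    // Set the epoch this thread starts processing from.
    static void initializeCurrentEpoch(long startEpoch) {
        CURRENT_EPOCH.get().set(startEpoch);
    }

    // Read the epoch as seen by the calling thread.
    static long getCurrentEpoch() {
        return CURRENT_EPOCH.get().get();
    }

    // Advance only the calling thread's epoch.
    static void incrementCurrentEpoch() {
        CURRENT_EPOCH.get().incrementAndGet();
    }

    // Demo helper: report the epoch that a freshly started thread observes.
    static long readEpochOnNewThread() {
        AtomicLong seen = new AtomicLong(Long.MIN_VALUE);
        Thread other = new Thread(() -> seen.set(getCurrentEpoch()));
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        initializeCurrentEpoch(5L);
        incrementCurrentEpoch();
        System.out.println("this thread:  " + getCurrentEpoch());
        System.out.println("other thread: " + readEpochOnNewThread());
    }
}
```

In this sketch the calling thread ends up at epoch 6 while a new thread still sees the unset value. That is the task-scoped (rather than node-scoped) behaviour the description refers to. Any operator that hands work to another thread would need the epoch passed to it explicitly.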