GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/12119
[SPARK-14288][SQL] Memory Sink for streaming

This PR exposes the internal testing `MemorySink` through the data source API. This will allow users to easily test streaming applications in the Spark shell or other local tests.

Usage:
```scala
inputStream.write
  .format("memory")
  .queryName("memStream")
  .startStream()

// Now you can query the result of the stream here.
sqlContext.table("memStream")
```

The most complicated part of the logic is setting the checkpoint directory. There are a few requirements we are attempting to satisfy here:
 - when working in the shell locally, it should just work with no extra configuration.
 - when working on a cluster, you should be able to easily make it create the checkpoint on a distributed file system so you can test aggregation (state checkpoints are also stored in this directory and must be accessible from workers).
 - it should be clear that you can't resume, since the data is just in memory.

The chosen algorithm proceeds as follows:
 - if the user gives a checkpoint directory, use it
 - if the conf has a checkpoint location, use `$location/$queryName`
 - if neither, create a local directory

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark memorySink

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12119

----
commit aaee000cd7bb5ad30710847c5bf48d96cdd870f5
Author: Michael Armbrust <mich...@databricks.com>
Date:   2016-03-31T06:33:02Z

    [SPARK-14288][SQL] Memory Sink for streaming

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
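The three-step checkpoint-directory choice described in the pull request can be sketched as a small fallback chain. This is only an illustration of the stated algorithm, not the code in the patch: the object `CheckpointResolution`, the method `resolveCheckpointDir`, its parameter names, and the temp-directory prefix are all hypothetical, and the sketch assumes the query name contains no path separators.

```scala
import java.nio.file.Files

// Hypothetical helper illustrating the fallback order from the PR description.
// None of these names come from Spark itself.
object CheckpointResolution {
  def resolveCheckpointDir(
      userSpecified: Option[String], // directory given explicitly by the user, if any
      confLocation: Option[String],  // checkpoint location from the conf, if any
      queryName: String): String = {
    userSpecified
      // 1. The user gave a checkpoint directory: use it as-is.
      // 2. Otherwise, if the conf has a location, use $location/$queryName.
      .orElse(confLocation.map(location => s"$location/$queryName"))
      // 3. Otherwise create a fresh local directory. getOrElse takes its
      //    argument by name, so the directory is created only in this case.
      .getOrElse(Files.createTempDirectory(s"memory-sink-$queryName-").toString)
  }
}
```

For example, `CheckpointResolution.resolveCheckpointDir(None, Some("/ckpt"), "memStream")` yields `/ckpt/memStream`, while passing `None` for both options returns the path of a newly created temporary directory on the local machine.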