GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22547
[SPARK-25528][SQL] data source V2 read side API refactoring

## What changes were proposed in this pull request?

Refactor the read side API according to the abstraction proposed in the [dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html):

```
batch: catalog -> table -> scan
streaming: catalog -> table -> stream -> scan
```

More concretely, this PR:

1. adds a new interface called `Format` that can return a `Table`.
2. renames `ReadSupportProvider` to `Table`, which represents a logical data set with a schema.
3. adds a new interface `InputStream` to represent a streaming source in a streaming query. It can create `Scan`s.
4. renames `ReadSupport` to `Scan`. Each `Scan` triggers one Spark job (like an RDD).

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running: `$ git pull https://github.com/cloud-fan/spark new-idea`

Alternatively, you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22547.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: `This closes #22547`

----
commit 92dfdaf990f2676d49766f5ab094e8b8a9a755b1
Author: Wenchen Fan <wenchen@...>
Date: 2018-08-27T15:20:08Z

data source V2 read side API refactoring

----
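To make the two read paths in the PR description concrete, here is a minimal Java sketch under stated assumptions. The interface names (`Format`, `Table`, `InputStream`, `Scan`) come from the PR description, but the description does not give their methods. Everything else is a hypothetical illustration and not Spark's actual API: the method names `getTable`, `newScan`, `newInputStream` and `readAll`, the offset parameters, the `"end"` option, and the `RangeFormat`/`RangeTable`/`RangeScan` example source. The catalog step is omitted, and "running a job" is simulated by materializing a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the read-side abstraction described in the PR.
// Interface names come from the PR description; every method signature here is an
// illustrative assumption, not Spark's actual API.
public class Main {

    /** Entry point that turns user options into a Table (hypothetical signature). */
    interface Format {
        Table getTable(Map<String, String> options);
    }

    /** A logical data set with a schema (replaces ReadSupportProvider). */
    interface Table {
        String schema();              // schema simplified to a DDL-like string
        Scan newScan();               // batch:     table -> scan
        InputStream newInputStream(); // streaming: table -> stream -> scan
    }

    /** A streaming source in a streaming query; it creates Scans. */
    interface InputStream {
        Scan newScan(long startOffset, long endOffset);
    }

    /** One Scan corresponds to one Spark job, like an RDD (replaces ReadSupport). */
    interface Scan {
        List<Long> readAll();
    }

    /** Example source producing the ids [0, end); "end" is a hypothetical option. */
    static final class RangeFormat implements Format {
        @Override
        public Table getTable(Map<String, String> options) {
            long end = Long.parseLong(options.getOrDefault("end", "10"));
            return new RangeTable(end);
        }
    }

    static final class RangeTable implements Table {
        private final long end;

        RangeTable(long end) { this.end = end; }

        @Override public String schema() { return "id BIGINT"; }

        @Override public Scan newScan() { return new RangeScan(0, end); }

        // Each call on the stream yields a fresh Scan over one offset range,
        // clamped to the table's end.
        @Override public InputStream newInputStream() {
            return (start, stop) -> new RangeScan(start, Math.min(stop, end));
        }
    }

    static final class RangeScan implements Scan {
        private final long start;
        private final long stop;

        RangeScan(long start, long stop) {
            this.start = start;
            this.stop = stop;
        }

        // Stands in for "running one job": materializes the ids in [start, stop).
        @Override public List<Long> readAll() {
            List<Long> out = new ArrayList<>();
            for (long i = start; i < stop; i++) {
                out.add(i);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Table table = new RangeFormat().getTable(Map.of("end", "5"));
        // Batch path: one Scan for the whole table.
        System.out.println(table.newScan().readAll());        // [0, 1, 2, 3, 4]
        // Streaming path: the stream creates one Scan per offset range.
        InputStream stream = table.newInputStream();
        System.out.println(stream.newScan(0, 3).readAll());   // [0, 1, 2]
        System.out.println(stream.newScan(3, 10).readAll());  // [3, 4]
    }
}
```

In this sketch the batch path asks the `Table` for a single `Scan`. The streaming path goes through an `InputStream` that hands out a new `Scan` for each offset range, which mirrors the "each `Scan` triggers one Spark job" idea. How offsets are actually represented in the PR is not stated in the description.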