GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22547

    [SPARK-25528][SQL] data source V2 read side API refactoring

    ## What changes were proposed in this pull request?
    
    Refactor the read side API according to the abstraction proposed in the [dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html):
    
    ```
    batch: catalog -> table -> scan
    streaming: catalog -> table -> stream -> scan
    ```
    
    More concretely, this PR:
    1. adds a new interface called `Format` that can return a `Table`.
    2. renames `ReadSupportProvider` to `Table`, which represents a logical data set with a schema.
    3. adds a new interface, `InputStream`, to represent a streaming source in a streaming query. It can create `Scan`s.
    4. renames `ReadSupport` to `Scan`. Each `Scan` triggers one Spark job (like an RDD).
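
    A minimal sketch of how the four pieces listed above could fit together, assuming simplified signatures. Only the names `Format`, `Table`, `InputStream` and `Scan` come from this PR; every method name and signature, the `List<String>` schema, and the toy `RangeFormat` are hypothetical illustrations, not the actual Spark API.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the layering:
    //   batch:     Format -> Table -> Scan
    //   streaming: Format -> Table -> InputStream -> Scan
    // Method names and signatures are assumptions, not Spark's real API.
    public class ReadApiSketch {

        /** Entry point that resolves user options into a Table. */
        interface Format {
            Table getTable(Map<String, String> options);
        }

        /** A logical data set with a schema (formerly ReadSupportProvider). */
        interface Table {
            List<String> schema();          // column names only, simplified
            Scan newBatchScan();            // batch: table -> scan
            InputStream newInputStream();   // streaming: table -> stream
        }

        /** A streaming source in one streaming query; it creates Scans. */
        interface InputStream {
            Scan newMicroBatchScan(long startOffset, long endOffset);
        }

        /** One Scan stands for one Spark job (formerly ReadSupport). */
        interface Scan {
            List<Long> execute();           // stands in for launching a job
        }

        /** Hypothetical toy format: row i holds the value i. */
        static final class RangeFormat implements Format {
            @Override
            public Table getTable(Map<String, String> options) {
                long end = Long.parseLong(options.getOrDefault("end", "0"));
                return new Table() {
                    @Override public List<String> schema() { return List.of("id"); }
                    @Override public Scan newBatchScan() { return () -> range(0, end); }
                    @Override public InputStream newInputStream() {
                        // Each micro-batch gets a fresh Scan over its offset range.
                        return (start, stop) -> () -> range(start, stop);
                    }
                };
            }
        }

        static List<Long> range(long start, long end) {
            List<Long> out = new ArrayList<>();
            for (long i = start; i < end; i++) {
                out.add(i);
            }
            return out;
        }

        public static void main(String[] args) {
            Table table = new RangeFormat().getTable(Map.of("end", "3"));
            System.out.println(table.newBatchScan().execute());            // [0, 1, 2]
            InputStream stream = table.newInputStream();
            System.out.println(stream.newMicroBatchScan(3, 5).execute());  // [3, 4]
        }
    }
    ```

    In this sketch the batch path goes straight from `Table` to `Scan`, while the streaming path keeps an `InputStream` in between so that each micro-batch can create its own `Scan`, matching the "one `Scan`, one job" rule.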
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark new-idea

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22547
    
----
commit 92dfdaf990f2676d49766f5ab094e8b8a9a755b1
Author: Wenchen Fan <wenchen@...>
Date:   2018-08-27T15:20:08Z

    data source V2 read side API refactoring

----

