GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22281

    [SPARK-25280][SQL] Add support for SQL syntax for DataSourceV2

    ## What changes were proposed in this pull request?
    
    This PR mainly adds `USING` syntax support for DataSource V2.
    
    Currently, 
    
    ```scala
    spark.sql(s"CREATE TABLE tableB USING ${classOf[SimpleDataSourceV2].getCanonicalName}")
    ```
    
    produces an error:
    
    ```
    org.apache.spark.sql.sources.v2.SimpleDataSourceV2 is not a valid Spark SQL Data Source.;
    org.apache.spark.sql.AnalysisException: org.apache.spark.sql.sources.v2.SimpleDataSourceV2 is not a valid Spark SQL Data Source.;
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
        at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    ```
    
    We should support this case so that DataSource V1 developers can migrate easily, and their users can keep using the `USING` syntax they already use with DataSource V1 without changing their code.
    
    There is an ongoing discussion thread about this on the dev list as well:
    
    http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-USING-syntax-for-Datasource-V2-td24754.html
    
    This work looks orthogonal to multiple catalog support, so the two can proceed independently.
    
    The approach taken here is basically to introduce a `DataSourceRelation` trait that connects DataSource V1 and DataSource V2, so that the changes can be minimised. For extensibility, this uses a pattern match so that, for instance, newer data sources can be added in the (far) future.
    
    `StreamingDataSourceV2Relation` and `BatchWriteSupportProvider` are not handled here.
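    
    To illustrate the shape of this approach, here is a minimal, self-contained sketch of a common trait with pattern-match dispatch. The trait name follows the PR, but the case classes and method are hypothetical, not the actual PR code:
    
    ```scala
    // Hypothetical sketch: a shared trait over V1 and V2 relations, with a
    // pattern match so a newer data source version only needs one more case.
    sealed trait DataSourceRelation { def className: String }
    
    case class V1Relation(className: String) extends DataSourceRelation
    case class V2Relation(className: String) extends DataSourceRelation
    
    object RelationResolver {
      // Dispatch on the concrete relation type.
      def describe(rel: DataSourceRelation): String = rel match {
        case V1Relation(name) => s"DataSource V1: $name"
        case V2Relation(name) => s"DataSource V2: $name"
      }
    }
    ```
    
    Sealing the trait means the compiler warns if a new relation type is added without a matching case, which is what makes this extension pattern safe.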
    
    ## How was this patch tested?
    
    Unit tests were added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark using-syntax-dsv2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22281
    
----
commit 35f1748cb414990e03f0fd5802d26151a371ee13
Author: hyukjinkwon <gurwls223@...>
Date:   2018-08-28T07:01:16Z

    Add support for SQL syntax for DataSourceV2

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
