GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22281
[SPARK-25280][SQL] Add support for SQL syntax for DataSourceV2
## What changes were proposed in this pull request?
This PR mainly targets adding `USING` syntax support for DataSource V2.
Currently,
```scala
spark.sql(s"CREATE TABLE tableB USING ${classOf[SimpleDataSourceV2].getCanonicalName}")
```
produces an error:
```
org.apache.spark.sql.sources.v2.SimpleDataSourceV2 is not a valid Spark SQL Data Source.;
org.apache.spark.sql.AnalysisException: org.apache.spark.sql.sources.v2.SimpleDataSourceV2 is not a valid Spark SQL Data Source.;
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
```
We should support this case as well, so that Datasource V1 developers can migrate easily and their users can keep using the same `USING` syntax they already use with Datasource V1.
There is an ongoing discussion thread about this on the developers list:
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-USING-syntax-for-Datasource-V2-td24754.html
For this change, it looks like we can proceed orthogonally to multiple catalog support.
The approach taken here is basically to introduce a `DataSourceRelation` trait that connects DataSourceV1 and DataSourceV2, so that the changes are minimised. For extensibility, this uses a pattern match so that, for instance, a newer kind of DataSource can be added in the (far) future.
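The trait-plus-pattern-match idea can be sketched in plain Scala without Spark. This is a minimal sketch under stated assumptions, not the PR's code: only the name `DataSourceRelation` comes from the description above, while `V1Provider`, `V2Provider`, `V1Relation`, `V2Relation`, `SimpleV1Source`, `SimpleV2Source` and `UsingResolver` are hypothetical stand-ins. It shows how the class named after `USING` could be instantiated and dispatched on which interface it implements, with the fallback case marking where a newer kind of source would go. Spark itself raises an `AnalysisException`; the sketch throws `IllegalArgumentException` only to stay dependency-free.

```scala
// Hypothetical marker traits standing in for the V1 and V2 provider interfaces.
trait V1Provider
trait V2Provider

// Hypothetical common parent connecting both kinds of relation.
sealed trait DataSourceRelation
final case class V1Relation(provider: V1Provider) extends DataSourceRelation
final case class V2Relation(provider: V2Provider) extends DataSourceRelation

// Hypothetical example sources, as a user would name them after USING.
class SimpleV1Source extends V1Provider
class SimpleV2Source extends V2Provider

object UsingResolver {
  // Instantiate the class named in USING (via its no-arg constructor)
  // and wrap it in the matching relation type.
  def resolve(className: String): DataSourceRelation =
    Class.forName(className).getDeclaredConstructor().newInstance() match {
      case p: V2Provider => V2Relation(p)
      case p: V1Provider => V1Relation(p)
      // A newer kind of source would get its own case above this one.
      case _ =>
        throw new IllegalArgumentException(
          s"$className is not a valid Spark SQL Data Source.")
    }
}
```

For example, `UsingResolver.resolve(classOf[SimpleV2Source].getName)` yields a `V2Relation`, while `UsingResolver.resolve("java.lang.Object")` throws. Because `DataSourceRelation` is sealed, any other `match` over it gets a compiler exhaustiveness warning when a new subtype is added.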
`StreamingDataSourceV2Relation` and `BatchWriteSupportProvider` are not handled here.
## How was this patch tested?
Unit tests were added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark using-syntax-dsv2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22281.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22281
----
commit 35f1748cb414990e03f0fd5802d26151a371ee13
Author: hyukjinkwon <gurwls223@...>
Date: 2018-08-28T07:01:16Z
Add support for SQL syntax for DataSourceV2
----