[ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058333#comment-17058333 ]

Jungtaek Lim edited comment on SPARK-31136 at 3/13/20, 1:34 AM:
----------------------------------------------------------------

This reminds me of my previous PR:

[https://github.com/apache/spark/pull/27107]

Please go through the comments in the PR again. I'm quoting the key point here:
{quote}The parts differentiating between two syntaxes are skewSpec, rowFormat, 
and createFileFormat (using any of them would make create statement go into 2nd 
syntax), and all of them are optional. We're not enforcing to specify it but 
rely on the parser.
{quote}
I think the parser implementation around CREATE TABLE introduces ambiguity that is 
not documented anywhere. It wasn't ambiguous before, because users were forced to 
specify STORED AS if the table wasn't a Hive table. Now the statement resolves to 
either the default provider or Hive depending on which options are provided, which 
seems non-trivial to reason about. (End users would never know, as the decision 
comes entirely from the parser rules.)
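To make the ambiguity concrete, here is a minimal sketch (the exact provider 
resolution also depends on the session's default provider and any legacy config 
settings):
{code}
-- No skewSpec, rowFormat, or createFileFormat clause: parsed as the native
-- CREATE TABLE syntax, so the table uses the default datasource provider.
CREATE TABLE t1 (a STRING);

-- Adding STORED AS (a createFileFormat clause) silently routes the statement
-- into the Hive CREATE TABLE syntax instead.
CREATE TABLE t2 (a STRING) STORED AS PARQUET;

-- Likewise, a ROW FORMAT clause alone selects the Hive syntax.
CREATE TABLE t3 (a STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
{code}
Two statements that look nearly identical end up in entirely different code 
paths, and nothing in the statement itself tells the user which one they got.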

I see this as an issue of "not breaking old behavior". The parser rules have 
become quite complicated in order to support the legacy config. Refusing to break 
anything will eventually leave us stuck.



> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31136
>                 URL: https://issues.apache.org/jira/browse/SPARK-31136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to keep the discussion and the final decision.
> The `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.061 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Time taken: 0.383 seconds
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.969 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: 
> `default`.`t`;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
