[ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058380#comment-17058380 ]

Jungtaek Lim commented on SPARK-31136:
--------------------------------------

https://github.com/apache/spark/blob/master/docs/sql-migration-guide.md

{quote}
Since Spark 3.0, CREATE TABLE without a specific provider will use the value of 
spark.sql.sources.default as its provider. In Spark version 2.4 and earlier, it 
was hive. To restore the behavior before Spark 3.0, you can set 
spark.sql.legacy.createHiveTableByDefault.enabled to true.
{quote}

This is not true when "ROW FORMAT" / "STORED AS" is provided, and we don't 
describe that case anywhere.
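
For instance, a quick check in spark-sql (a sketch; assumes default configs on 3.0, and the table name t1 is illustrative):

{code}
spark-sql> CREATE TABLE t1(a STRING) STORED AS TEXTFILE;
spark-sql> DESCRIBE FORMATTED t1;
-- Despite the migration guide wording, the provider/serde fields show a
-- Hive table, not a table using spark.sql.sources.default.
{code}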

https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-ddl-create-table-datasource.md

{quote}
CREATE TABLE [ IF NOT EXISTS ] table_identifier
    [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
    [USING data_source]
    [ OPTIONS ( key1=val1, key2=val2, ... ) ]
    [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
    [ CLUSTERED BY ( col_name3, col_name4, ... )
        [ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
        INTO num_buckets BUCKETS ]
    [ LOCATION path ]
    [ COMMENT table_comment ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
    [ AS select_statement ]
{quote}
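
For reference, a minimal statement matching this first rule (table, column names, and options are illustrative):

{code}
CREATE TABLE IF NOT EXISTS boxes (width INT, length INT)
USING PARQUET
OPTIONS ('compression'='snappy');
{code}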

https://github.com/apache/spark/blob/master/docs/sql-ref-syntax-ddl-create-table-hiveformat.md

{quote}
CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
    [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
    [ COMMENT table_comment ]
    [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
        | ( col_name1, col_name2, ... ) ]
    [ ROW FORMAT row_format ]
    [ STORED AS file_format ]
    [ LOCATION path ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
    [ AS select_statement ]
{quote}
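
And a minimal statement matching this second (Hive format) rule (names and path are illustrative):

{code}
CREATE EXTERNAL TABLE IF NOT EXISTS logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/logs';
{code}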

At least we should describe that the parser tries to match the first rule 
(CREATE TABLE ~ USING data_source) and falls back to the second rule; but even 
if we document this, it is not intuitive to reason about which rule a given DDL 
query will fall into. As I commented earlier, "ROW FORMAT" and "STORED AS" are 
the clauses that push a DDL query into the second rule, yet they are documented 
as "optional", so the gotcha is hard to catch. (See the sketch below.)

Furthermore, while we document the syntax as above, in reality the first rule 
also accepts "EXTERNAL" (and then throws an error), which ends up breaking the 
existing DDL query "CREATE EXTERNAL TABLE ~ LOCATION". That query now requires 
"ROW FORMAT" or "STORED AS", even if we add "USING hive".


> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31136
>                 URL: https://issues.apache.org/jira/browse/SPARK-31136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to keep track of the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.061 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Time taken: 0.383 seconds
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> Time taken: 3.969 seconds
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}



