[jira] [Commented] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

Wenchen Fan (Jira) Thu, 12 Mar 2020 22:06:00 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058427#comment-17058427
 ]


Wenchen Fan commented on SPARK-31136:
-------------------------------------

The problem is that we don't really support the char type; we added it only
for Hive tables. In fact, if you look at Spark's official documentation,
https://spark.apache.org/docs/latest/sql-reference.html#data-types , there is
no char type.

On the other hand, it is a long-standing issue that data source tables simply
treat the char type as the string type. This behavior is not so bad, as the char
type is kind of a hidden feature and is intended to serve Hive tables only. But
it does introduce a silent change in results if we create data source tables by
default.
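
As a rough illustration of that silent change (a plain-Python sketch, not Spark
code; the helper names store_as_char and store_as_string are hypothetical), the
two outcomes in the issue description quoted below can be mimicked like this,
assuming CHAR(n) pads with trailing spaces up to n characters, as the 2.4.5
output suggests:

```python
def store_as_char(value, n):
    """Mimic a CHAR(n) column: right-pad with spaces up to n characters.

    Assumption for illustration only. Values longer than n are not modeled
    here and are returned unchanged.
    """
    return value.ljust(n)


def store_as_string(value):
    """Mimic a column that is treated as plain STRING: keep the value as-is."""
    return value


inserted = "a "  # the value used in the issue description

# Hive table in 2.4.5: the value is padded, so length(a) is 3.
print(len(store_as_char(inserted, 3)))

# Data source table in 3.0.0-preview2: CHAR(3) acts as STRING, so length(a) is 2.
print(len(store_as_string(inserted)))
```

The same INSERT therefore yields a different length(a) depending only on which
kind of table CREATE TABLE produced.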

However, I think it's still worth keeping SPARK-30098 to bring Spark's perf
benefits to more users. My proposal: fail in the parser if users use the char
type to create a data source table. This has never worked correctly, so it
seems OK to forbid it.
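
One way to picture the proposed check, as a hypothetical sketch in plain Python
rather than Spark's actual parser code (the names validate_columns and provider
are assumptions made for illustration):

```python
import re

# Matches a type string such as "CHAR(3)"; VARCHAR(n) and STRING do not match.
CHAR_TYPE = re.compile(r"char\s*\(\s*\d+\s*\)", re.IGNORECASE)


def validate_columns(columns, provider):
    """Reject CHAR(n) columns unless the table is a Hive table.

    columns:  mapping of column name -> SQL type text (hypothetical shape)
    provider: table provider name, e.g. "hive" or "parquet" (hypothetical)
    """
    if provider.lower() == "hive":
        return  # Hive tables keep their existing char semantics
    for name, sql_type in columns.items():
        if CHAR_TYPE.fullmatch(sql_type.strip()):
            raise ValueError(
                f"Column '{name}': char type is not supported "
                f"for data source tables (provider: {provider})"
            )


validate_columns({"a": "CHAR(3)"}, "hive")    # accepted
validate_columns({"a": "STRING"}, "parquet")  # accepted
try:
    validate_columns({"a": "CHAR(3)"}, "parquet")
except ValueError as err:
    print(err)  # fails loudly instead of silently changing results
```

The point of failing early is that the user sees an error at table creation
instead of a different query result later.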


> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31136
>                 URL: https://issues.apache.org/jira/browse/SPARK-31136
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Blocker
>              Labels: correctness
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of how it breaks existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a     3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a     2
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

