[ https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058427#comment-17058427 ]
Wenchen Fan commented on SPARK-31136:
-------------------------------------

The problem is that we don't really support the char type; we added it only for Hive tables. In fact, if you look at Spark's official documentation, https://spark.apache.org/docs/latest/sql-reference.html#data-types , there is no char type.

On the other hand, it is a long-standing issue that data source tables simply treat the char type as the string type. This behavior is not so bad, since the char type is something of a hidden feature intended only to serve Hive tables. But it does introduce a silent change in results if we create data source tables by default.

However, I think it's still worth keeping SPARK-30098 to bring Spark's performance benefits to more users.

My proposal: fail in the parser if users use the char type to create a data source table. This has never worked correctly, so it seems OK to forbid it.

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -----------------------------------------------------------------------------
>
> Key: SPARK-31136
> URL: https://issues.apache.org/jira/browse/SPARK-31136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
> Labels: correctness
>
> We need to consider the behavior change of SPARK-30098 .
> This is a placeholder to keep the discussion and the final decision.
> `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of breaking existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables:
> `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 2
> {code}
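
To make the CHAR(3) difference in the quoted example concrete, here is a minimal sketch in plain Python. It is not Spark or Hive code. The function names char_padded and as_plain_string are hypothetical, and the sketch assumes only what the two transcripts show: the Hive table pads the stored value to the declared length, while the data source table keeps the value as an ordinary string. Over-length values are not modeled.

```python
def char_padded(value: str, n: int) -> str:
    """Model of the padded CHAR(n) behavior seen in the 2.4.5 transcript.

    Short values are right-padded with spaces up to n characters.
    Values longer than n are returned unchanged (not modeled here).
    """
    return value.ljust(n)


def as_plain_string(value: str) -> str:
    """Model of the 3.0.0-preview2 transcript: CHAR(n) treated as string."""
    return value


inserted = "a "  # the literal inserted in the quoted example (2 characters)

# Hive-style table: padded to the declared length of 3
print(len(char_padded(inserted, 3)))   # 3

# Data source table: stored as-is, so the length stays 2
print(len(as_plain_string(inserted)))  # 2
```

The same INSERT therefore yields a different length(a) depending only on which provider CREATE TABLE picks, which is the silent result change discussed above.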