Are we sure "not padding" is "incorrect"? I don't know whether ANSI SQL actually requires padding, but plenty of databases don't pad in practice.
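For another data point beyond the links below: SQLite also stores CHAR values unpadded (it treats `CHAR(3)` as plain TEXT affinity). A quick illustrative check using Python's built-in sqlite3 module — SQLite is my own example, not one discussed in this thread:

```python
import sqlite3

# SQLite ignores the declared length of CHAR(3) entirely (TEXT affinity),
# so the inserted value comes back exactly as stored -- no space padding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a CHAR(3))")
conn.execute("INSERT INTO t VALUES ('a ')")
value, length = conn.execute("SELECT a, length(a) FROM t").fetchone()
print(repr(value), length)  # 'a ' 2 -- not padded to length 3
```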
Snowflake: https://docs.snowflake.net/manuals/sql-reference/data-types-text.html
"Snowflake currently deviates from common CHAR semantics in that strings
shorter than the maximum length are not space-padded at the end."

MySQL: https://stackoverflow.com/questions/53528645/why-char-dont-have-padding-in-mysql

On Sun, Mar 15, 2020 at 7:02 PM, Dongjoon Hyun < dongjoon.h...@gmail.com > wrote:

> Hi, Reynold.
>
> Please see the following for the context.
>
> https://issues.apache.org/jira/browse/SPARK-31136
> "Revert SPARK-30098 Use default datasource as provider for CREATE TABLE
> syntax"
>
> I raised the above issue according to the new rubric, and the ban was the
> proposed alternative to reduce the potential issue.
>
> Please give us your opinion, since it's still a PR.
>
> Bests,
> Dongjoon.
>
> On Sat, Mar 14, 2020 at 17:54 Reynold Xin < r...@databricks.com > wrote:
>
>> I don't understand this change. Wouldn't this "ban" confuse the hell out
>> of both new and old users?
>>
>> For old users, their old code that was working for char(3) would now stop
>> working.
>>
>> For new users, the underlying metastore char(3) is either supported but
>> different from ANSI SQL (which is not that big of a deal if we explain
>> it) or not supported at all.
>>
>> On Sat, Mar 14, 2020 at 3:51 PM Dongjoon Hyun < dongjoon.h...@gmail.com > wrote:
>>
>>> Hi, All.
>>>
>>> Apache Spark has suffered from a known consistency issue in `CHAR`
>>> type behavior among its usages and configurations.
>>> However, the direction of evolution has been gradually moving toward
>>> consistency inside Apache Spark, because we don't have `CHAR`
>>> officially. The following is the summary.
>>>
>>> With 1.6.x ~ 2.3.x, `STORED AS PARQUET` has the following different
>>> result. (`spark.sql.hive.convertMetastoreParquet=false` provides a
>>> fallback to Hive behavior.)
>>>
>>> spark-sql> CREATE TABLE t1(a CHAR(3));
>>> spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
>>> spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
>>>
>>> spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
>>> spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
>>> spark-sql> INSERT INTO TABLE t3 SELECT 'a ';
>>>
>>> spark-sql> SELECT a, length(a) FROM t1;
>>> a 3
>>> spark-sql> SELECT a, length(a) FROM t2;
>>> a 3
>>> spark-sql> SELECT a, length(a) FROM t3;
>>> a 2
>>>
>>> Since 2.4.0, `STORED AS ORC` became consistent.
>>> (`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to Hive
>>> behavior.)
>>>
>>> spark-sql> SELECT a, length(a) FROM t1;
>>> a 3
>>> spark-sql> SELECT a, length(a) FROM t2;
>>> a 2
>>> spark-sql> SELECT a, length(a) FROM t3;
>>> a 2
>>>
>>> Since 3.0.0-preview2, `CREATE TABLE` (without a `STORED AS` clause)
>>> became consistent.
>>> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a
>>> fallback to Hive behavior.)
>>>
>>> spark-sql> SELECT a, length(a) FROM t1;
>>> a 2
>>> spark-sql> SELECT a, length(a) FROM t2;
>>> a 2
>>> spark-sql> SELECT a, length(a) FROM t3;
>>> a 2
>>>
>>> In addition, in 3.0.0, SPARK-31147 aims to ban the `CHAR`/`VARCHAR`
>>> types in the following syntax, to be safe.
>>>
>>> CREATE TABLE t(a CHAR(3));
>>> https://github.com/apache/spark/pull/27902
>>>
>>> This email is sent out to inform you based on the new policy we voted
>>> on. The recommendation is to always use Apache Spark's native type
>>> `String`.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> References:
>>> 1. "CHAR implementation?", 2017/09/15
>>> https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E
>>> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
>>> syntax", 2019/12/06
>>> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
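To make the difference in the quoted summary concrete: the tables that return length 3 apply Hive-style CHAR(n) semantics (space-pad shorter values to n), while the ones returning length 2 keep the string as stored. A plain-Python sketch of those two behaviors — the helper names are mine, and this emulates the described semantics rather than any actual Spark or Hive code:

```python
def hive_char_read(value: str, n: int) -> str:
    # Hive-style CHAR(n): shorter values come back space-padded to length n.
    return value.ljust(n)

def native_read(value: str) -> str:
    # Spark-native datasource behavior: the string is returned as stored.
    return value

v = "a "  # 'a' plus one trailing space, length 2, inserted into CHAR(3)
print(len(hive_char_read(v, 3)))  # 3, like t1 in the 1.6.x ~ 2.3.x output
print(len(native_read(v)))        # 2, like t3
```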