Hi, Reynold. (And +Michael Armbrust)

If you think so, do you think it's okay that we change the return value silently? Then I'm wondering why we reverted the `TRIM` functions.
> Are we sure "not padding" is "incorrect"?

Bests,
Dongjoon.

On Sun, Mar 15, 2020 at 11:15 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> 100% agree with Reynold.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Mar 16, 2020 at 3:31 AM Reynold Xin <r...@databricks.com> wrote:
>
>> Are we sure "not padding" is "incorrect"?
>>
>> I don't know whether ANSI SQL actually requires padding, but plenty of
>> databases don't actually pad.
>>
>> https://docs.snowflake.net/manuals/sql-reference/data-types-text.html :
>> "Snowflake currently deviates from common CHAR semantics in that strings
>> shorter than the maximum length are not space-padded at the end."
>>
>> MySQL:
>> https://stackoverflow.com/questions/53528645/why-char-dont-have-padding-in-mysql
>>
>> On Sun, Mar 15, 2020 at 7:02 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, Reynold.
>>>
>>> Please see the following for the context.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-31136
>>> "Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax"
>>>
>>> I raised the above issue according to the new rubric, and the ban was
>>> the proposed alternative to reduce the potential issue.
>>>
>>> Please give us your opinion since it's still a PR.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Sat, Mar 14, 2020 at 17:54 Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> I don't understand this change. Wouldn't this "ban" confuse the hell
>>>> out of both new and old users?
>>>>
>>>> For old users, their old code that was working for char(3) would now
>>>> stop working.
>>>>
>>>> For new users, depending on the underlying metastore, char(3) is
>>>> either supported but different from ANSI SQL (which is not that big
>>>> of a deal if we explain it) or not supported.
>>>>
>>>> On Sat, Mar 14, 2020 at 3:51 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>
>>>>> Hi, All.
>>>>>
>>>>> Apache Spark has suffered from a known consistency issue in `CHAR`
>>>>> type behavior across its usages and configurations. However, the
>>>>> evolution has been gradually moving toward consistent behavior
>>>>> inside Apache Spark because we don't support `CHAR` officially.
>>>>> The following is the summary.
>>>>>
>>>>> In 1.6.x ~ 2.3.x, `STORED AS PARQUET` has the following different
>>>>> result.
>>>>> (`spark.sql.hive.convertMetastoreParquet=false` provides a fallback
>>>>> to Hive behavior.)
>>>>>
>>>>> spark-sql> CREATE TABLE t1(a CHAR(3));
>>>>> spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
>>>>> spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
>>>>>
>>>>> spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
>>>>> spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
>>>>> spark-sql> INSERT INTO TABLE t3 SELECT 'a ';
>>>>>
>>>>> spark-sql> SELECT a, length(a) FROM t1;
>>>>> a 3
>>>>> spark-sql> SELECT a, length(a) FROM t2;
>>>>> a 3
>>>>> spark-sql> SELECT a, length(a) FROM t3;
>>>>> a 2
>>>>>
>>>>> Since 2.4.0, `STORED AS ORC` became consistent.
>>>>> (`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to
>>>>> Hive behavior.)
>>>>>
>>>>> spark-sql> SELECT a, length(a) FROM t1;
>>>>> a 3
>>>>> spark-sql> SELECT a, length(a) FROM t2;
>>>>> a 2
>>>>> spark-sql> SELECT a, length(a) FROM t3;
>>>>> a 2
>>>>>
>>>>> Since 3.0.0-preview2, `CREATE TABLE` (without a `STORED AS` clause)
>>>>> became consistent.
>>>>> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides
>>>>> a fallback to Hive behavior.)
>>>>>
>>>>> spark-sql> SELECT a, length(a) FROM t1;
>>>>> a 2
>>>>> spark-sql> SELECT a, length(a) FROM t2;
>>>>> a 2
>>>>> spark-sql> SELECT a, length(a) FROM t3;
>>>>> a 2
>>>>>
>>>>> In addition, in 3.0.0, SPARK-31147 aims to ban the `CHAR`/`VARCHAR`
>>>>> types in the following syntax to be safe.
>>>>>
>>>>> CREATE TABLE t(a CHAR(3));
>>>>> https://github.com/apache/spark/pull/27902
>>>>>
>>>>> This email is sent out to inform you based on the new policy we
>>>>> voted on. The recommendation is to always use Apache Spark's native
>>>>> `String` type.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>> References:
>>>>> 1. "CHAR implementation?", 2017/09/15
>>>>> https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E
>>>>> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE
>>>>> TABLE syntax", 2019/12/06
>>>>> https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
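The `length(a)` differences in the thread above come down to one question: does the reader apply Hive-style `CHAR(n)` space padding, or return the stored value as-is? A minimal pure-Python sketch of the two behaviors (this is not Spark code; the function names are hypothetical, for illustration only):

```python
# Sketch of the two CHAR(3) read semantics discussed in the thread.
# Hive-style: values are space-padded up to the declared length n.
# Spark-native (non-padding): the stored value is returned unchanged.

def hive_char_read(value: str, n: int) -> str:
    """Hive-style CHAR(n): right-pad with spaces to length n."""
    return value.ljust(n)

def native_char_read(value: str, n: int) -> str:
    """Non-padding semantics: return the value as stored."""
    return value

# INSERT INTO t SELECT 'a ' stores a two-character string.
stored = "a "

print(len(hive_char_read(stored, 3)))    # -> 3, the "a 3" rows above
print(len(native_char_read(stored, 3)))  # -> 2, the "a 2" rows above
```

Which reader a given (Spark version, storage format) pair used is exactly what changed across 2.4.0 and 3.0.0-preview2 in the summary above, which is why the same three tables report different lengths over time.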