I don’t understand this change. Wouldn’t this “ban” confuse the hell out of
both new and old users?

For old users, their old code that was working for char(3) would now stop
working.

For new users, the outcome depends on whether the underlying metastore
char(3) is supported but different from ANSI SQL (which is not that big of
a deal if we explain it) or not supported at all.
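
To make "supported but different" concrete, here is a minimal sketch in plain
Python (not Spark or Hive code) of the two behaviors that show up in the
quoted examples below: a padded `CHAR(3)` reports length 3 for 'a ', while a
plain string reports length 2. The function names `store_as_char` and
`store_as_string` are hypothetical, and over-length input is deliberately
left out of scope.

```python
def store_as_char(value: str, n: int) -> str:
    """Model of a padded CHAR(n): pad with trailing spaces up to n characters.

    Assumption: values longer than n are out of scope for this sketch and
    are returned unchanged.
    """
    return value.ljust(n)


def store_as_string(value: str) -> str:
    """Model of a plain string column: the value is kept exactly as given."""
    return value


if __name__ == "__main__":
    padded = store_as_char("a ", 3)
    native = store_as_string("a ")
    # Show the stored value and the length each model reports.
    print(repr(padded), len(padded))
    print(repr(native), len(native))
```

The only point of the sketch is that the same INSERT can yield two different
`length(a)` results depending on which path handles the column.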

On Sat, Mar 14, 2020 at 3:51 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi, All.
>
> Apache Spark has suffered from a known consistency issue in `CHAR` type
> behavior across its usages and configurations. However, the direction has
> been gradually moving toward consistency inside Apache Spark because we
> don't have `CHAR` officially. The following is the summary.
>
> In 1.6.x ~ 2.3.x, `STORED AS PARQUET` gives a different result, as follows.
> (`spark.sql.hive.convertMetastoreParquet=false` provides a fallback to
> Hive behavior.)
>
>     spark-sql> CREATE TABLE t1(a CHAR(3));
>     spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
>     spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;
>
>     spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
>     spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
>     spark-sql> INSERT INTO TABLE t3 SELECT 'a ';
>
>     spark-sql> SELECT a, length(a) FROM t1;
>     a   3
>     spark-sql> SELECT a, length(a) FROM t2;
>     a   3
>     spark-sql> SELECT a, length(a) FROM t3;
>     a 2
>
> Since 2.4.0, `STORED AS ORC` became consistent.
> (`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to Hive
> behavior.)
>
>     spark-sql> SELECT a, length(a) FROM t1;
>     a   3
>     spark-sql> SELECT a, length(a) FROM t2;
>     a 2
>     spark-sql> SELECT a, length(a) FROM t3;
>     a 2
>
> Since 3.0.0-preview2, `CREATE TABLE` (without `STORED AS` clause) became
> consistent.
> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a
> fallback to Hive behavior.)
>
>     spark-sql> SELECT a, length(a) FROM t1;
>     a 2
>     spark-sql> SELECT a, length(a) FROM t2;
>     a 2
>     spark-sql> SELECT a, length(a) FROM t3;
>     a 2
>
> In addition, in 3.0.0, SPARK-31147 aims to ban `CHAR/VARCHAR` type in the
> following syntax to be safe.
>
>     CREATE TABLE t(a CHAR(3));
>     https://github.com/apache/spark/pull/27902
>
> This email is sent to inform you, in line with the new policy we voted on.
> The recommendation is to always use Apache Spark's native type `String`.
>
> Bests,
> Dongjoon.
>
> References:
> 1. "CHAR implementation?", 2017/09/15
>    https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E
> 2. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
>    syntax", 2019/12/06
>    https://lists.apache.org/thread.html/493f88c10169680191791f9f6962fd16cd0ffa3b06726e92ed04cbe1%40%3Cdev.spark.apache.org%3E
>
