Hi, All.

Apache Spark has suffered from a known consistency issue in `CHAR`
type behavior across its usages and configurations. However, the direction
has been gradually moving toward consistency inside Apache Spark, because
we don't have `CHAR` officially. The following is the summary.

With 1.6.x ~ 2.3.x, `STORED AS PARQUET` gives a different result from the
other two tables, as shown below.
(`spark.sql.hive.convertMetastoreParquet=false` provides a fallback to Hive
behavior.)

    spark-sql> CREATE TABLE t1(a CHAR(3));
    spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
    spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;

    spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
    spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
    spark-sql> INSERT INTO TABLE t3 SELECT 'a ';

    spark-sql> SELECT a, length(a) FROM t1;
    a   3
    spark-sql> SELECT a, length(a) FROM t2;
    a   3
    spark-sql> SELECT a, length(a) FROM t3;
    a 2
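
To try the Hive fallback named above, the flag can be set in the same
session before re-running the query. This is only a sketch: no output is
shown because the result depends on the Spark version and metastore setup
in use.

    spark-sql> SET spark.sql.hive.convertMetastoreParquet=false;
    spark-sql> SELECT a, length(a) FROM t3;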

Since 2.4.0, `STORED AS ORC` became consistent.
(`spark.sql.hive.convertMetastoreOrc=false` provides a fallback to Hive
behavior.)

    spark-sql> SELECT a, length(a) FROM t1;
    a   3
    spark-sql> SELECT a, length(a) FROM t2;
    a 2
    spark-sql> SELECT a, length(a) FROM t3;
    a 2

Since 3.0.0-preview2, `CREATE TABLE` (without `STORED AS` clause) became
consistent.
(`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a
fallback to Hive behavior.)

    spark-sql> SELECT a, length(a) FROM t1;
    a 2
    spark-sql> SELECT a, length(a) FROM t2;
    a 2
    spark-sql> SELECT a, length(a) FROM t3;
    a 2

In addition, in 3.0.0, SPARK-31147 aims to ban `CHAR/VARCHAR` type in the
following syntax to be safe.

    CREATE TABLE t(a CHAR(3));

This email is sent to inform you, following the new policy we voted on.
The recommendation is to always use Apache Spark's native type `String`.
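
As a minimal sketch of that recommendation (the table name `t4` is
hypothetical and not one of the tables above), a `STRING` column keeps the
value as written, so no `CHAR` padding is involved. Here `length(a)` should
report 2, the length of the inserted literal.

    spark-sql> CREATE TABLE t4(a STRING);
    spark-sql> INSERT INTO TABLE t4 SELECT 'a ';
    spark-sql> SELECT a, length(a) FROM t4;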


1. "CHAR implementation?", 2017/09/15

2. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE
syntax", 2019/12/06

