Hi, All.

Thanks to you, we have achieved many things and have several ongoing SPIPs.
I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly
by asking for your opinions on Apache Spark's ANSI SQL mode.

    https://issues.apache.org/jira/browse/SPARK-44111
    Prepare Apache Spark 4.0.0

SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable
items for 4.0.0 because it's a big behavior change.

    https://issues.apache.org/jira/browse/SPARK-44444
    Use ANSI SQL mode by default

Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and
aims to provide better Spark SQL compatibility in a standard way. We also
have a daily CI to protect this behavior.

    https://github.com/apache/spark/actions/workflows/build_ansi.yml

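For context, here is a minimal sketch of the behavior difference in
spark-shell (an illustration only, not an exhaustive list of ANSI changes):

    // With ANSI mode off (the current default), an invalid cast
    // silently returns NULL.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST('abc' AS INT)").show()  // -> NULL

    // With ANSI mode on, the same query raises a runtime error
    // instead of returning NULL.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    spark.sql("SELECT CAST('abc' AS INT)").show()  // -> throws at runtime
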
However, it's still gated behind that configuration and has several known
issues, e.g.,

    SPARK-41794 Reenable ANSI mode in test_connect_column
    SPARK-41547 Reenable ANSI mode in test_connect_functions
    SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
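
For example, the last issue above stems from Spark's 0-based bracket
indexing versus the 1-based indexing in the ANSI standard, which Spark's
existing `element_at` function already follows; a quick spark-shell
illustration:

    // Spark's bracket syntax is 0-based ...
    spark.sql("SELECT array(10, 20, 30)[0]").show()              // -> 10
    // ... while element_at uses ANSI-style 1-based indexing.
    spark.sql("SELECT element_at(array(10, 20, 30), 1)").show()  // -> 10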

To be clear, we know that many DBMSes have their own implementations of
the SQL standard and that they are not identical. Like them, SPARK-44444
aims only to enable Spark's existing configuration,
`spark.sql.ansi.enabled=true`, by default. There is nothing more to it
than that.

In other words, the current Spark ANSI SQL implementation becomes the
first implementation Spark SQL users encounter, while
`spark.sql.ansi.enabled=false` remains available in the same way, so no
capability is lost.

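For example, a user or job that prefers the legacy behavior could opt out
explicitly; this is the existing public configuration, nothing new:

    // Opt back out of ANSI mode for the current session.
    spark.conf.set("spark.sql.ansi.enabled", "false")

    // Or disable it when submitting a job:
    //   spark-submit --conf spark.sql.ansi.enabled=false ...
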
If we don't want this change for some reason, we can simply exclude
SPARK-44444 from SPARK-44111 as part of the Apache Spark 4.0.0
preparation. It's simply time to make a go/no-go decision on this item as
part of the overall planning for the Apache Spark 4.0.0 release. After
4.0.0, we are unlikely to aim for this again for the next four years,
until 2028.

WDYT?

Bests,
Dongjoon
