+1. Enabling Spark's ANSI SQL mode in version 4.0 will significantly
enhance data quality and integrity. I fully support this initiative.

> In other words, the current Spark ANSI SQL implementation becomes the
first implementation that Spark SQL users encounter, while
`spark.sql.ansi.enabled=false` remains available in the same way, without
losing any capability.
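
For example, the very same queries fail fast under ANSI mode but degrade
silently under the legacy behavior (a minimal sketch of the flag's effect):

    SET spark.sql.ansi.enabled=true;
    SELECT CAST('abc' AS INT);    -- raises CAST_INVALID_INPUT
    SELECT 2147483647 + 1;        -- raises ARITHMETIC_OVERFLOW

    SET spark.sql.ansi.enabled=false;
    SELECT CAST('abc' AS INT);    -- returns NULL
    SELECT 2147483647 + 1;        -- silently wraps to -2147483648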

BTW, the try_*
<https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#useful-functions-for-ansi-mode>
functions and the SQL Error Attribution Framework
<https://issues.apache.org/jira/browse/SPARK-38615> will also be helpful
when migrating to ANSI SQL mode.
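
Even with `spark.sql.ansi.enabled=true`, the try_* variants return NULL
instead of raising runtime errors, which makes incremental migration easier:

    SELECT try_cast('abc' AS INT);    -- NULL instead of CAST_INVALID_INPUT
    SELECT try_add(2147483647, 1);    -- NULL instead of ARITHMETIC_OVERFLOW
    SELECT try_divide(1, 0);          -- NULL instead of DIVIDE_BY_ZERO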


Gengliang


On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi, All.
>
> Thanks to you, we've achieved many things and have ongoing SPIPs.
> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly
> by asking your opinions about Apache Spark's ANSI SQL mode.
>
>     https://issues.apache.org/jira/browse/SPARK-44111
>     Prepare Apache Spark 4.0.0
>
> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable
> items for 4.0.0 because it's a big behavior change.
>
>     https://issues.apache.org/jira/browse/SPARK-44444
>     Use ANSI SQL mode by default
>
> Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and
> has aimed to provide better Spark SQL compatibility in a standard way.
> We also have a daily CI job to protect the behavior.
>
>     https://github.com/apache/spark/actions/workflows/build_ansi.yml
>
> However, it's still behind the configuration flag, with several known
> issues, e.g.,
>
>     SPARK-41794 Reenable ANSI mode in test_connect_column
>     SPARK-41547 Reenable ANSI mode in test_connect_functions
>     SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
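>
> For example (assuming the current behavior), element_at follows the
> 1-based indexing of the SQL standard, while the [] accessor remains
> 0-based:
>
>     SELECT element_at(array(10, 20, 30), 1);   -- 10 (1-based)
>     SELECT array(10, 20, 30)[0];               -- 10 (0-based)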
>
> To be clear, we know that many DBMSes have their own implementations of
> the SQL standard, and they are not all the same. Like them, SPARK-44444
> aims only to enable Spark's existing configuration,
> `spark.sql.ansi.enabled=true`. There is nothing more to it than that.
>
> In other words, the current Spark ANSI SQL implementation becomes the
> first implementation that Spark SQL users encounter, while
> `spark.sql.ansi.enabled=false` remains available in the same way, without
> losing any capability.
>
> If we don't want this change for some reason, we can simply exclude
> SPARK-44444 from SPARK-44111 as part of the Apache Spark 4.0.0 preparation.
> It's time to make a go/no-go decision on this item as part of the global
> optimization of the Apache Spark 4.0.0 release. After 4.0.0, it's unlikely
> we will aim for this again for the next four years, until 2028.
>
> WDYT?
>
> Bests,
> Dongjoon
>
