+1, enabling Spark's ANSI SQL mode in version 4.0 will significantly enhance data quality and integrity. I fully support this initiative.
> In other words, the current Spark ANSI SQL implementation becomes the
> first implementation for Spark SQL users to face, while still providing
> `spark.sql.ansi.enabled=false` in the same way without losing any
> capability.

BTW, the try_* functions
<https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#useful-functions-for-ansi-mode>
and the SQL Error Attribution Framework
<https://issues.apache.org/jira/browse/SPARK-38615> will also be helpful
when migrating to ANSI SQL mode.

Gengliang

On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, All.
>
> Thanks to you, we've been achieving many things and have ongoing SPIPs.
> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more
> narrowly by asking your opinions about Apache Spark's ANSI SQL mode.
>
> https://issues.apache.org/jira/browse/SPARK-44111
> Prepare Apache Spark 4.0.0
>
> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the
> desirable items for 4.0.0 because it's a big behavior change.
>
> https://issues.apache.org/jira/browse/SPARK-44444
> Use ANSI SQL mode by default
>
> Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0
> and has been aiming to provide better Spark SQL compatibility in a
> standard way. We also have a daily CI to protect the behavior.
>
> https://github.com/apache/spark/actions/workflows/build_ansi.yml
>
> However, it's still behind the configuration, with several known
> issues, e.g.,
>
> SPARK-41794 Reenable ANSI mode in test_connect_column
> SPARK-41547 Reenable ANSI mode in test_connect_functions
> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
>
> To be clear, we know that many DBMSes have their own implementations
> of the SQL standard, and they are not all the same. Like them,
> SPARK-44444 aims to enable only the existing Spark configuration,
> `spark.sql.ansi.enabled=true`. There is nothing more than that.
>
> In other words, the current Spark ANSI SQL implementation becomes the
> first implementation for Spark SQL users to face, while still providing
> `spark.sql.ansi.enabled=false` in the same way without losing any
> capability.
>
> If we don't want this change for some reason, we can simply exclude
> SPARK-44444 from SPARK-44111 as part of the Apache Spark 4.0.0
> preparation. It's time to make a go/no-go decision on this item for
> the global optimization of the Apache Spark 4.0.0 release. After
> 4.0.0, it's unlikely that we will aim for this again in the next four
> years, until 2028.
>
> WDYT?
>
> Bests,
> Dongjoon
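The behavior difference the thread discusses can be sketched in plain Python (this is an illustrative model of the semantics, not Spark's implementation or API): with `spark.sql.ansi.enabled=true`, an invalid cast raises an error instead of silently producing NULL, and the `try_*` function family explicitly opts back into the NULL-on-failure behavior.

```python
# Illustrative model (plain Python, not Spark) of ANSI-mode CAST semantics.

def ansi_cast_int(value: str) -> int:
    """ANSI-mode behavior: an invalid cast raises an error."""
    return int(value)  # raises ValueError on bad input

def try_cast_int(value: str):
    """try_cast-style behavior: return None (SQL NULL) on failure,
    mirroring the pre-ANSI default."""
    try:
        return int(value)
    except ValueError:
        return None

print(try_cast_int("123"))   # 123
print(try_cast_int("abc"))   # None
try:
    ansi_cast_int("abc")
except ValueError:
    print("ANSI mode: invalid cast raised an error")
```

In this model, migrating a query to ANSI mode means deciding, per expression, whether a failure should surface as an error (the new default) or be kept as a NULL via a `try_*` variant.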