+1. I believe ANSI mode is well developed after many releases, and there is no doubt it can be used. Since it is very easy to disable it and restore the current behavior, I guess the impact will be limited. Do we know the possible impacts, such as the major changes (e.g., what kinds of queries/expressions will fail)? We can describe them in the release notes.
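
For illustration, here is a minimal sketch of the kind of expressions I would expect to behave differently. It assumes the commonly documented ANSI semantics for invalid casts, division by zero, and integer overflow. The literals are made up for the example, and exact error messages are left out because they vary by release. It is not meant as a complete list of impacts.

```sql
-- Legacy behavior: problematic expressions silently produce NULL or wrap around.
SET spark.sql.ansi.enabled=false;
SELECT CAST('abc' AS INT);      -- returns NULL
SELECT 1 / 0;                   -- returns NULL
SELECT 2147483647 + 1;          -- wraps around to -2147483648

-- ANSI behavior: the same expressions fail at runtime instead.
SET spark.sql.ansi.enabled=true;
SELECT CAST('abc' AS INT);      -- fails: invalid input for the cast
SELECT 1 / 0;                   -- fails: division by zero
SELECT 2147483647 + 1;          -- fails: integer overflow

-- The try_* functions keep the NULL-on-error result while ANSI mode stays on.
SELECT try_cast('abc' AS INT);  -- returns NULL
SELECT try_divide(1, 0);        -- returns NULL
SELECT try_add(2147483647, 1);  -- returns NULL
```

If something like this goes into the release notes, pairing each failing expression with its try_* counterpart would give users a migration path that does not require turning ANSI mode off globally.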
On Thu, Apr 11, 2024 at 10:29 PM Gengliang Wang <ltn...@gmail.com> wrote:
>
> +1, enabling Spark's ANSI SQL mode in version 4.0 will significantly enhance
> data quality and integrity. I fully support this initiative.
>
> > In other words, the current Spark ANSI SQL implementation becomes the first
> > implementation for Spark SQL users to face at first while providing
> > `spark.sql.ansi.enabled=false` in the same way without losing any
> > capability.
>
> BTW, the try_* functions and SQL Error Attribution Framework will also be
> beneficial in migrating to ANSI SQL mode.
>
> Gengliang
>
> On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Hi, All.
>>
>> Thanks to you, we've been achieving many things and have on-going SPIPs.
>> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly
>> by asking your opinions about Apache Spark's ANSI SQL mode.
>>
>> https://issues.apache.org/jira/browse/SPARK-44111
>> Prepare Apache Spark 4.0.0
>>
>> SPARK-44444 was proposed last year (on 15/Jul/23) as the one of desirable
>> items for 4.0.0 because it's a big behavior.
>>
>> https://issues.apache.org/jira/browse/SPARK-44444
>> Use ANSI SQL mode by default
>>
>> Historically, spark.sql.ansi.enabled was added at Apache Spark 3.0.0 and has
>> been aiming to provide a better Spark SQL compatibility in a standard way.
>> We also have a daily CI to protect the behavior too.
>>
>> https://github.com/apache/spark/actions/workflows/build_ansi.yml
>>
>> However, it's still behind the configuration with several known issues, e.g.,
>>
>> SPARK-41794 Reenable ANSI mode in test_connect_column
>> SPARK-41547 Reenable ANSI mode in test_connect_functions
>> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
>>
>> To be clear, we know that many DBMSes have their own implementations of
>> SQL standard and not the same. Like them, SPARK-44444 aims to enable
>> only the existing Spark's configuration, `spark.sql.ansi.enabled=true`.
>> There is nothing more than that.
>>
>> In other words, the current Spark ANSI SQL implementation becomes the first
>> implementation for Spark SQL users to face at first while providing
>> `spark.sql.ansi.enabled=false` in the same way without losing any capability.
>>
>> If we don't want this change for some reasons, we can simply exclude
>> SPARK-44444 from SPARK-44111 as a part of Apache Spark 4.0.0 preparation.
>> It's time just to make a go/no-go decision for this item for the global
>> optimization for Apache Spark 4.0.0 release. After 4.0.0, it's unlikely for
>> us to aim for this again for the next four years until 2028.
>>
>> WDYT?
>>
>> Bests,
>> Dongjoon