Hi, all. Thanks to you, we've achieved many things and have ongoing SPIPs. I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more narrowly by asking your opinions about Apache Spark's ANSI SQL mode.
https://issues.apache.org/jira/browse/SPARK-44111
(Prepare Apache Spark 4.0.0)

SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable items for 4.0.0 because it's a big behavior change.

https://issues.apache.org/jira/browse/SPARK-44444
(Use ANSI SQL mode by default)

Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and has aimed to provide better Spark SQL compatibility in a standard way. We also have a daily CI to protect the behavior.

https://github.com/apache/spark/actions/workflows/build_ansi.yml

However, it's still behind the configuration, with several known issues, e.g.,

SPARK-41794 Reenable ANSI mode in test_connect_column
SPARK-41547 Reenable ANSI mode in test_connect_functions
SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard

To be clear, we know that many DBMSes have their own implementations of the SQL standard, and they are not the same. Like them, SPARK-44444 aims only to enable the existing Spark configuration, `spark.sql.ansi.enabled=true`. There is nothing more than that. In other words, the current Spark ANSI SQL implementation becomes the first one that Spark SQL users face, while `spark.sql.ansi.enabled=false` remains available in the same way, without losing any capability.

If we don't want this change for some reason, we can simply exclude SPARK-44444 from SPARK-44111 as part of the Apache Spark 4.0.0 preparation. It's time to make a go/no-go decision on this item for the global optimization of the Apache Spark 4.0.0 release. After 4.0.0, it's unlikely we will aim for this again for the next four years, until 2028.

WDYT?

Bests,
Dongjoon
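P.S. For anyone less familiar with what flipping the default would mean in practice: one well-known difference is cast behavior. With `spark.sql.ansi.enabled=false`, an invalid cast such as `CAST('abc' AS INT)` silently returns NULL; with the ANSI mode enabled, the same cast raises an error. Below is a minimal plain-Python sketch of that semantic difference only; it is an illustration, not Spark code, and the function names are hypothetical.

```python
def legacy_cast_to_int(s: str):
    """Mimics non-ANSI Spark cast semantics: an invalid string-to-int
    cast silently yields NULL (None here) instead of failing."""
    try:
        return int(s)
    except ValueError:
        return None

def ansi_cast_to_int(s: str) -> int:
    """Mimics ANSI-mode cast semantics: an invalid cast raises an
    error (ValueError here) rather than silently producing NULL."""
    return int(s)

# Legacy mode hides the bad input; ANSI mode surfaces it.
print(legacy_cast_to_int("abc"))  # None
print(legacy_cast_to_int("42"))   # 42
try:
    ansi_cast_to_int("abc")
except ValueError as e:
    print("ANSI mode would raise:", e)
```

The decision above is only about which of these two behaviors users see first by default; both remain available via the configuration.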