Thank you for your opinions, Gengliang, Liang-Chi, Wenchen, Huaxin, Serge, Nicholas.
To Nicholas, the Apache Spark community already decided not to pursue the
PostgreSQL dialect.

> I’m flagging this since Spark’s behavior differs in these cases from
> Postgres, as described in the ticket.

Please see the following thread (November 26, 2019).

https://lists.apache.org/thread/v1fx1wkxh5sp6odjcyohppr5x67cyrov
[DISCUSS] PostgreSQL dialect

Given the AS-IS consensus, I'll proceed to start a vote for this topic.

Thanks,
Dongjoon.

On 2024/04/12 17:31:49 Nicholas Chammas wrote:
> This is a side issue, but I’d like to bring people’s attention to
> SPARK-28024.
>
> Cases 2, 3, and 4 described in that ticket are still problems today on
> master (I just rechecked), even with ANSI mode enabled.
>
> Well, maybe not problems, but I’m flagging this since Spark’s behavior
> differs in these cases from Postgres, as described in the ticket.
>
>
> > On Apr 12, 2024, at 12:09 AM, Gengliang Wang <ltn...@gmail.com> wrote:
> >
> > +1, enabling Spark's ANSI SQL mode in version 4.0 will significantly
> > enhance data quality and integrity. I fully support this initiative.
> >
> > > In other words, the current Spark ANSI SQL implementation becomes the
> > > first implementation that Spark SQL users face, while providing
> > > `spark.sql.ansi.enabled=false` in the same way without losing any
> > > capability.
> >
> > BTW, the try_*
> > <https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#useful-functions-for-ansi-mode>
> > functions and the SQL Error Attribution Framework
> > <https://issues.apache.org/jira/browse/SPARK-38615> will also be
> > beneficial in migrating to ANSI SQL mode.
> >
> > Gengliang
> >
> > On Thu, Apr 11, 2024 at 7:56 PM Dongjoon Hyun <dongjoon.h...@gmail.com
> > <mailto:dongjoon.h...@gmail.com>> wrote:
> >> Hi, All.
> >>
> >> Thanks to you, we've been achieving many things and have on-going SPIPs.
> >> I believe it's time to scope Apache Spark 4.0.0 (SPARK-44111) more
> >> narrowly by asking your opinions about Apache Spark's ANSI SQL mode.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-44111
> >> Prepare Apache Spark 4.0.0
> >>
> >> SPARK-44444 was proposed last year (on 15/Jul/23) as one of the desirable
> >> items for 4.0.0 because it's a big behavior change.
> >>
> >> https://issues.apache.org/jira/browse/SPARK-44444
> >> Use ANSI SQL mode by default
> >>
> >> Historically, spark.sql.ansi.enabled was added in Apache Spark 3.0.0 and
> >> has aimed to provide better Spark SQL compatibility in a standard way.
> >> We also have a daily CI to protect the behavior.
> >>
> >> https://github.com/apache/spark/actions/workflows/build_ansi.yml
> >>
> >> However, it's still behind the configuration, with several known issues,
> >> e.g.,
> >>
> >> SPARK-41794 Reenable ANSI mode in test_connect_column
> >> SPARK-41547 Reenable ANSI mode in test_connect_functions
> >> SPARK-46374 Array Indexing is 1-based via ANSI SQL Standard
> >>
> >> To be clear, we know that many DBMSes have their own implementations of
> >> the SQL standard, and they are not all the same. Like them, SPARK-44444
> >> aims only to enable the existing Spark configuration,
> >> `spark.sql.ansi.enabled=true`. There is nothing more than that.
> >>
> >> In other words, the current Spark ANSI SQL implementation becomes the
> >> first implementation that Spark SQL users face, while providing
> >> `spark.sql.ansi.enabled=false` in the same way without losing any
> >> capability.
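
For readers skimming the thread, a minimal sketch of what this flip means in
practice. This is only an illustrative spark-shell style session under the
assumption that ANSI mode is on by default in 4.0.0; the config key and the
try_* functions are the documented ones, while the sample expressions, app
name, and values are made up:

  import scala.util.Try
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("ansi-mode-sketch")
    .master("local[*]")
    .getOrCreate()

  // With spark.sql.ansi.enabled=true (the proposed 4.0.0 default), an
  // invalid cast raises an error instead of silently returning NULL.
  spark.conf.set("spark.sql.ansi.enabled", "true")
  println(Try(spark.sql("SELECT CAST('abc' AS INT)").collect().toSeq))
  // fails with a cast error under ANSI mode

  // The legacy behavior stays one configuration away, as the proposal notes.
  spark.conf.set("spark.sql.ansi.enabled", "false")
  println(Try(spark.sql("SELECT CAST('abc' AS INT)").collect().toSeq))
  // succeeds, yielding NULL as in the Spark 3.x defaults

  // The try_* functions keep NULL-on-error semantics even with ANSI mode on,
  // which is the migration path mentioned above.
  spark.conf.set("spark.sql.ansi.enabled", "true")
  println(Try(spark.sql("SELECT try_cast('abc' AS INT), try_divide(1, 0)").collect().toSeq))
  // succeeds, yielding NULL for both columns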
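
And, continuing the same session, for SPARK-46374 specifically: my
understanding (worth double-checking against the ticket) is that the bracket
subscript on arrays is 0-based today while element_at() follows the 1-based
convention the ticket title attributes to the ANSI standard, and that ANSI
mode turns an out-of-range index into an error. A hedged illustration, with
try_element_at as the NULL-preserving variant:

  // Bracket indexing is 0-based and element_at is 1-based; both pick the
  // first element here.
  spark.sql("SELECT array(10, 20, 30)[0], element_at(array(10, 20, 30), 1)").show()

  // With ANSI mode on, an out-of-range index raises an error;
  // try_element_at keeps the permissive NULL-returning behavior.
  spark.conf.set("spark.sql.ansi.enabled", "true")
  println(Try(spark.sql("SELECT element_at(array(10, 20, 30), 5)").collect().toSeq))
  // fails with an invalid array index error
  println(Try(spark.sql("SELECT try_element_at(array(10, 20, 30), 5)").collect().toSeq))
  // succeeds, yielding NULL
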
> >>
> >> If we don't want this change for some reason, we can simply exclude
> >> SPARK-44444 from SPARK-44111 as a part of the Apache Spark 4.0.0
> >> preparation. It's time to make a go/no-go decision on this item for the
> >> global optimization of the Apache Spark 4.0.0 release. After 4.0.0, it's
> >> unlikely that we will aim for this again for the next four years, until
> >> 2028.
> >>
> >> WDYT?
> >>
> >> Bests,
> >> Dongjoon

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org