[
https://issues.apache.org/jira/browse/SPARK-54654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jarno Rajala updated SPARK-54654:
---------------------------------
Description:
The setting _*spark.sql.ansi.enabled=true*_ should enforce strict typing,
disallowing any potentially unsafe implicit type conversions. Right now it
doesn't.
As _*spark.sql.ansi.enabled*_ has defaulted to _*true*_ since Spark 4.0, I think this
requires serious consideration and should be treated as a bug.
Consider the behaviour of the following queries when
*_spark.sql.ansi.enabled=true_*:
{{SELECT 123='123';}}
{{SELECT 123='123X';}}
The first succeeds and returns {{true}}; the second fails with a hard runtime error.
The same issue exists with other operations, for example {{coalesce()}}.
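For illustration, a minimal sketch of the {{coalesce()}} case (assuming the same coercion rules apply as in the comparisons above; the exact error and the point at which it surfaces may vary by version):
{{SELECT coalesce(123, '123');  -- resolves without error, implicitly unifying the integer and string}}
{{SELECT coalesce(123, '123X'); -- fails with an error, mirroring the comparison above}}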
In complex setups, where data may come from many different sources and be passed
through multiple tables, it can be hard to ensure strict typing everywhere, and
numbers are likely to end up in string-typed columns unintentionally. Such typing
errors may go unnoticed for a considerable time, until an invalid non-numeric
string is introduced somewhere and causes a catastrophic runtime type mismatch.
Such issues can also be hard to debug, because a query with a LIMIT clause may
succeed without errors while the same query without the LIMIT fails.
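A minimal sketch of the LIMIT behaviour (the table, column and values are hypothetical, and it assumes the string column is coerced for the comparison the same way as the literal example above):
{{CREATE TABLE events (id STRING);}}
{{INSERT INTO events VALUES ('1'), ('2'), ('notanumber');}}
{{SELECT * FROM events WHERE id = 2 LIMIT 1;  -- may return a row without error if the invalid value is never scanned}}
{{SELECT * FROM events WHERE id = 2;          -- hits the invalid string and fails with a runtime cast error}}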
> Dangerous automatic type coercion when spark.sql.ansi.enabled=true
> ------------------------------------------------------------------
>
> Key: SPARK-54654
> URL: https://issues.apache.org/jira/browse/SPARK-54654
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.1, 3.5.7
> Reporter: Jarno Rajala
> Priority: Major
>