[ https://issues.apache.org/jira/browse/SPARK-54654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarno Rajala updated SPARK-54654:
---------------------------------
    Description: 
The setting _*spark.sql.ansi.enabled=true*_ should enforce strict typing and 
disallow any potentially unsafe implicit type conversions. Right now it 
does not.

As _*spark.sql.ansi.enabled*_ has defaulted to *true* since Spark 4.0, I think 
this requires serious consideration and should be treated as a bug.

Consider the behaviour of the following queries when 
*_spark.sql.ansi.enabled=true_*:

{{SELECT 123='123';}}
{{SELECT 123='123X';}}

The first succeeds and returns _{*}true{*}_: the string operand is implicitly 
cast to an integer. The second fails with a hard runtime error, because that 
same implicit cast fails on the non-numeric value. The same issue exists with 
other operations, such as {*}_coalesce()_{*}.
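
The {*}_coalesce()_{*} case follows the same pattern. The queries below are 
illustrative only; the behaviour described in the comments is inferred from 
the comparison example above, not verified output:

{code:sql}
-- Mixed INT/STRING arguments are reconciled implicitly at analysis time
-- instead of being rejected with a type error:
SELECT coalesce(123, '123');
-- A non-numeric string would then surface only as a runtime failure:
SELECT coalesce(123, '123X');
{code}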

In complex setups, where data may come from many different sources and pass 
through multiple tables, it can be hard to ensure strict typing everywhere, 
and numbers are likely to end up in string-typed columns unintentionally. 
Typing errors may then go uncaught for a considerable time, until an invalid 
non-numeric string is introduced at some point and causes a catastrophic 
runtime type mismatch.



> Dangerous automatic type coercion when spark.sql.ansi.enabled=true
> ------------------------------------------------------------------
>
>                 Key: SPARK-54654
>                 URL: https://issues.apache.org/jira/browse/SPARK-54654
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.1, 3.5.7
>            Reporter: Jarno Rajala
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
