[
https://issues.apache.org/jira/browse/SPARK-54654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jarno Rajala updated SPARK-54654:
---------------------------------
Description:
The setting _*spark.sql.ansi.enabled=true*_ should enforce strict typing,
disallowing any potentially unsafe implicit type conversions. Right now it
doesn't.
As _*spark.sql.ansi.enabled*_ has defaulted to _*true*_ since Spark 4.0, I think this
requires serious consideration and should be treated as a bug.
Consider the behaviour of the following queries when
*_spark.sql.ansi.enabled=true_*:
{{SELECT 123='123';}}
{{SELECT 123='123X';}}
The first succeeds and returns {{true}}; the second fails with a hard runtime error.
The same issue exists with other operations, for example {{coalesce()}}.
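For illustration, a minimal sketch of the {{coalesce()}} case (assuming the same coercion rules apply as in the comparisons above; the exact error and the point at which it surfaces may vary by version):
{{SELECT coalesce(123, '123');  -- resolves without error, implicitly unifying the integer and string}}
{{SELECT coalesce(123, '123X'); -- fails with an error, mirroring the comparison above}}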
In complex setups, where data may come from many different sources and be passed
through multiple tables, it can be hard to ensure strict typing everywhere, and
numbers are likely to end up in string-typed columns unintentionally. Such typing
errors may go unnoticed for a considerable time, until an invalid non-numeric
string is introduced somewhere and causes a catastrophic runtime type mismatch.
Such issues can also be hard to debug, because a query with a LIMIT clause may
succeed without errors while the same query without the LIMIT fails.
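A minimal sketch of the LIMIT behaviour (the table, column and values are hypothetical, and it assumes the string column is coerced for the comparison the same way as the literal example above):
{{CREATE TABLE events (id STRING);}}
{{INSERT INTO events VALUES ('1'), ('2'), ('notanumber');}}
{{SELECT * FROM events WHERE id = 2 LIMIT 1;  -- may return a row without error if the invalid value is never scanned}}
{{SELECT * FROM events WHERE id = 2;          -- hits the invalid string and fails with a runtime cast error}}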
> Dangerous automatic type coercion when spark.sql.ansi.enabled=true
> ------------------------------------------------------------------
>
> Key: SPARK-54654
> URL: https://issues.apache.org/jira/browse/SPARK-54654
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.1, 3.5.7
> Reporter: Jarno Rajala
> Priority: Major
>