Jarno Rajala created SPARK-54654:
------------------------------------
Summary: Dangerous automatic type coercion when
spark.sql.ansi.enabled=true
Key: SPARK-54654
URL: https://issues.apache.org/jira/browse/SPARK-54654
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.7, 4.0.1
Reporter: Jarno Rajala
The setting _*spark.sql.ansi.enabled=true*_ should enforce strict typing,
disallowing any potentially unsafe implicit type conversions. Right now it
doesn't.
As _*spark.sql.ansi.enabled=true*_ has been the default since Spark 4.0, I think
this requires serious consideration and should be treated as a bug.
Consider the behaviour of the following queries when
_*spark.sql.ansi.enabled=true*_:
{{SELECT 123='123';}}
{{SELECT 123='123X';}}
The first succeeds and returns _*true*_. The second fails with a hard error,
so whether the query works depends on the data rather than on the declared
types. The same issue exists with other operations, such as _*coalesce()*_.
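The contrast between the two queries can be modelled in plain Python. This is only a sketch of the behaviour described above, not Spark code: the function names ansi_style_equals and strict_equals are hypothetical, Python's int() stands in for the implicit string-to-number cast, and ValueError stands in for the runtime cast error.

```python
def ansi_style_equals(number: int, text: str) -> bool:
    """Model of the reported behaviour: the string operand is implicitly
    cast to an integer before comparing."""
    # int() raises ValueError for input such as "123X"; this stands in for
    # the hard error the second query produces (an assumption of this model).
    return number == int(text)


def strict_equals(number: int, text: str) -> bool:
    """Hypothetical strict alternative: a mixed int/string comparison is
    rejected regardless of what the string happens to contain."""
    raise TypeError(
        f"cannot compare int with str without an explicit cast: "
        f"{number!r} = {text!r}"
    )


if __name__ == "__main__":
    # Same types, different data: one call succeeds, the other fails.
    print(ansi_style_equals(123, "123"))
    try:
        ansi_style_equals(123, "123X")
    except ValueError as err:
        print("hard error:", err)
```

In this model the outcome of ansi_style_equals depends on the string's contents, which is the data-dependent failure described above, whereas strict_equals fails for every input and would therefore surface the typing mistake immediately.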
In complex setups, where data may come from many different sources and is
passed through multiple tables, it can be hard to ensure strict typing
everywhere, and numbers are likely to end up in string-typed columns
unintentionally. Such typing errors may go uncaught for a considerable time,
until a catastrophic runtime type mismatch occurs.