[ https://issues.apache.org/jira/browse/SPARK-28885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-28885.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26107
[https://github.com/apache/spark/pull/26107]

> Follow ANSI store assignment rules in table insertion by default
> ----------------------------------------------------------------
>
>                 Key: SPARK-28885
>                 URL: https://issues.apache.org/jira/browse/SPARK-28885
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When inserting a value into a column with a different data type, Spark 
> performs type coercion. Currently, we support three policies for the store 
> assignment rules: ANSI, legacy and strict, which can be set via the option 
> "spark.sql.storeAssignmentPolicy":
> 1. ANSI: Spark performs type coercion as per ANSI SQL. In practice, the 
> behavior is mostly the same as PostgreSQL's. It disallows certain unreasonable 
> type conversions, such as converting `string` to `int` or `double` to 
> `boolean`, and it throws a runtime exception if the value is out of range 
> (overflow).
> 2. Legacy: Spark allows the type coercion as long as it is a valid `Cast`, 
> which is very loose. E.g., converting either `string` to `int` or `double` to 
> `boolean` is allowed. This is the behavior in Spark 2.x, kept for 
> compatibility with Hive. When inserting an out-of-range value into an integral 
> field, the low-order bits of the value are inserted (the same as Java/Scala 
> numeric type casting). For example, if 257 is inserted into a field of `byte` 
> type, the result is 1.
> 3. Strict: Spark doesn't allow any possible precision loss or data truncation 
> in store assignment, e.g., converting either `double` to `int` or `decimal` 
> to `double` is not allowed. The rules were originally designed for the Dataset 
> encoder. As far as I know, no mainstream DBMS uses this policy by default.
> Currently, V1 data sources use the "Legacy" policy by default, while V2 uses 
> "Strict". This proposal is to use the "ANSI" policy by default for both V1 and 
> V2 in Spark 3.0, as illustrated in the sketch below.
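> 
> A minimal sketch (not part of the original description; the local session 
> setup and the table `t` are hypothetical) of how the policy is set and how 
> the three settings play out for the conversions above. The expected outcomes 
> noted in the comments follow the descriptions in this issue:
> ```scala
> // Hypothetical local session; the policy is a regular SQL conf.
> import org.apache.spark.sql.SparkSession
> 
> val spark = SparkSession.builder()
>   .master("local[*]")
>   .appName("store-assignment-demo")
>   .getOrCreate()
> 
> // One of "ANSI" (the proposed default), "LEGACY", or "STRICT".
> spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
> 
> spark.sql("CREATE TABLE t (b TINYINT, i INT) USING parquet")
> 
> // Out-of-range value into an integral column:
> //   LEGACY - succeeds; 257 wraps to 1 (low-order bits, as in Java/Scala casts).
> //   ANSI   - throws a runtime exception, since 257 overflows TINYINT.
> //   STRICT - rejected, since INT -> TINYINT may lose precision.
> spark.sql("INSERT INTO t VALUES (257, 0)")
> 
> // String into a numeric column:
> //   LEGACY - succeeds; 'abc' becomes NULL via an implicit `Cast`.
> //   ANSI / STRICT - rejected, since `string` to `int` is disallowed.
> spark.sql("INSERT INTO t VALUES (0, 'abc')")
> 
> spark.table("t").show()
> ```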


