+1 (non-binding) I have been following this standardization effort and I think it is sound and it provides the needed flexibility via the option.
Best regards, Alessandro On Mon, 7 Oct 2019 at 10:24, Gengliang Wang <gengliang.w...@databricks.com> wrote: > Hi everyone, > > I'd like to call for a new vote on SPARK-28885 > <https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI store > assignment rules in table insertion by default" after revising the ANSI > store assignment policy(SPARK-29326 > <https://issues.apache.org/jira/browse/SPARK-29326>). > When inserting a value into a column with the different data type, Spark > performs type coercion. Currently, we support 3 policies for the store > assignment rules: ANSI, legacy and strict, which can be set via the option > "spark.sql.storeAssignmentPolicy": > 1. ANSI: Spark performs the store assignment as per ANSI SQL. In practice, > the behavior is mostly the same as PostgreSQL. It disallows certain > unreasonable type conversions such as converting `string` to `int` and > `double` to `boolean`. It will throw a runtime exception if the value is > out-of-range(overflow). > 2. Legacy: Spark allows the store assignment as long as it is a valid > `Cast`, which is very loose. E.g., converting either `string` to `int` or > `double` to `boolean` is allowed. It is the current behavior in Spark 2.x > for compatibility with Hive. When inserting an out-of-range value to an > integral field, the low-order bits of the value is inserted(the same as > Java/Scala numeric type casting). For example, if 257 is inserted into a > field of Byte type, the result is 1. > 3. Strict: Spark doesn't allow any possible precision loss or data > truncation in store assignment, e.g., converting either `double` to `int` > or `decimal` to `double` is allowed. The rules are originally for Dataset > encoder. As far as I know, no mainstream DBMS is using this policy by > default. > > Currently, the V1 data source uses "Legacy" policy by default, while V2 > uses "Strict". This proposal is to use "ANSI" policy by default for both V1 > and V2 in Spark 3.0. > > This vote is open until Friday (Oct. 11). > > [ ] +1: Accept the proposal > [ ] +0 > [ ] -1: I don't think this is a good idea because ... > > Thank you! > > Gengliang >