Hi everyone,

I'd like to call for a new vote on SPARK-28885 <https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI store assignment rules in table insertion by default" after revising the ANSI store assignment policy (SPARK-29326 <https://issues.apache.org/jira/browse/SPARK-29326>).

When inserting a value into a column with a different data type, Spark performs type coercion. Currently, we support three policies for the store assignment rules: ANSI, Legacy, and Strict, which can be set via the option "spark.sql.storeAssignmentPolicy":

1. ANSI: Spark performs the store assignment as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions, such as converting `string` to `int` or `double` to `boolean`. It throws a runtime exception if the value is out of range (overflow).

2. Legacy: Spark allows the store assignment as long as it is a valid `Cast`, which is very loose. E.g., converting either `string` to `int` or `double` to `boolean` is allowed. This is the current behavior in Spark 2.x, kept for compatibility with Hive. When inserting an out-of-range value into an integral field, the low-order bits of the value are inserted (the same as Java/Scala numeric type casting). For example, if 257 is inserted into a field of Byte type, the result is 1.

3. Strict: Spark doesn't allow any possible precision loss or data truncation in store assignment, e.g., converting either `double` to `int` or `decimal` to `double` is not allowed. These rules were originally intended for the Dataset encoder. As far as I know, no mainstream DBMS uses this policy by default.
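To make the difference concrete, here is a minimal sketch in plain Java (not Spark code) of the overflow semantics described above: the Legacy result comes from a Java narrowing cast, and the ANSI branch only illustrates the kind of range check that policy implies before raising a runtime error.

```java
public class StoreAssignmentDemo {
    public static void main(String[] args) {
        int value = 257;

        // Legacy policy: keep the low-order bits, exactly like a
        // Java narrowing cast. 257 is 0x101, so the low byte is 0x01.
        byte legacyResult = (byte) value;
        System.out.println("Legacy result: " + legacyResult); // prints 1

        // ANSI policy (illustration only): an out-of-range value is
        // rejected at runtime instead of being silently truncated.
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            System.out.println("ANSI: value " + value
                + " is out of range for Byte and would raise an exception");
        }
    }
}
```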
Currently, the V1 data source uses the "Legacy" policy by default, while V2 uses "Strict". This proposal is to use the "ANSI" policy by default for both V1 and V2 in Spark 3.0.

This vote is open until Friday (Oct. 11).

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don't think this is a good idea because ...

Thank you!
Gengliang