+1

Thanks for fixing this!

On Thu, Oct 10, 2019 at 6:30 AM Xiao Li <lix...@databricks.com> wrote:

> +1
>
> On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> +1 (binding)
>>
>> On Thu, Oct 10, 2019 at 5:11 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>
>>> Thanks for the great work, Gengliang!
>>>
>>> +1 for that.
>>> As I said before, the behaviour is pretty common in DBMSs, so the change
>>> helps DBMS users.
>>>
>>> Bests,
>>> Takeshi
>>>
>>>
>>> On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang <
>>> gengliang.w...@databricks.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'd like to call for a new vote on SPARK-28885
>>>> <https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI store
>>>> assignment rules in table insertion by default" after revising the ANSI
>>>> store assignment policy (SPARK-29326
>>>> <https://issues.apache.org/jira/browse/SPARK-29326>).
>>>> When inserting a value into a column with a different data type, Spark
>>>> performs type coercion. Currently, we support three policies for the store
>>>> assignment rules: ANSI, Legacy, and Strict, which can be set via the option
>>>> "spark.sql.storeAssignmentPolicy" (a rough sketch follows the list):
>>>> 1. ANSI: Spark performs the store assignment as per ANSI SQL. In
>>>> practice, the behavior is mostly the same as PostgreSQL. It disallows
>>>> certain unreasonable type conversions such as converting `string` to `int`
>>>> and `double` to `boolean`. It will throw a runtime exception if the value
>>>> is out of range (overflow).
>>>> 2. Legacy: Spark allows the store assignment as long as it is a valid
>>>> `Cast`, which is very loose. E.g., converting either `string` to `int` or
>>>> `double` to `boolean` is allowed. It is the current behavior in Spark 2.x
>>>> for compatibility with Hive. When inserting an out-of-range value into an
>>>> integral field, the low-order bits of the value are inserted (the same as
>>>> Java/Scala numeric type casting). For example, if 257 is inserted into a
>>>> field of Byte type, the result is 1.
>>>> 3. Strict: Spark doesn't allow any possible precision loss or data
>>>> truncation in store assignment, e.g., converting either `double` to `int`
>>>> or `decimal` to `double` is not allowed. The rules were originally designed
>>>> for the Dataset encoder. As far as I know, no mainstream DBMS uses this
>>>> policy by default.
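>>>>
>>>> To make the differences concrete, here is a rough, illustrative sketch of
>>>> the 257-into-Byte example above under each policy (not part of the proposal
>>>> itself; the table name `t` and the `TINYINT` column are made up for the
>>>> example):
>>>>
>>>> // Illustrative only: the comments restate the behaviors described above.
>>>> spark.sql("CREATE TABLE t (b TINYINT) USING parquet")
>>>>
>>>> spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
>>>> spark.sql("INSERT INTO t VALUES (257)")  // succeeds; low-order bits kept, so b = 1
>>>>
>>>> spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
>>>> spark.sql("INSERT INTO t VALUES (257)")  // runtime exception: value out of range
>>>>
>>>> spark.conf.set("spark.sql.storeAssignmentPolicy", "STRICT")
>>>> spark.sql("INSERT INTO t VALUES (257)")  // rejected up front: int cannot be safely cast to tinyint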
>>>>
>>>> Currently, the V1 data source uses the "Legacy" policy by default, while V2
>>>> uses "Strict". This proposal is to use the "ANSI" policy by default for both
>>>> V1 and V2 in Spark 3.0.
>>>>
>>>> This vote is open until Friday (Oct. 11).
>>>>
>>>> [ ] +1: Accept the proposal
>>>> [ ] +0
>>>> [ ] -1: I don't think this is a good idea because ...
>>>>
>>>> Thank you!
>>>>
>>>> Gengliang
>>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>> --
>


-- 
Ryan Blue
Software Engineer
Netflix
