+1

Bests,
Dongjoon

On Thu, Oct 10, 2019 at 10:14 Ryan Blue <rb...@netflix.com.invalid> wrote:

> +1
>
> Thanks for fixing this!
>
> On Thu, Oct 10, 2019 at 6:30 AM Xiao Li <lix...@databricks.com> wrote:
>
>> +1
>>
>> On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> +1 (binding)
>>>
>>> On Thu, Oct 10, 2019 at 5:11 PM, Takeshi Yamamuro <linguin....@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the great work, Gengliang!
>>>>
>>>> +1 for that.
>>>> As I said before, the behaviour is pretty common in DBMSs, so the change
>>>> helps DBMS users.
>>>>
>>>> Bests,
>>>> Takeshi
>>>>
>>>>
>>>> On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang <
>>>> gengliang.w...@databricks.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'd like to call for a new vote on SPARK-28885
>>>>> <https://issues.apache.org/jira/browse/SPARK-28885> "Follow ANSI
>>>>> store assignment rules in table insertion by default" after revising the
>>>>> ANSI store assignment policy(SPARK-29326
>>>>> <https://issues.apache.org/jira/browse/SPARK-29326>).
>>>>> When inserting a value into a column with a different data type,
>>>>> Spark performs type coercion. Currently, we support 3 policies for the
>>>>> store assignment rules: ANSI, legacy and strict, which can be set via the
>>>>> option "spark.sql.storeAssignmentPolicy":
>>>>> 1. ANSI: Spark performs the store assignment as per ANSI SQL. In
>>>>> practice, the behavior is mostly the same as PostgreSQL. It disallows
>>>>> certain unreasonable type conversions such as converting `string` to `int`
>>>>> and `double` to `boolean`. It will throw a runtime exception if the value
>>>>> is out of range (overflow).
>>>>> 2. Legacy: Spark allows the store assignment as long as it is a valid
>>>>> `Cast`, which is very loose. E.g., converting either `string` to `int` or
>>>>> `double` to `boolean` is allowed. It is the current behavior in Spark 2.x
>>>>> for compatibility with Hive. When inserting an out-of-range value to an
>>>>> integral field, the low-order bits of the value are inserted (the same as
>>>>> Java/Scala numeric type casting). For example, if 257 is inserted into a
>>>>> field of Byte type, the result is 1.
>>>>> 3. Strict: Spark doesn't allow any possible precision loss or data
>>>>> truncation in store assignment, e.g., converting either `double` to `int`
>>>>> or `decimal` to `double` is disallowed. The rules were originally designed
>>>>> for the Dataset encoder. As far as I know, no mainstream DBMS uses this policy by
>>>>> default.
>>>>>
>>>>> Currently, the V1 data source uses the "Legacy" policy by default, while V2
>>>>> uses "Strict". This proposal is to use the "ANSI" policy by default for both
>>>>> V1 and V2 in Spark 3.0.
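
For illustration, a minimal spark-shell sketch of how the three policies above
could differ on the same table. The table name `t` and the commented outcomes
are hypothetical paraphrases; only the `spark.sql.storeAssignmentPolicy` option
and the 257-into-Byte example come from the proposal text.

    // Hypothetical byte column; outcome comments are paraphrased, not exact messages.
    spark.sql("CREATE TABLE t (b BYTE) USING parquet")

    // 1. ANSI (the proposed default): string -> numeric casts are rejected at
    //    analysis time, and an out-of-range value raises a runtime exception.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")
    spark.sql("INSERT INTO t VALUES ('1')")  // analysis error: cannot safely cast string to byte
    spark.sql("INSERT INTO t VALUES (257)")  // runtime exception: value out of range

    // 2. Legacy (Spark 2.x behaviour): any valid `Cast` is accepted; 257 keeps
    //    only its low-order 8 bits, so the stored value is 1.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")
    spark.sql("INSERT INTO t VALUES (257)")  // succeeds; SELECT returns 1

    // 3. Strict: any potentially lossy cast, e.g. double -> byte, is rejected.
    spark.conf.set("spark.sql.storeAssignmentPolicy", "STRICT")
    spark.sql("INSERT INTO t VALUES (CAST(1 AS DOUBLE))")  // analysis error: possible precision loss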
>>>>>
>>>>> This vote is open until Friday (Oct. 11).
>>>>>
>>>>> [ ] +1: Accept the proposal
>>>>> [ ] +0
>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Gengliang
>>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>> --
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
