Re: Review Request: float and double calculation is inaccurate in Hive

Mark Grover Fri, 21 Dec 2012 13:44:17 -0800

Bharath,
I am interested in hearing more as well. Could you please comment on
https://issues.apache.org/jira/browse/HIVE-2693
?


Thanks in advance!

On Tue, Dec 18, 2012 at 12:51 AM, Bharath Mundlapudi
<[email protected]> wrote:
> We have solved this issue recently. It is not just a problem in Hive.
> Contact me offline if you need more details.
>
> -Bharath
>
> ________________________________
> From: Johnny Zhang <[email protected]>
> To: Johnny Zhang <[email protected]>; Mark Grover
> <[email protected]>; hive <[email protected]>
> Sent: Monday, December 17, 2012 5:13 PM
> Subject: Re: Review Request: float and double calculation is inaccurate in
> Hive
>
>
>
>> On Dec. 18, 2012, 12:38 a.m., Mark Grover wrote:
>> >
>> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java,
>> > line 50
>> > <https://reviews.apache.org/r/8653/diff/1/?file=240423#file240423line50>
>
>> >
>> >    10 seems to be a rather arbitrary number for scale. Any particular
>> > reason you are using it? Maybe we should invoke the method where no scale
>> > needs to be specified.
>>
>> Johnny Zhang wrote:
>>    Hi Mark, thanks for reviewing it. The reason for using 10 is that it
>> matches MySQL's default precision setting. I just want to make the
>> calculation result identical to MySQL's.
>
> I think I did try without specifying a scale, and the result was different
> from MySQL's. I agree that hard-coding the scale is not a good approach. I am
> open to other suggestions.
>
>
> - Johnny
>
>
> -----------------------------------------------------------
>
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8653/#review14625
> -----------------------------------------------------------
>
>
> On Dec. 18, 2012, 12:37 a.m., Johnny Zhang wrote:
>>
>> -----------------------------------------------------------
>
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/8653/
>> -----------------------------------------------------------
>>
>> (Updated Dec. 18, 2012, 12:37 a.m.)
>
>>
>>
>> Review request for hive.
>>
>>
>> Description
>> -------
>>
>> I found this while debugging the e2e test failures: Hive miscalculates
>> float and double values. Take a float calculation as an example:
>> hive> select f from all100k limit 1;
>> 48308.98
>> hive> select f/10 from all100k limit 1;
>> 4830.898046875 <-- extra 046875 at the end
>> hive> select f*1.01 from all100k limit 1;
>> 48792.0702734375 <--should be 48792.0698
>> It might be essentially the same problem as
>> http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm
>> But since the e2e tests compare the results with MySQL, and MySQL seems to
>> get it right, it is worth fixing in Hive.
>>
>>
>> This addresses bug HIVE-3715.
>>    https://issues.apache.org/jira/browse/HIVE-3715
>>
>>
>> Diffs
>> -----
>>
>>
>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
>> 1423224
>>
>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java
>> 1423224
>>
>> Diff: https://reviews.apache.org/r/8653/diff/
>>
>>
>> Testing
>> -------
>>
>> I ran a test comparing the result with MySQL's default float precision
>> setting; the results are identical.
>>
>> query:          select f, f*1.01, f/10 from all100k limit 1;
>> mysql result:  48309      48792.0702734375    4830.898046875
>> hive result:    48308.98    48792.0702734375    4830.898046875
>>
>>
>> I applied this patch and ran the Hive e2e tests, and they all pass
>> (without this patch, there are 5 related failures).
>>
>>
>> Thanks,
>>
>> Johnny Zhang
>>
>>
>
>
>
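
The extra digits in the description above (4830.898046875 instead of 4830.898) are consistent with a float being widened to double before the arithmetic, which exposes the exact binary value stored in the float. The sketch below reproduces that in plain Java and contrasts it with decimal arithmetic at a fixed scale. It is only an illustration: the patch itself is not shown in this thread, so the class name, the helper methods, the use of java.math.BigDecimal, the scale of 10 and the HALF_UP rounding mode are all assumptions, not the actual change to UDFOPDivide or UDFOPMultiply.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class; not part of Hive or of the patch under review.
class FloatWideningDemo {

    // Widen the float to double first, then divide in binary floating point.
    static double divideAsDouble(float f, double divisor) {
        return (double) f / divisor;
    }

    // Divide in decimal arithmetic, starting from the float's shortest
    // decimal form. Scale 10 and HALF_UP are assumptions for illustration.
    static BigDecimal divideAsDecimal(float f, String divisor) {
        BigDecimal dividend = new BigDecimal(Float.toString(f));
        return dividend.divide(new BigDecimal(divisor), 10, RoundingMode.HALF_UP);
    }

    // Multiply in decimal arithmetic; no scale is needed because a product
    // of two finite decimals is always exact.
    static BigDecimal multiplyAsDecimal(float f, String factor) {
        return new BigDecimal(Float.toString(f)).multiply(new BigDecimal(factor));
    }

    public static void main(String[] args) {
        float f = 48308.98f;
        // Exact binary value stored in the float (48308.98046875).
        System.out.println(new BigDecimal(f));
        // Binary division after widening, as in the "select f/10" output.
        System.out.println(divideAsDouble(f, 10));
        // Decimal division and multiplication on the printed value of f.
        System.out.println(divideAsDecimal(f, "10").stripTrailingZeros().toPlainString());
        System.out.println(multiplyAsDecimal(f, "1.01").toPlainString());
    }
}
```

Division needs an explicit scale or MathContext because BigDecimal.divide without one throws ArithmeticException when the quotient has a non-terminating decimal expansion, which is one reason a hard-coded scale is tempting here.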
