Re: Review Request: float and double calculation is inaccurate in Hive

Bharath Mundlapudi Tue, 18 Dec 2012 00:51:56 -0800

We have solved this issue recently. It is not just a problem in Hive. Contact 
me offline if you need more details.


-Bharath



________________________________
 From: Johnny Zhang <xiao...@cloudera.com>
To: Johnny Zhang <xiao...@cloudera.com>; Mark Grover 
<grover.markgro...@gmail.com>; hive <dev@hive.apache.org> 
Sent: Monday, December 17, 2012 5:13 PM
Subject: Re: Review Request: float and double calculation is inaccurate in Hive
 


> On Dec. 18, 2012, 12:38 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java,
> >  line 50
> > <https://reviews.apache.org/r/8653/diff/1/?file=240423#file240423line50>
> >
> >     10 seems to be a rather arbitrary number for scale. Any particular
> >     reason you are using it? Maybe we should invoke the method where no
> >     scale needs to be specified.
> 
> Johnny Zhang wrote:
>     Hi, Mark, thanks for reviewing it. The reason for using 10 is that it is
>     the same as MySQL's default precision setting. I just want the
>     calculation result to be identical to MySQL's.

I think I did try it without specifying the scale, and the result differed
from MySQL's. I agree that hard-coding the scale is not a good approach. Open
to other suggestions.


- Johnny
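The scale question above can be illustrated with `java.math.BigDecimal`. This is a sketch under an assumption: it supposes the patched UDFs route operands through `BigDecimal` (the diff itself is not quoted in this thread), and the class and method names are hypothetical. It contrasts a fixed scale of 10, the scale-free `divide(BigDecimal)`, which throws `ArithmeticException` when the exact quotient has a non-terminating decimal expansion, and `divide(BigDecimal, MathContext)`, which bounds significant digits instead of decimal places.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical helpers for illustration; not the code in the HIVE-3715 patch.
class ScaleChoices {
    // Fixed scale: always 10 digits after the decimal point, rounded half-up.
    static BigDecimal divideFixedScale(double a, double b) {
        return new BigDecimal(Double.toString(a))
                .divide(new BigDecimal(Double.toString(b)), 10, RoundingMode.HALF_UP);
    }

    // No scale at all: only succeeds when the exact quotient terminates.
    static BigDecimal divideExact(double a, double b) {
        return new BigDecimal(Double.toString(a))
                .divide(new BigDecimal(Double.toString(b)));
    }

    // Precision instead of scale: 16 significant digits (IEEE 754 decimal64).
    static BigDecimal divideWithPrecision(double a, double b) {
        return new BigDecimal(Double.toString(a))
                .divide(new BigDecimal(Double.toString(b)), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        System.out.println(divideFixedScale(1.0, 3.0));    // 0.3333333333
        System.out.println(divideWithPrecision(1.0, 3.0)); // 0.3333333333333333
        try {
            divideExact(1.0, 3.0);
        } catch (ArithmeticException e) {
            // 1/3 has no finite decimal expansion, so the scale-free divide fails.
            System.out.println("divide(BigDecimal) threw: " + e.getMessage());
        }
    }
}
```

A fixed scale also pads terminating results (`divideFixedScale(1.0, 4.0)` gives `0.2500000000`). A `MathContext` avoids hard-coding a number of decimal places, but it still fixes a number of significant digits, which may or may not line up with MySQL's output.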


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8653/#review14625
-----------------------------------------------------------


On Dec. 18, 2012, 12:37 a.m., Johnny Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8653/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2012, 12:37 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> I found this while debugging the e2e test failures: Hive miscalculates
> float and double values. Take float calculation as an example:
> hive> select f from all100k limit 1;
> 48308.98
> hive> select f/10 from all100k limit 1;
> 4830.898046875 <-- extra 046875 appended at the end
> hive> select f*1.01 from all100k limit 1;
> 48792.0702734375 <--should be 48792.0698
> It might be essentially the same problem as the one described at
> http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm
> However, since the e2e tests compare the results with MySQL, and MySQL seems
> to get it right, it is worth fixing in Hive.
> 
> 
> This addresses bug HIVE-3715.
>    https://issues.apache.org/jira/browse/HIVE-3715
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java 1423224
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java 1423224
> 
> Diff: https://reviews.apache.org/r/8653/diff/
> 
> 
> Testing
> -------
> 
> I compared the results with MySQL using its default float precision
> setting; the calculated results are identical.
> 
> query:         select f, f*1.01, f/10 from all100k limit 1;
>                f           f*1.01              f/10
> mysql result:  48309       48792.0702734375    4830.898046875
> hive result:   48308.98    48792.0702734375    4830.898046875
> 
> 
> I applied this patch and ran the Hive e2e tests, and they all pass (without
> this patch, there are 5 related failures).
> 
> 
> Thanks,
> 
> Johnny Zhang
> 
>
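The extra digits in the description above can be traced to the float's binary value: the float nearest to 48308.98 is exactly 48308.98046875, and that exact value times 1.01 is 48792.0702734375 and divided by 10 is 4830.898046875. The sketch below (hypothetical class and method names) computes both expressions with `BigDecimal` twice: once from the exact binary value of the float (float to double widening is lossless), and once from its shortest decimal string as `Float.toString` prints it. It illustrates where the digits come from; it is not the patch itself.

```java
import java.math.BigDecimal;

// Hypothetical illustration of where the extra digits in HIVE-3715 come from.
class FloatDigits {
    // Exact value stored in the float; widening float -> double loses nothing.
    static BigDecimal exactBinaryValue(float f) {
        return new BigDecimal((double) f);
    }

    // Shortest decimal string that identifies the float, parsed as a decimal.
    static BigDecimal shortestDecimal(float f) {
        return new BigDecimal(Float.toString(f));
    }

    public static void main(String[] args) {
        float f = 48308.98f;                          // value from the all100k row
        BigDecimal factor = new BigDecimal("1.01");

        // Arithmetic on the exact binary value reproduces the "extra" digits.
        System.out.println(exactBinaryValue(f));                        // 48308.98046875
        System.out.println(exactBinaryValue(f).multiply(factor));       // 48792.0702734375
        System.out.println(exactBinaryValue(f).divide(BigDecimal.TEN)); // 4830.898046875

        // Arithmetic on the printed decimal gives the "expected" results.
        System.out.println(shortestDecimal(f));                         // 48308.98
        System.out.println(shortestDecimal(f).multiply(factor));        // 48792.0698
        System.out.println(shortestDecimal(f).divide(BigDecimal.TEN));  // 4830.898
    }
}
```

Which result counts as "correct" depends on whether the column is treated as the binary float it stores or as the decimal it displays; the Testing table above shows MySQL printing the former for both expressions.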
