Review Request: float and double calculation is inaccurate in Hive

Johnny Zhang Mon, 17 Dec 2012 16:10:08 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8653/
-----------------------------------------------------------


Review request for hive.


Description
-------

I found this while debugging the e2e test failures. Hive miscalculates 
float and double values. Take a float calculation as an example:
hive> select f from all100k limit 1;
48308.98
hive> select f/10 from all100k limit 1;
4830.898046875 <--added 046875 at the end
hive> select f*1.01 from all100k limit 1;
48792.0702734375 <--should be 48792.0698
It might be essentially the same problem as described at 
http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm. 
However, the e2e tests compare Hive's results with MySQL's, and MySQL seems to 
get this right, so it is worth fixing in Hive.
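
As a rough illustration (not the patch itself), the Java sketch below 
reproduces the numbers above under the assumption that the extra digits come 
from widening the float to a double before the arithmetic is done. The class 
name FloatWideningDemo and the helper toDecimal are hypothetical. The 
BigDecimal variant is only one possible way to get the decimal results and is 
not claimed to be what the patch does.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class FloatWideningDemo {
    // Widening a float to a double keeps the float's exact binary value,
    // which is usually not the decimal literal that was typed in.
    public static double widen(float f) {
        return (double) f;
    }

    // Hypothetical helper: start from the shortest decimal string that
    // round-trips to the same float, then do the arithmetic in decimal.
    public static BigDecimal toDecimal(float f) {
        return new BigDecimal(Float.toString(f));
    }

    public static void main(String[] args) {
        float f = 48308.98f;
        double d = widen(f);

        // Exact value stored in the float: 48308.98046875
        System.out.println(new BigDecimal(d));

        // Double arithmetic on the widened value carries the extra digits.
        System.out.println(d / 10);
        System.out.println(d * 1.01);

        // Decimal arithmetic on the printed value of the float.
        BigDecimal dec = toDecimal(f);
        System.out.println(dec.divide(BigDecimal.TEN, MathContext.DECIMAL64));
        System.out.println(dec.multiply(new BigDecimal("1.01")));
    }
}
```

The nearest float to 48308.98 is exactly 48308.98046875, so dividing the 
widened value by 10 gives approximately 4830.898046875, while the decimal 
variant gives 4830.898 and 48792.0698.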


This addresses bug HIVE-3715.
    https://issues.apache.org/jira/browse/HIVE-3715


Diffs
-----

  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
 1423224 
  
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java
 1423224 

Diff: https://reviews.apache.org/r/8653/diff/


Testing
-------

I ran a test to compare the result with MySQL using its default float 
precision setting, and the results for f*1.01 and f/10 are identical.

query:          select f, f*1.01, f/10 from all100k limit 1;
mysql result:   48309       48792.0702734375    4830.898046875
hive result:    48308.98    48792.0702734375    4830.898046875


I applied this patch and ran the Hive e2e tests, and they all pass (without 
this patch, there are 5 related failures).


Thanks,

Johnny Zhang
