Bharath, I am interested in hearing more as well. Could you please comment on https://issues.apache.org/jira/browse/HIVE-2693 ?
Thanks in advance!

On Tue, Dec 18, 2012 at 12:51 AM, Bharath Mundlapudi <bharathw...@yahoo.com> wrote:

> We have solved this issue recently. It is not just a problem in Hive.
> Contact me offline if you need more details.
>
> -Bharath
>
> ________________________________
> From: Johnny Zhang <xiao...@cloudera.com>
> To: Johnny Zhang <xiao...@cloudera.com>; Mark Grover
> <grover.markgro...@gmail.com>; hive <dev@hive.apache.org>
> Sent: Monday, December 17, 2012 5:13 PM
> Subject: Re: Review Request: float and double calculation is inaccurate in
> Hive
>
>
>> On Dec. 18, 2012, 12:38 a.m., Mark Grover wrote:
>> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java,
>> > line 50
>> > <https://reviews.apache.org/r/8653/diff/1/?file=240423#file240423line50>
>> >
>> > 10 seems to be a rather arbitrary number for scale. Any particular
>> > reason you are using it? Maybe we should invoke the method where no scale
>> > needs to be specified.
>>
>> Johnny Zhang wrote:
>> Hi, Mark, thanks for reviewing it. The reason for using 10 is that it is
>> the same as the MySQL default precision setting. I just want to make the
>> calculation result identical to MySQL's.
>
> I think I did try without specifying the scale, and the result was different
> from MySQL's. I agree hard-coding the scale is not a good way. Open to other
> suggestions.
>
>
> - Johnny
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8653/#review14625
> -----------------------------------------------------------
>
>
> On Dec. 18, 2012, 12:37 a.m., Johnny Zhang wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/8653/
>> -----------------------------------------------------------
>>
>> (Updated Dec. 18, 2012, 12:37 a.m.)
>>
>>
>> Review request for hive.
>>
>>
>> Description
>> -------
>>
>> I found this while debugging the e2e test failures: Hive miscalculates
>> float and double values. Take float calculation as an example:
>> hive> select f from all100k limit 1;
>> 48308.98
>> hive> select f/10 from all100k limit 1;
>> 4830.898046875 <-- added 04875 at the end
>> hive> select f*1.01 from all100k limit 1;
>> 48792.0702734375 <-- should be 48792.0698
>> It might be essentially the same problem as
>> http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm
>> But since the e2e tests compare the results with MySQL, and MySQL seems to
>> do it right, it is worth fixing in Hive.
>>
>>
>> This addresses bug HIVE-3715.
>> https://issues.apache.org/jira/browse/HIVE-3715
>>
>>
>> Diffs
>> -----
>>
>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
>> 1423224
>>
>> http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java
>> 1423224
>>
>> Diff: https://reviews.apache.org/r/8653/diff/
>>
>>
>> Testing
>> -------
>>
>> I tested to compare the result with the MySQL default float precision
>> setting; the result is identical.
>>
>> query: select f, f*1.01, f/10 from all100k limit 1;
>> mysql result: 48309 48792.0702734375 4830.898046875
>> hive result: 48308.98 48792.0702734375 4830.898046875
>>
>>
>> I applied this patch and ran the Hive e2e tests, and they all pass
>> (without this patch, there are 5 related failures).
>>
>>
>> Thanks,
>>
>> Johnny Zhang
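
For anyone following the thread, here is a minimal Java sketch of the effect described in the quoted review request. It is not the HIVE-3715 patch. The class name, method names and the SCALE constant are hypothetical and only mirror the "scale of 10" under discussion. It assumes the float column value is widened to double before the arithmetic, and it contrasts that with decimal arithmetic at an explicit scale.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration only; not code from the Hive source tree.
class FloatScaleSketch {
    // Assumed value, mirroring the hard-coded scale questioned in the review.
    static final int SCALE = 10;

    // Binary path: widening the float to double exposes the float's exact
    // binary value, so the extra digits show up in the quotient.
    static double divideAsDouble(float f, double divisor) {
        return (double) f / divisor;
    }

    // Decimal path: start from the float's shortest decimal text, then
    // divide with an explicit scale and rounding mode.
    static BigDecimal divideAsDecimal(float f, String divisor) {
        BigDecimal dividend = new BigDecimal(Float.toString(f));
        return dividend.divide(new BigDecimal(divisor), SCALE, RoundingMode.HALF_UP);
    }

    // Decimal multiplication is exact, so no scale is needed here.
    static BigDecimal multiplyAsDecimal(float f, String factor) {
        return new BigDecimal(Float.toString(f)).multiply(new BigDecimal(factor));
    }

    public static void main(String[] args) {
        float f = 48308.98f; // the sample value from the review request
        System.out.println(divideAsDouble(f, 10));
        System.out.println(divideAsDecimal(f, "10").toPlainString());
        System.out.println(multiplyAsDecimal(f, "1.01").toPlainString());
    }
}
```

The decimal path avoids the trailing digits because it starts from the printed form of the float rather than its binary value. The cost is a fixed scale chosen up front, which is the part of the patch the review questions.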