Dongjoon Hyun created HIVE-17186:
------------------------------------

             Summary: `double` type constant operation loses precision
                 Key: HIVE-17186
                 URL: https://issues.apache.org/jira/browse/HIVE-17186
             Project: Hive
          Issue Type: Bug
            Reporter: Dongjoon Hyun


This might be an issue where Hive loses precision and generates a wrong 
result when handling *double* constant operations. It was reported in the 
following environment.

*ENVIRONMENT*
https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql

*SQL*
{code}
hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
OK
Plan not optimized by CBO.

Stage-0
   Fetch Operator
      limit:10
      Stage-1
         Map 1 vectorized
         File Output Operator [FS_9]
            compressed:false
            Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Limit [LIM_8]
               Number of rows:10
               Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
               Select Operator [OP_7]
                  outputColumnNames:["_col0"]
                  Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator [FIL_6]
                     predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
                     Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
                     TableScan [TS_0]
                        alias:lineitem
                        Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE

hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
OK
0.06
Time taken: 314.923 seconds, Fetched: 1 row(s)
{code}

Hive excludes 0.07 from the result, contrary to users' intuition: the folded 
upper bound in the plan is 0.06999999999999999, not 0.07. The difference also 
confuses some users, because they assume that Hive's result is the correct 
one. Is there any way for Hive to fix this?
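
The folded bounds in the plan can be reproduced outside Hive. The following is a minimal sketch in Python, not Hive code; it assumes only that the constants are evaluated as IEEE 754 doubles, which is consistent with the predicate shown in the plan. The decimal half shows what users intuitively expect; it is an illustration, not a statement about how Hive should implement a fix.

```python
from decimal import Decimal

# Double arithmetic: the same constant folding the plan shows.
lower = 0.06 - 0.01
upper = 0.06 + 0.01
print(repr(lower))  # 0.049999999999999996
print(repr(upper))  # 0.06999999999999999

# 0.07 as a double is slightly above the folded upper bound,
# so a BETWEEN-style range check excludes it.
in_range = lower <= 0.07 <= upper
print(in_range)  # False

# Exact decimal arithmetic gives the bounds users have in mind.
dec_lower = Decimal("0.06") - Decimal("0.01")
dec_upper = Decimal("0.06") + Decimal("0.01")
dec_in_range = dec_lower <= Decimal("0.07") <= dec_upper
print(dec_lower, dec_upper, dec_in_range)  # 0.05 0.07 True
```

With double arithmetic the upper bound falls just below 0.07, which matches max(l_discount) returning 0.06 in the query above.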


