Xin Hao created HIVE-13292:
------------------------------

    Summary: Different DOUBLE type precision issue between Spark and MR engine
    Key: HIVE-13292
    URL: https://issues.apache.org/jira/browse/HIVE-13292
    Project: Hive
    Issue Type: Bug
    Environment: Apache Hive 2.0.0
                 Apache Spark 1.6.0
    Reporter: Xin Hao

The Spark and MR engines return DOUBLE results that differ in their trailing digits for the same query. This was found when executing TPC-H query 5 with scale factor 2 (2 GB data size). Details follow.

(1) The MR engine output:

    MOZAMBIQUE,1.0646195910990009E8
    ETHIOPIA,1.0108856206629996E8
    ALGERIA,9.987582690420012E7
    MOROCCO,9.785484184850013E7
    KENYA,9.412388077690017E7

(2) The Spark engine output:

    MOZAMBIQUE,1.064619591099E8
    ETHIOPIA,1.0108856206630005E8
    ALGERIA,9.987582690419997E7
    MOROCCO,9.785484184850003E7
    KENYA,9.412388077690002E7

(3) The SQL used:

    drop table if exists ${env:RESULT_TABLE};
    create table ${env:RESULT_TABLE} (
      pid1 STRING,
      pid2 DOUBLE
    )
    row format delimited fields terminated by ',' lines terminated by '\n'
    stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';

    insert into table ${env:RESULT_TABLE}
    select
      n_name,
      sum(l_extendedprice * (1 - l_discount)) as revenue
    from
      customer, orders, lineitem, supplier, nation, region
    where
      c_custkey = o_custkey
      and l_orderkey = o_orderkey
      and l_suppkey = s_suppkey
      and c_nationkey = s_nationkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'AFRICA'
      and o_orderdate >= '1993-01-01'
      and o_orderdate < '1994-01-01'
    group by
      n_name
    order by
      revenue desc;

(4) A similar difference appears even after simplifying the original query to the following:

    drop table if exists ${env:RESULT_TABLE};
    create table ${env:RESULT_TABLE} (
      pid2 DOUBLE
    )
    row format delimited fields terminated by ',' lines terminated by '\n'
    stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location '${env:RESULT_DIR}';

    insert into table ${env:RESULT_TABLE}
    select
      sum(l_extendedprice * (1 - l_discount)) as revenue
    from
      lineitem
    group by
      l_orderkey
    order by
      revenue;
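
One possible explanation, not verified against this data set or against either engine's code: addition of DOUBLE values is not associative, so a sum() that combines partial sums in a different order can differ in the trailing digits. The sketch below is in Python, not Hive. The function names and the sample values are made up for illustration only.

```python
import math


def left_to_right(values):
    """Add doubles one by one in the given order."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_partials(chunks):
    """Sum each chunk on its own, then add the partial sums.

    This imitates (as an assumption) an aggregation that is split
    across several tasks and merged afterwards.
    """
    return left_to_right([left_to_right(chunk) for chunk in chunks])


# Hypothetical values, not taken from the TPC-H data.
values = [0.1, 0.2, 0.3]

single_pass = left_to_right(values)            # (0.1 + 0.2) + 0.3
two_partials = sum_partials([[0.1], [0.2, 0.3]])  # 0.1 + (0.2 + 0.3)

print(single_pass, two_partials)
print("bitwise equal:", single_pass == two_partials)
print("equal within 1e-9 relative tolerance:",
      math.isclose(single_pass, two_partials, rel_tol=1e-9))
```

If this is the cause here, neither engine's output is wrong as such. Result comparisons between engines would then need a tolerance rather than an exact match on the DOUBLE column.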