Laurent Martin created HIVE-13802:
-------------------------------------
Summary: Built-in aggregate functions may produce incorrect values
when all values being aggregated in a group are NULL as the result of an
expression
Key: HIVE-13802
URL: https://issues.apache.org/jira/browse/HIVE-13802
Project: Hive
Issue Type: Bug
Components: Hive, ORC, Tez
Affects Versions: 0.14.0
Reporter: Laurent Martin
With the Tez engine and Hive tables stored as ORC, built-in aggregate functions
may produce incorrect values when all values being aggregated in a group are
NULL as the result of an expression.
-- Test 1
-- The S column is populated as NaN
SET hive.execution.engine=tez;
CREATE TABLE LM
(
D STRING,
X DOUBLE
)
STORED AS ORC;
INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);
SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;
-- Test 2
-- The S column will be populated as 1 (dangerous case!)
SET hive.execution.engine=tez;
CREATE TABLE LM
(
D STRING,
X INT
)
STORED AS ORC;
INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);
SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;
-- Workaound:
-- According to my tests, a workaround is to surround the nullable expression
with
-- COALESCE. Example:
CREATE TABLE LM
(
D STRING,
X INT
)
STORED AS ORC;
INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);
SELECT D, MIN(COALESCE(X + 3)) AS S
FROM LM
GROUP BY D;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)