Laurent Martin created HIVE-13802:
-------------------------------------

             Summary: Built-in aggregate functions may produce incorrect values 
when all values being aggregated in a group are NULL as the result of an 
expression
                 Key: HIVE-13802
                 URL: https://issues.apache.org/jira/browse/HIVE-13802
             Project: Hive
          Issue Type: Bug
          Components: Hive, ORC, Tez
    Affects Versions: 0.14.0
            Reporter: Laurent Martin


With the Tez engine and Hive tables stored as ORC, built-in aggregate functions 
may produce incorrect values when all values being aggregated in a group are 
NULL as the result of an expression.

-- Test 1
-- The S column is populated as NaN

SET hive.execution.engine=tez; 
CREATE TABLE LM
(
    D STRING,
    X DOUBLE
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;

-- Test 2
-- The S column will be populated as 1 (dangerous case!)

SET hive.execution.engine=tez; 
CREATE TABLE LM
(
    D STRING,
    X INT
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(X + 3) AS S
FROM LM
GROUP BY D;

-- Workaound:
-- According to my tests, a workaround is to surround the nullable expression 
with
-- COALESCE. Example:

CREATE TABLE LM
(
    D STRING,
    X INT
)
STORED AS ORC;

INSERT INTO TABLE LM
VALUES
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-11',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL),
('2016-05-12',NULL);

SELECT D, MIN(COALESCE(X + 3)) AS S
FROM LM
GROUP BY D;




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to