-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29878/
-----------------------------------------------------------
(Updated Jan. 19, 2015, 1:10 a.m.)
Review request for hive.
Changes
-------
Addressed comments
Bugs: HIVE-9347
https://issues.apache.org/jira/browse/HIVE-9347
Repository: hive-git
Description
-------
The query below returns incorrect results on Hive 0.13.1, although it worked
correctly on Hive 0.11.
I have the following table:
CREATE TABLE `t`(
  `category` int,
  `live` int,
  `comments` int)
with the following data:
hive> select * from t;
OK
3 0 2
2 0 2
8 0 2
The query:
hive> select category, max(live) live, max(comments) comments,
rank() OVER (PARTITION BY category ORDER BY comments) rank1
FROM t
GROUP BY category
GROUPING SETS ((), (category))
HAVING max(comments) > 0;
returns the following results:
NULL 1 48 1
2 1 49 1
3 1 49 1
8 1 49 1
When grouping sets are used together with the rank() function, the max()
function returns incorrect results. Everything works fine if I remove the
GROUPING SETS clause and split the query into two independent queries, or if I
remove the rank() function.
This looks like a bug to me, but please review. That said, I'm not sure whether
it is an Amazon-specific issue or a general Hive issue.
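For comparison, the result one would expect from the three rows above can be worked out by hand. The sketch below is not Hive code and does not reproduce the bug; it is a small Python emulation, assuming standard SQL semantics for GROUPING SETS, HAVING and rank(). The function name `expected_grouping_sets` is hypothetical and exists only for this illustration.

```python
# Rows of table t as listed in the description: (category, live, comments).
rows = [(3, 0, 2), (2, 0, 2), (8, 0, 2)]


def expected_grouping_sets(rows):
    """Emulate, under standard SQL semantics (an assumption, not Hive internals):

        SELECT category, max(live), max(comments),
               rank() OVER (PARTITION BY category ORDER BY comments)
        FROM t GROUP BY category GROUPING SETS ((), (category))
        HAVING max(comments) > 0
    """
    # Grouping set (): one grand-total group whose category is NULL (None).
    groups = {None: list(rows)}
    # Grouping set (category): one group per distinct category value.
    for row in rows:
        groups.setdefault(row[0], []).append(row)

    # Aggregate each group, then apply the HAVING filter.
    aggregated = []
    for category, members in groups.items():
        max_live = max(m[1] for m in members)
        max_comments = max(m[2] for m in members)
        if max_comments > 0:
            aggregated.append((category, max_live, max_comments))

    # rank() within each category partition: 1 + number of rows in the same
    # partition with a strictly smaller ordering value.
    result = []
    for category, max_live, max_comments in aggregated:
        rank = 1 + sum(
            1
            for other_cat, _, other_comments in aggregated
            if other_cat == category and other_comments < max_comments
        )
        result.append((category, max_live, max_comments, rank))

    # Put the NULL (grand-total) row first, then sort by category.
    return sorted(result, key=lambda r: (r[0] is not None, r[0] or 0))


for row in expected_grouping_sets(rows):
    print(row)
# (None, 0, 2, 1)
# (2, 0, 2, 1)
# (3, 0, 2, 1)
# (8, 0, 2, 1)
```

Under these assumptions every row should show max(live) = 0 and max(comments) = 2, whereas the reported output shows 1 and 48 or 49 in those columns.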
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 90b4b12
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java afd1738
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java 87fba2d
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java 82f4243
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b93a293
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
Diff: https://reviews.apache.org/r/29878/diff/
Testing
-------
Thanks,
Navis Ryu