-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1637/
-----------------------------------------------------------

(Updated 2011-10-14 21:14:02.700645)


Review request for hive, Yongqiang He, Ning Zhang, and Siying Dong.


Changes
-------

It's been a while, so I ran svn up.


Summary
-------

If a user runs a query that includes group by ... with rollup, the behavior is 
like that of MySQL, see the task.

I had to implement 4 different ways of providing this behavior to fit in with 
the 4 different ways of implementing group by depending on whether map 
aggregation is allowed, and whether the data is known to be skewed.

If map aggregation is allowed, it is a simple matter of adding new keys with an 
increasing number of NULLs to the hash map to collect the data for the new rows 
as part of the map side hash aggregation.

If map aggregation is not allowed and the data is not skewed, I perform a 
reduce job which performs a hash aggregation very similar to the way it is 
performed on the map side.  I then perform a mergepartial reduce job to perform 
a final aggregation on the hash aggregation.

If map aggregation is not allowed and the data is skewed and there are no 
distinct aggregations for the group by, I aggregate data for the new rows with 
NULLs as part of the non-hash aggregation.  This was as simple as adding the 
new functionality to the map side hash aggregation.

If map aggregation is not allowed and the data is skewed and there are distinct 
aggregations fro the group by, I perform a reduce job which performs  a hash 
aggregation, and then use the same implementation that is used when there is no 
rollup option set.

I have done my best not to detract from any optimizations that were made for 
each of the four different implementations of group by, but, r for the ones 
where I add a new reduce job, I am not sure how successful I was.

Currently, the optimizations for multiple group bys is not supported for 
queries with the rollup option set, but I am continuing to look into this.


This addresses bug HIVE-2397.
    https://issues.apache.org/jira/browse/HIVE-2397


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1183502 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java 1183502 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 
1183502 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1183502 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1183502 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 1183502 
  trunk/ql/src/test/queries/clientpositive/groupby10_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby11_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_limit_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_nomap_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_skew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_map_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_noskew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby1_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_limit_withrollup.q 
PRE-CREATION 
  
trunk/ql/src/test/queries/clientpositive/groupby2_map_multi_distinct_withrollup.q
 PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_map_skew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_map_withrollup.q 
PRE-CREATION 
  
trunk/ql/src/test/queries/clientpositive/groupby2_noskew_multi_distinct_withrollup.q
 PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_noskew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby2_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_map_skew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_map_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_noskew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby7_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_map_skew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_map_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_noskew_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby8_withrollup.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby9_withrollup.q PRE-CREATION 
  
trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q
 PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_map_ppr_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_neg_float_withrollup.q 
PRE-CREATION 
  
trunk/ql/src/test/queries/clientpositive/groupby_ppr_multi_distinct_withrollup.q
 PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_ppr_withrollup.q 
PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/groupby_withrollup.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby10_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby11_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_limit_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_map_nomap_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_map_skew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_map_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_noskew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby1_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_limit_withrollup.q.out 
PRE-CREATION 
  
trunk/ql/src/test/results/clientpositive/groupby2_map_multi_distinct_withrollup.q.out
 PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_map_skew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_map_withrollup.q.out 
PRE-CREATION 
  
trunk/ql/src/test/results/clientpositive/groupby2_noskew_multi_distinct_withrollup.q.out
 PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_noskew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby2_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_map_skew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_map_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_noskew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby7_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_map_skew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_map_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_noskew_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby8_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby9_withrollup.q.out 
PRE-CREATION 
  
trunk/ql/src/test/results/clientpositive/groupby_map_ppr_multi_distinct_withrollup.q.out
 PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_map_ppr_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_neg_float_withrollup.q.out 
PRE-CREATION 
  
trunk/ql/src/test/results/clientpositive/groupby_ppr_multi_distinct_withrollup.q.out
 PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_ppr_withrollup.q.out 
PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/groupby_withrollup.q.out 
PRE-CREATION 
  
trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java
 1183502 

Diff: https://reviews.apache.org/r/1637/diff


Testing
-------

For each existing group by test that looked like it might be relevant, I added 
a corresponding test with the rollup option set.  (I had found a number of bugs 
as I was implementing this with these tests, so I am satisfied that they are 
indeed relevant)

I also added a new test which provides good coverage of having multiple group 
by keys with both distinct and non-distinct aggregations in each of the four 
different implementations with the rollup operator set.

I also ran the original group by tests to verify I did not break anything.


Thanks,

Kevin

Reply via email to