Sun Rui created HIVE-6021:
-----------------------------

             Summary: Problem in GroupByOperator for handling distinct aggregations
                 Key: HIVE-6021
                 URL: https://issues.apache.org/jira/browse/HIVE-6021
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.12.0
            Reporter: Sun Rui
            Assignee: Sun Rui

Use the following test case with Hive 0.12:

{code:sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
set hive.map.aggr=false;
select count(key),count(distinct value) from src group by key;
{code}

The query fails with an ArrayIndexOutOfBoundsException from GroupByOperator:

{code}
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 5 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
	... 10 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
	... 10 more
{code}

The plan produced by "explain select count(key),count(distinct value) from src group by key;" is:

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
          TableScan
            alias: src
            Select Operator
              expressions:
                    expr: key
                    type: int
                    expr: value
                    type: string
              outputColumnNames: key, value
              Reduce Output Operator
                key expressions:
                      expr: key
                      type: int
                      expr: value
                      type: string
                sort order: ++
                Map-reduce partition columns:
                      expr: key
                      type: int
                tag: -1
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(KEY._col0)   // The parameter causes this problem
                            ^^^^^^^^^^^
                expr: count(DISTINCT KEY._col1:0._col0)
          bucketGroup: false
          keys:
                expr: KEY._col0
                type: int
          mode: complete
          outputColumnNames: _col0, _col1, _col2
          Select Operator
            expressions:
                  expr: _col1
                  type: bigint
                  expr: _col2
                  type: bigint
            outputColumnNames: _col0, _col1
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}

The root cause is in GroupByOperator.initializeOp(). The method does not handle the following case: in a query with distinct aggregations, an aggregation function takes a parameter that is a group-by key column but not a distinct key column. Such a parameter has the form KEY._col0, with no ":tag" component. For it, names[names.length - 2] is "KEY", name.split("\\:") yields a single element, and indexing [1] throws the ArrayIndexOutOfBoundsException.

{code}
if (unionExprEval != null) {
  String[] names = parameters.get(j).getExprString().split("\\.");
  // parameters of the form : KEY.colx:t.coly
  if (Utilities.ReduceField.KEY.name().equals(names[0])) {
    String name = names[names.length - 2];
    int tag = Integer.parseInt(name.split("\\:")[1]);
    ...
  } else {
    // will be VALUE._COLx
    if (!nonDistinctAggrs.contains(i)) {
      nonDistinctAggrs.add(i);
    }
  }
}
{code}
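
To make the failing path concrete, here is a minimal standalone sketch that replays only the string parsing from the excerpt above, outside of Hive. The class name DistinctParamTagDemo and both method names are hypothetical. The guarded variant only illustrates how a parameter without a tag could be detected; it is an assumption, not the actual patch for this issue.

```java
class DistinctParamTagDemo {

    // Same parsing steps as the excerpt: take the second-to-last
    // dot-separated segment and read the tag after the ':'.
    public static int parseTagAsInExcerpt(String exprString) {
        String[] names = exprString.split("\\.");
        String name = names[names.length - 2];
        return Integer.parseInt(name.split("\\:")[1]);
    }

    // Hypothetical guard (an assumption, not Hive code): returns -1 when the
    // expression has no "colx:t" segment, i.e. it is a plain group-by key.
    public static int parseTagOrMinusOne(String exprString) {
        String[] names = exprString.split("\\.");
        if (names.length < 2) {
            return -1;
        }
        String[] parts = names[names.length - 2].split(":");
        return parts.length == 2 ? Integer.parseInt(parts[1]) : -1;
    }

    public static void main(String[] args) {
        // Distinct parameter from the plan: the tag is found.
        System.out.println(parseTagAsInExcerpt("KEY._col1:0._col0")); // prints 0

        // Group-by key parameter from the plan: there is no ':' segment,
        // so indexing [1] fails.
        try {
            parseTagAsInExcerpt("KEY._col0");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException for KEY._col0");
        }

        // The guarded variant reports the missing tag instead of throwing.
        System.out.println(parseTagOrMinusOne("KEY._col0")); // prints -1
    }
}
```

With the two parameter strings taken from the explain output, the first call returns tag 0, while KEY._col0 reproduces the exception seen in the stack trace.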