[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847234#comment-13847234 ]
Sun Rui commented on HIVE-6021:
-------------------------------

[~xzhang] Test case added. Review board entry: https://reviews.apache.org/r/16243/

> Problem in GroupByOperator for handling distinct aggrgations
> ------------------------------------------------------------
>
>                 Key: HIVE-6021
>                 URL: https://issues.apache.org/jira/browse/HIVE-6021
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Sun Rui
>            Assignee: Sun Rui
>         Attachments: HIVE-6021.1.patch
>
>
> Use the following test case with Hive 0.12:
> {code:sql}
> create table src(key int, value string);
> load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
> set hive.map.aggr=false;
> select count(key),count(distinct value) from src group by key;
> {code}
> We will get an ArrayIndexOutOfBoundsException from GroupByOperator:
> {code}
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 5 more
> Caused by: java.lang.RuntimeException: Reduce operator initialization failed
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
>     ... 10 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
>     ... 10 more
> {code}
> explain select count(key),count(distinct value) from src group by key;
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         src
>           TableScan
>             alias: src
>             Select Operator
>               expressions:
>                     expr: key
>                     type: int
>                     expr: value
>                     type: string
>               outputColumnNames: key, value
>               Reduce Output Operator
>                 key expressions:
>                       expr: key
>                       type: int
>                       expr: value
>                       type: string
>                 sort order: ++
>                 Map-reduce partition columns:
>                       expr: key
>                       type: int
>                 tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: count(KEY._col0)   // The parameter causes this problem
>                             ^^^^^^^^^^^
>                 expr: count(DISTINCT KEY._col1:0._col0)
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>           mode: complete
>           outputColumnNames: _col0, _col1, _col2
>           Select Operator
>             expressions:
>                   expr: _col1
>                   type: bigint
>                   expr: _col2
>                   type: bigint
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The root cause is within GroupByOperator.initializeOp(). The method does not
> handle the following case: in a query with distinct aggregations, an
> aggregation function has a parameter that is a group-by key column but not a
> distinct key column.
> {code}
> if (unionExprEval != null) {
>   String[] names = parameters.get(j).getExprString().split("\\.");
>   // parameters of the form : KEY.colx:t.coly
>   if (Utilities.ReduceField.KEY.name().equals(names[0])) {
>     String name = names[names.length - 2];
>     int tag = Integer.parseInt(name.split("\\:")[1]);
>
>     ...
>
>   } else {
>     // will be VALUE._COLx
>     if (!nonDistinctAggrs.contains(i)) {
>       nonDistinctAggrs.add(i);
>     }
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
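
The failure described in the report can be reproduced outside Hive with plain string handling. The sketch below applies the same split-and-index steps quoted from GroupByOperator.initializeOp() to the two parameter strings that appear in the explain output. The class and method names (DistinctTagDemo, parseTag, parseTagOrMinusOne) are hypothetical and exist only for this illustration. The guarded variant is an assumption about one possible way to detect a plain group-by key; it is not the change made in HIVE-6021.1.patch.

```java
class DistinctTagDemo {
    // Mirrors the parsing quoted in the report: take the second-to-last
    // dot-separated component and read the integer tag after ':'.
    static int parseTag(String exprString) {
        String[] names = exprString.split("\\.");
        String name = names[names.length - 2];
        return Integer.parseInt(name.split("\\:")[1]);
    }

    // Hypothetical guard (an assumption, not the HIVE-6021 patch):
    // return -1 when the component carries no ":tag" suffix.
    static int parseTagOrMinusOne(String exprString) {
        String[] names = exprString.split("\\.");
        String[] parts = names[names.length - 2].split("\\:");
        return parts.length > 1 ? Integer.parseInt(parts[1]) : -1;
    }

    public static void main(String[] args) {
        // Distinct key column: "KEY", "_col1:0", "_col0" yields tag 0.
        System.out.println(parseTag("KEY._col1:0._col0"));

        // Plain group-by key: "KEY", "_col0". The second-to-last
        // component is "KEY", which has no ':' to split on.
        try {
            parseTag("KEY._col0");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("plain group-by key: " + e);
        }

        // The guarded variant reports "no tag" instead of throwing.
        System.out.println(parseTagOrMinusOne("KEY._col0"));
    }
}
```

For KEY._col0 the second-to-last component is the literal "KEY", so splitting it on ':' gives a one-element array and reading index 1 throws, which is consistent with the ArrayIndexOutOfBoundsException in the stack trace.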