[ https://issues.apache.org/jira/browse/HIVE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V resolved HIVE-18174.
----------------------------
    Resolution: Duplicate

> Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-18174
>                 URL: https://issues.apache.org/jira/browse/HIVE-18174
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.execution.reduce.groupby.enabled=true;
> create temporary table foo (x int) stored as orc;
> insert into foo values(1),(2),(3);
> insert into foo values(1),(2),(3);
> set hive.cbo.enable=false;
> select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
> {code}
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
>     at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
>     at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
> {code}
> The key has duplicate references: {{keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)}}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: foo
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: x (type: int)
>                     outputColumnNames: x
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: string), 'Foo' (type: string)
>                         sort order: ++
>                         Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
>                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
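The issue title proposes de-duplicating identical Group-by key expressions, since a repeated key adds nothing to the grouping. Below is a minimal sketch of that idea in Java. It makes two assumptions. First, key expressions are compared by their display strings, which is only safe for deterministic expressions: two calls to a non-deterministic function such as rand() must not be merged. Second, the output still needs every original column. Under those assumptions, the sketch keeps the first occurrence of each key and records, for every original key position, which unique key supplies its value, so the full output row can be rebuilt after grouping. The class and method names (GroupByKeyDedup, dedup, expand) are hypothetical. They are not Hive's API, and this is not the change that resolved this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating key de-duplication; not part of Hive.
final class GroupByKeyDedup {

    // Unique keys in first-seen order, plus a projection mapping each
    // original key position to the index of the unique key that feeds it.
    static final class DedupResult {
        final List<String> uniqueKeys;
        final int[] projection;

        DedupResult(List<String> uniqueKeys, int[] projection) {
            this.uniqueKeys = uniqueKeys;
            this.projection = projection;
        }
    }

    // Assumption: expressions are deterministic and equal display strings
    // mean equal values, so only the first occurrence needs to be grouped on.
    static DedupResult dedup(List<String> keyExprs) {
        Map<String, Integer> firstIndex = new LinkedHashMap<>();
        int[] projection = new int[keyExprs.size()];
        for (int i = 0; i < keyExprs.size(); i++) {
            String expr = keyExprs.get(i);
            Integer idx = firstIndex.get(expr);
            if (idx == null) {
                // First time this expression is seen: assign the next unique slot.
                idx = firstIndex.size();
                firstIndex.put(expr, idx);
            }
            projection[i] = idx;
        }
        return new DedupResult(new ArrayList<>(firstIndex.keySet()), projection);
    }

    // Rebuilds a full-width output row from a row of unique key values.
    static List<Object> expand(List<?> uniqueRow, int[] projection) {
        List<Object> out = new ArrayList<>(projection.length);
        for (int p : projection) {
            out.add(uniqueRow.get(p));
        }
        return out;
    }

    public static void main(String[] args) {
        // The reduce-side key list from the plan in this issue.
        List<String> keys = List.of("KEY._col0", "KEY._col0", "'Foo'", "'Foo'");
        DedupResult r = dedup(keys);
        System.out.println(r.uniqueKeys);                  // [KEY._col0, 'Foo']
        System.out.println(Arrays.toString(r.projection)); // [0, 0, 1, 1]
        System.out.println(expand(List.of("x1", "Foo"), r.projection)); // [x1, x1, Foo, Foo]
    }
}
```

In this sketch, grouping runs over two keys instead of four, and the projection restores the four output columns the query asks for. A LinkedHashMap is used so that the unique keys keep their original left-to-right order.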