[jira] [Resolved] (HIVE-18174) Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)

Gopal V (JIRA) Wed, 03 Jan 2018 02:01:01 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gopal V resolved HIVE-18174.
----------------------------
    Resolution: Duplicate

> Vectorization: De-dup Group-by key expressions (identical keys are irrelevant)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-18174
>                 URL: https://issues.apache.org/jira/browse/HIVE-18174
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>
> {code}
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.execution.reduce.groupby.enabled=true;
> create temporary table foo (x int) stored as orc;
> insert into foo values(1),(2),(3);
> insert into foo values(1),(2),(3);
> set hive.cbo.enable=false;
> select distinct concat('x', x) x, concat('x', x), 'Foo', 'Foo' from foo;
> {code}
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:476)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:288)
> {code}
> The key has duplicate references - {{keys: KEY._col0 (type: string), 
> KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)}}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: gopal_20171128220857_9c9def2e-d0a4-461a-8fd6-f9fdaea2d5ce:26
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: foo
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: x (type: int)
>                     outputColumnNames: x
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: concat('x', x) (type: string), concat('x', x) (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                       mode: hash
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col1 (type: string), 'Foo' (type: string)
>                         sort order: ++
>                         Map-reduce partition columns: _col1 (type: string), 'Foo' (type: string)
>                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string), KEY._col0 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col1 (type: string), _col1 (type: string), 'Foo' (type: string), 'Foo' (type: string)
>                   outputColumnNames: _col0, _col1, _col2, _col3
>                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
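
The reduce-side Group By Operator above lists KEY._col0 twice and 'Foo' twice. The de-duplication the issue title asks for could be sketched roughly as below. This is only an illustration, not Hive's actual fix: the class GroupByKeyDedup, its Result type and the dedup method are hypothetical names, and comparing expressions by their string form is an assumption that would only be safe for deterministic expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive: collapses identical group-by key
// expressions and remembers how to rebuild the original output columns.
final class GroupByKeyDedup {

    /** Unique keys in first-seen order, plus one index per original key. */
    static final class Result {
        final List<String> uniqueKeys;
        final int[] projection;

        Result(List<String> uniqueKeys, int[] projection) {
            this.uniqueKeys = uniqueKeys;
            this.projection = projection;
        }
    }

    static Result dedup(List<String> keyExpressions) {
        // Maps each distinct expression to its position among the unique keys.
        // Assumption: equal strings mean equal (deterministic) expressions.
        Map<String, Integer> positions = new LinkedHashMap<>();
        int[] projection = new int[keyExpressions.size()];
        for (int i = 0; i < keyExpressions.size(); i++) {
            String expr = keyExpressions.get(i);
            Integer pos = positions.get(expr);
            if (pos == null) {
                pos = positions.size();
                positions.put(expr, pos);
            }
            // Original column i is served by unique key number pos.
            projection[i] = pos;
        }
        return new Result(new ArrayList<>(positions.keySet()), projection);
    }

    public static void main(String[] args) {
        // Key list taken from the reducer's Group By Operator in the plan.
        Result r = dedup(Arrays.asList(
                "KEY._col0", "KEY._col0", "'Foo'", "'Foo'"));
        System.out.println(r.uniqueKeys);                  // [KEY._col0, 'Foo']
        System.out.println(Arrays.toString(r.projection)); // [0, 0, 1, 1]
    }
}
```

With the keys reduced to two, a following projection (the Select Operator in the plan plays a similar role) would restore the four output columns from the projection array, so the group-by itself never has to carry the same key twice.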



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
