[ https://issues.apache.org/jira/browse/HIVE-20570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janaki Lahorani updated HIVE-20570:
-----------------------------------
    Attachment: HIVE-20570.2.patch

> Union ALL with hive.optimize.union.remove=true has incorrect plan
> -----------------------------------------------------------------
>
>                 Key: HIVE-20570
>                 URL: https://issues.apache.org/jira/browse/HIVE-20570
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Janaki Lahorani
>            Assignee: Janaki Lahorani
>            Priority: Major
>         Attachments: HIVE-20570.1.patch, HIVE-20570.2.patch
>
>
> When hive.optimize.union.remove=true and a UNION ALL query whose branches 
> use GROUP BY is run, the final fetch waits for only one of the branches 
> instead of both.
> Test Case:
> {code}
> create table if not exists test_table(column1 string, column2 int);
> insert into test_table values('a',1),('b',2);
> set hive.optimize.union.remove=true;
> set mapred.input.dir.recursive=true;
> explain
> select column1 from test_table group by column1
> union all
> select column1 from test_table group by column1;
> {code}
> In the plan below, Stage-1 and Stage-2 correspond to the two branches of 
> the UNION ALL. The final fetch operator (Stage-0) depends on only one of 
> them, but it should depend on both.
> Plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 is a root stage
>   *Stage-0 depends on stages: Stage-1*
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: test_table
>             Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string)
>               outputColumnNames: column1
>               Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 keys: column1 (type: string)
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           keys: KEY._col0 (type: string)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: test_table
>             Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string)
>               outputColumnNames: column1
>               Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 keys: column1 (type: string)
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           keys: KEY._col0 (type: string)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
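
The symptom in the plan above can be checked mechanically. The following is a minimal sketch, not part of the attached patches: it reads the STAGE DEPENDENCIES section of EXPLAIN output and lists root stages that the fetch stage does not depend on. The function name missing_fetch_dependencies and the assumption that every root stage should feed Stage-0 directly are illustrative only; the check fits a flat plan like the one reported here, not plans with intermediate stages.

```python
import re


def missing_fetch_dependencies(plan_text, fetch_stage="Stage-0"):
    """Return root stages that the fetch stage does not list as dependencies.

    Hypothetical helper: assumes a flat plan in which every root stage
    is expected to feed the fetch stage directly.
    """
    roots = set()
    fetch_deps = set()
    for raw in plan_text.splitlines():
        # Drop indentation and any JIRA bold markers (*...*) around the line.
        line = raw.strip().strip("*").strip()
        m = re.match(r"(Stage-\d+) is a root stage", line)
        if m:
            roots.add(m.group(1))
            continue
        m = re.match(r"(Stage-\d+) depends on stages: (.+)", line)
        if m and m.group(1) == fetch_stage:
            fetch_deps.update(re.findall(r"Stage-\d+", m.group(2)))
    return sorted(roots - fetch_deps)


# Dependency section as reported in this issue.
reported_plan = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-1
"""

print(missing_fetch_dependencies(reported_plan))  # ['Stage-2']
```

For the reported plan the sketch flags Stage-2, the branch the fetch operator does not wait for. A plan in which Stage-0 depends on both Stage-1 and Stage-2 would yield an empty list.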



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
