[ https://issues.apache.org/jira/browse/HIVE-20570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619920#comment-16619920 ]
Andrew Sherman commented on HIVE-20570:
---------------------------------------

+1 LGTM pending test results

> Union ALL with hive.optimize.union.remove=true has incorrect plan
> -----------------------------------------------------------------
>
>                 Key: HIVE-20570
>                 URL: https://issues.apache.org/jira/browse/HIVE-20570
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Janaki Lahorani
>            Assignee: Janaki Lahorani
>            Priority: Major
>         Attachments: HIVE-20570.1.patch, HIVE-20570.2.patch, HIVE-20570.3.patch
>
>
> When hive.optimize.union.remove=true and a select query is run with group by,
> the final fetch waits for only one of the branches, not both.
> Test Case:
> {code}
> create table if not exists test_table(column1 string, column2 int);
> insert into test_table values('a',1),('b',2);
> set hive.optimize.union.remove=true;
> set mapred.input.dir.recursive=true;
> explain
> select column1 from test_table group by column1
> union all
> select column1 from test_table group by column1;
> {code}
> In the plan below, the two stages correspond to the two branches of the
> union all. However, the final fetch operator (Stage-0) depends on only one
> of the stages; it should depend on both.
> Plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 is a root stage
>   *Stage-0 depends on stages: Stage-1*
>
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: test_table
>             Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string)
>               outputColumnNames: column1
>               Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 keys: column1 (type: string)
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           keys: KEY._col0 (type: string)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: test_table
>             Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: column1 (type: string)
>               outputColumnNames: column1
>               Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 keys: column1 (type: string)
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2 Data size: 6 Basic stats: COMPLETE Column stats: NONE
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           keys: KEY._col0 (type: string)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
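
The dependency problem in the quoted plan can also be checked mechanically. The following is a minimal Python sketch, not part of any HIVE-20570 patch: the helper names `parse_stage_dependencies` and `missing_fetch_parents` are hypothetical, and the parser assumes only the two simple line forms that appear in the reported `STAGE DEPENDENCIES` section. It also assumes, as the issue description states for this query, that the fetch stage should depend on every root stage.

```python
import re

def parse_stage_dependencies(plan_text):
    """Map each stage to the set of stages it depends on.

    Assumption: only lines of the form "Stage-N is a root stage" and
    "Stage-N depends on stages: ..." are handled; conditional clauses
    that other Hive plans may contain are not interpreted separately.
    """
    deps = {}
    in_section = False
    for raw in plan_text.splitlines():
        # Drop indentation and the JIRA bold markers (*...*) used in the report.
        line = raw.strip().strip("*").strip()
        if line.startswith("STAGE DEPENDENCIES"):
            in_section = True
            continue
        if line.startswith("STAGE PLANS"):
            break
        if not in_section or not line:
            continue
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            deps[dep.group(1)] = set(re.findall(r"Stage-\d+", dep.group(2)))
    return deps

def missing_fetch_parents(deps, fetch_stage="Stage-0"):
    """Return root stages that the fetch stage does not depend on."""
    roots = {s for s, parents in deps.items() if not parents and s != fetch_stage}
    return sorted(roots - deps.get(fetch_stage, set()))

# Dependency section copied from the plan in the issue description.
REPORTED_PLAN = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 is a root stage
  *Stage-0 depends on stages: Stage-1*
STAGE PLANS:
"""

deps = parse_stage_dependencies(REPORTED_PLAN)
print(missing_fetch_parents(deps))  # ['Stage-2']
```

For the reported plan the check flags Stage-2, the union branch that the fetch operator does not wait for. The check is deliberately narrow: it compares direct dependencies only and does not follow transitive ones.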