[ https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-2382:
-----------------------------
Resolution: Fixed
Fix Version/s: 0.9.0 (was: 0.8.0)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed to trunk. Thanks Charles!
> Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-2382
> URL: https://issues.apache.org/jira/browse/HIVE-2382
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: Charles Chen
> Assignee: Charles Chen
> Priority: Critical
> Fix For: 0.9.0
>
> Attachments: HIVE-2382v1.patch, HIVE-2382v2.patch
>
>
> When a GROUP BY is specified, a select operator is added before the GROUP BY
> in SemanticAnalyzer.insertSelectAllPlanForGroupBy. Currently, the column
> expression map of this new select operator is set to the column expression
> map of its parent operator. This is incorrect: the parent operator may, for
> example, reorder its columns (_col0 => _col0, _col1 => _col2,
> _col2 => _col1), whereas the new select operator only passes its input
> columns through unchanged and should not repeat that reordering.
> The predicate pushdown optimization uses the column expression map to track
> which column a filter expression refers to at each operator. With the copied
> map, the parent's column mapping is applied a second time during pushdown,
> so the filter ends up on the wrong column.
> Here is a simple case of this going wrong: Using
> {noformat}
> create table invites (id int, foo int, bar int);
> {noformat}
> executing the query
> {noformat}
> explain select * from (select foo, bar from (select bar, foo from invites c
> union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
> {noformat}
> results in
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         a-subquery1:b-subquery1:c
>           TableScan
>             alias: c
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>         a-subquery2:b-subquery2:d
>           TableScan
>             alias: d
>             Filter Operator
>               predicate:
>                   expr: (foo = 1)
>                   type: boolean
>               Select Operator
>                 expressions:
>                       expr: bar
>                       type: int
>                       expr: foo
>                       type: int
>                 outputColumnNames: _col0, _col1
>                 Union
>                   Select Operator
>                     expressions:
>                           expr: _col1
>                           type: int
>                           expr: _col0
>                           type: int
>                     outputColumnNames: _col0, _col1
>                     Select Operator
>                       expressions:
>                             expr: _col0
>                             type: int
>                             expr: _col1
>                             type: int
>                       outputColumnNames: _col0, _col1
>                       Group By Operator
>                         bucketGroup: false
>                         keys:
>                               expr: _col1
>                               type: int
>                               expr: _col0
>                               type: int
>                         mode: hash
>                         outputColumnNames: _col0, _col1
>                         Reduce Output Operator
>                           key expressions:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           sort order: ++
>                           Map-reduce partition columns:
>                                 expr: _col0
>                                 type: int
>                                 expr: _col1
>                                 type: int
>                           tag: -1
>       Reduce Operator Tree:
>         Group By Operator
>           bucketGroup: false
>           keys:
>                 expr: KEY._col0
>                 type: int
>                 expr: KEY._col1
>                 type: int
>           mode: mergepartial
>           outputColumnNames: _col0, _col1
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: int
>                   expr: _col1
>                   type: int
>             outputColumnNames: _col0, _col1
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {noformat}
> Note that the pushed-down filter is "foo = 1", whereas the correct predicate
> is "bar = 1". Without the GROUP BY, the filter is pushed down correctly.
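The double translation described in the issue can be traced with a minimal Java sketch. It assumes a simplified model in which each column expression map is a Map<String, String> from an operator's output column to the parent column it is read from (Hive's real maps hold ExprNodeDesc objects). The class ColExprMapSketch and its methods are hypothetical and are not the code in the attached patches. The chain follows the plan for alias c: the inserted select, the reordering select, and the select over the table scan. The Union passes columns through unchanged, so it is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not Hive's API): a column expression map
// maps each output column of an operator to the parent column it reads.
class ColExprMapSketch {

    // Map for a pass-through "select all" operator: each output column
    // forwards the parent's column of the same name (identity mapping).
    static Map<String, String> selectAllColExprMap(List<String> parentOutputCols) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String col : parentOutputCols) {
            map.put(col, col);
        }
        return map;
    }

    // Translate a column reference from the topmost operator down toward the
    // table scan, one column expression map at a time (maps.get(0) is topmost).
    static String resolve(String column, List<Map<String, String>> maps) {
        String current = column;
        for (Map<String, String> map : maps) {
            current = map.get(current);
            if (current == null) {
                throw new IllegalArgumentException("unmapped column");
            }
        }
        return current;
    }

    // Reordering select from the plan: _col0 <- _col1, _col1 <- _col0.
    static Map<String, String> reorderingSelect() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("_col0", "_col1");
        m.put("_col1", "_col0");
        return m;
    }

    // Select over the table scan: _col0 <- bar, _col1 <- foo.
    static Map<String, String> scanSelect() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("_col0", "bar");
        m.put("_col1", "foo");
        return m;
    }

    // Resolve the GROUP BY key "bar" (_col1 at the inserted select's output)
    // to a base-table column, using either the buggy or a pass-through map.
    static String pushedColumn(boolean buggy) {
        Map<String, String> inserted = buggy
                ? reorderingSelect() // bug: reuses the parent's map
                : selectAllColExprMap(List.of("_col0", "_col1"));
        List<Map<String, String>> chain = new ArrayList<>();
        chain.add(inserted);
        chain.add(reorderingSelect());
        chain.add(scanSelect());
        return resolve("_col1", chain);
    }
}
```

Starting from _col1, pushedColumn(true), which copies the parent's map, resolves to foo, matching the bad "foo = 1" filter; pushedColumn(false), which uses a pass-through map, resolves to bar.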
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira