[
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751502#comment-13751502
]
Yin Huai commented on HIVE-5149:
--------------------------------
Another example: reduce_deduplicate_extended.q contains the following query:
{code:sql}
explain from (select key, value from src group by key, value) s select s.key
group by s.key;
{code}
The plan is
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
s:src
TableScan
alias: src
Select Operator
expressions:
expr: key
type: string
expr: value
type: string
outputColumnNames: key, value
Group By Operator
bucketGroup: false
keys:
expr: key
type: string
expr: value
type: string
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
key expressions:
expr: _col0
type: string
expr: _col1
type: string
sort order: ++
Map-reduce partition columns:
expr: _col0
type: string
expr: _col1
type: string
tag: -1
Reduce Operator Tree:
Group By Operator
bucketGroup: false
keys:
expr: KEY._col0
type: string
expr: KEY._col1
type: string
mode: mergepartial
outputColumnNames: _col0, _col1
Select Operator
expressions:
expr: _col0
type: string
outputColumnNames: _col0
Group By Operator
bucketGroup: false
keys:
expr: _col0
type: string
mode: complete
outputColumnNames: _col0
Select Operator
expressions:
expr: _col0
type: string
outputColumnNames: _col0
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
{code}
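I think the plan is wrong. The Reduce Output Operator partitions on both _col0
and _col1 (key and value), but the second Group By Operator in the reducer
groups by key alone in mode complete. We should use key as the only partitioning
column, to make sure all rows associated with the same key are assigned to the
same reducer.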
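
To illustrate the concern, here is a minimal Python sketch of the shuffle, not Hive code. The names `partition` and `run_plan` are hypothetical, and the character-sum hash is only a deterministic stand-in for whatever hash Hive actually uses. It models a single job that shuffles once and then runs both group-bys inside each reducer, as in the plan above.

```python
from collections import defaultdict

def partition(columns, num_reducers):
    # Toy deterministic hash (an assumption, NOT Hive's hash function):
    # sum of character codes over all partitioning columns.
    return sum(ord(ch) for col in columns for ch in col) % num_reducers

def run_plan(rows, partition_cols, num_reducers):
    # Shuffle: assign each row to a reducer based on the partitioning columns.
    reducers = defaultdict(list)
    for row in rows:
        cols = [row[c] for c in partition_cols]
        reducers[partition(cols, num_reducers)].append(row)

    output = []
    for r in sorted(reducers):
        # Inner query: group by key, value.
        inner = sorted({(row["key"], row["value"]) for row in reducers[r]})
        # Outer query: group by key, evaluated locally in the same reducer.
        outer = sorted({k for k, _ in inner})
        output.extend(outer)
    return output

rows = [
    {"key": "a", "value": "1"},
    {"key": "a", "value": "2"},
    {"key": "b", "value": "1"},
]

# Partitioning on (key, value) splits key "a" across two reducers,
# so the outer group by emits "a" twice.
print(sorted(run_plan(rows, ["key", "value"], 2)))  # ['a', 'a', 'b']

# Partitioning on key alone keeps each key on one reducer.
print(sorted(run_plan(rows, ["key"], 2)))  # ['a', 'b']
```

In this toy model the rows ("a", "1") and ("a", "2") hash to different reducers when both columns are used, which is the situation the plan above allows.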
> ReduceSinkDeDuplication can pick the wrong partitioning columns
> ---------------------------------------------------------------
>
> Key: HIVE-5149
> URL: https://issues.apache.org/jira/browse/HIVE-5149
> Project: Hive
> Issue Type: Bug
> Reporter: Yin Huai
> Assignee: Yin Huai
>
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E