[ https://issues.apache.org/jira/browse/HIVE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242006#comment-14242006 ]
Vikram Dixit K commented on HIVE-9025:
--------------------------------------
+1 for 0.14
> join38.q (without map join) produces incorrect result when testing with
> multiple reducers
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-9025
> URL: https://issues.apache.org/jira/browse/HIVE-9025
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 0.14.0
> Reporter: Chao
> Assignee: Ted Xu
> Priority: Blocker
> Attachments: HIVE-9025.1.patch, HIVE-9025.patch
>
>
> I have this query from a modified version of {{join38.q}}, which does NOT use
> map join:
> {code}
> FROM src a JOIN tmp b ON (a.key = b.col11)
> SELECT a.value, b.col5, count(1) as count
> where b.col11 = 111
> group by a.value, b.col5;
> {code}
> If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set
> it to a larger number (3, for instance), the result is
> {noformat}
> val_111 105 1
> {noformat}
> which is wrong.
> I think the issue is that, in this case, ConstantPropagationProcFactory
> overwrites the partition columns of the reduce sink desc with an empty list.
> Later, in ReduceSinkOperator#computeHashCode, because partitionEval has
> length 0, a random number is used as the hash code for each row. As a
> result, rows with the same key are distributed to different reducers,
> which leads to an incorrect result.
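The failure mode described in the issue can be illustrated with a small Java simulation. This is a hypothetical sketch, not Hive code: the class PartitionSketch and its methods runGroupBy, outputRowsFor, and totalCount are made up for illustration, the hashing only mimics the behavior described for computeHashCode (hash the partition values, or use a random number when there are none), and reducer selection assumes the formula of Hadoop's default HashPartitioner. With partition columns kept, every row of a group key reaches one reducer and yields one complete count; with an empty partition list, the same rows can be scattered, and each reducer emits its own partial count for the same key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical simulation, not Hive source code: models how a reduce sink
// routes rows to reducers and how GROUP BY ... count(1) then aggregates them.
public class PartitionSketch {

    // Mimics the behavior described for ReduceSinkOperator#computeHashCode:
    // hash the partition-column values, or fall back to a random number when
    // the partition column list is empty.
    static int computeHashCode(List<Object> partitionValues, Random random) {
        if (partitionValues.isEmpty()) {
            return random.nextInt(); // arbitrary hash per row, ignoring its key
        }
        int hash = 0;
        for (Object v : partitionValues) {
            hash = 31 * hash + (v == null ? 0 : v.hashCode());
        }
        return hash;
    }

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit, then modulo.
    static int reducerFor(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Routes each row's group key to a reducer, then computes count(1) per key on
    // each reducer independently. Returns one map (group key -> count) per reducer.
    static List<Map<String, Integer>> runGroupBy(
            List<String> groupKeys, boolean keepPartitionCols, int numReducers, long seed) {
        Random random = new Random(seed);
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new HashMap<>());
        }
        for (String key : groupKeys) {
            List<Object> partitionValues =
                    keepPartitionCols ? List.<Object>of(key) : List.<Object>of();
            int r = reducerFor(computeHashCode(partitionValues, random), numReducers);
            reducers.get(r).merge(key, 1, Integer::sum);
        }
        return reducers;
    }

    // Output rows emitted for one group key: one per reducer that saw the key.
    static int outputRowsFor(List<Map<String, Integer>> reducers, String key) {
        int rows = 0;
        for (Map<String, Integer> counts : reducers) {
            if (counts.containsKey(key)) {
                rows++;
            }
        }
        return rows;
    }

    // Sum of the partial counts for one key across all reducers.
    static int totalCount(List<Map<String, Integer>> reducers, String key) {
        int total = 0;
        for (Map<String, Integer> counts : reducers) {
            total += counts.getOrDefault(key, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input: 9 joined rows that all share one group key.
        List<String> rows = Collections.nCopies(9, "val_111|105");
        System.out.println("with partition cols:    " + runGroupBy(rows, true, 3, 42L));
        System.out.println("without partition cols: " + runGroupBy(rows, false, 3, 42L));
    }
}
```

With the random fallback, the key will usually appear on more than one reducer with smaller counts, which has the same shape as the single partial row reported above; the total across reducers is still correct, but no single output row carries it.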
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)