Chao created HIVE-9025:
--------------------------

             Summary: join38.q (without map join) produces incorrect result 
when testing with multiple reducers
                 Key: HIVE-9025
                 URL: https://issues.apache.org/jira/browse/HIVE-9025
             Project: Hive
          Issue Type: Bug
            Reporter: Chao


I have this query from a modified version of {{join38.q}}, which does NOT use 
map join:

{code}
FROM src a JOIN tmp b ON (a.key = b.col11)
SELECT a.value, b.col5, count(1) as count
where b.col11 = 111
group by a.value, b.col5;
{code}

If I set {{mapred.reduce.tasks}} to 1, the result is correct. But if I set it 
to a larger number (3, for instance), the result is 

{noformat}
val_111 105     1
{noformat}

which is wrong.

I think the issue is that, in this case, ConstantPropagationProcFactory 
overwrites the partition cols of the reduce sink desc with an empty list. 
Later, in ReduceSinkOperator#computeHashCode, since partitionEval has 
length 0, a random number is used as the hash code for each row. As a 
result, rows with the same key are distributed to different reducers, which 
leads to an incorrect result.
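
To illustrate the suspected mechanism, here is a minimal standalone sketch. It is not Hive's actual code: the class {{ReducerRoutingSketch}} and its methods are hypothetical stand-ins, and the only behavior taken from the description above is "hash the partition cols when there are any, otherwise use a random number per row". The row values are borrowed from the output shown earlier purely as sample data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ReducerRoutingSketch {

    // Hypothetical stand-in for the hash computation described above:
    // hash the partition columns if any exist, otherwise fall back to a
    // random value for each row.
    static int computeHashCode(List<?> partitionCols, Random random) {
        if (partitionCols.isEmpty()) {
            return random.nextInt();
        }
        return partitionCols.hashCode();
    }

    // Map a hash code to a reducer index in [0, numReducers).
    static int reducerFor(int hashCode, int numReducers) {
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }

    // Route `rows` identical rows and collect the distinct reducers they reach.
    static Set<Integer> reducersUsed(List<?> partitionCols, int rows,
                                     int numReducers, long seed) {
        Random random = new Random(seed);
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < rows; i++) {
            int hash = computeHashCode(partitionCols, random);
            used.add(reducerFor(hash, numReducers));
        }
        return used;
    }

    public static void main(String[] args) {
        // Same key on every row: all rows should reach a single reducer.
        List<String> key = Arrays.asList("val_111", "105");
        System.out.println("partition cols kept:    "
                + reducersUsed(key, 100, 3, 42L));

        // Partition cols overwritten with an empty list: rows scatter.
        System.out.println("partition cols emptied: "
                + reducersUsed(Collections.emptyList(), 100, 3, 42L));
    }
}
```

With the partition cols intact, every row with the same key reaches one reducer, so the group-by count is computed in one place. With an empty list, the same rows spread over several reducers, and each reducer only sees part of the group. With a single reducer the scatter is harmless, which would explain why {{mapred.reduce.tasks=1}} gives the correct result.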



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
