[ https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15493:
-------------------------------------------
    Attachment: HIVE-15493.patch

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -------------------------------------------------------------
>
>                 Key: HIVE-15493
>                 URL: https://issues.apache.org/jira/browse/HIVE-15493
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-15493.patch
>
>
> To reproduce, run the following on Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1;
> CREATE TABLE test_1 (
>   member BIGINT,
>   age VARCHAR(100)
> )
> STORED AS TEXTFILE;
> DROP TABLE IF EXISTS test_2;
> CREATE TABLE test_2 (
>   member BIGINT
> )
> STORED AS TEXTFILE;
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40');
> INSERT INTO test_2 VALUES (1), (2), (3);
> SELECT
>   t2.member,
>   t1.age_1,
>   t1.age_2
> FROM test_2 t2
> LEFT JOIN (
>   SELECT
>     member,
>     age AS age_1,
>     age AS age_2
>   FROM test_1
> ) t1
> ON t2.member = t1.member;
> {code}
> Actual result:
> {noformat}
> 1     20      NULL
> 3     40      NULL
> 2     30      NULL
> {noformat}
> Expected (correct) result:
> {noformat}
> 1     20      20
> 3     40      40
> 2     30      30
> {noformat}
> The bug was introduced by HIVE-10582. Although the change in HIVE-10582
> does not include tests, it looks correct; the problem seems to lie in
> MapJoinOperator itself. It only happens for LEFT OUTER JOIN (not for RIGHT
> OUTER or FULL OUTER JOIN), and only when the right side of the join
> contributes duplicate value columns to the output (in the example above,
> age_1 and age_2 are both derived from test_1.age). Although I am still
> trying to understand part of the MapJoinOperator code path, the bug may be
> in the operator's initialization.
> Until we have time to study the problem in detail and fix MapJoinOperator,
> I will submit a fix that removes the code in SemanticAnalyzer that reuses
> duplicated value expressions from the ReduceSink (RS) to create multiple
> columns in the join output (this is equivalent to reverting HIVE-10582).
> Once this is pushed, I will create a follow-up issue to restore that code
> and tackle the problem in MapJoinOperator.
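
A possible interim workaround for affected queries is sketched below. It is not part of the report above and has not been verified here. The report places the wrong result in MapJoinOperator, and the reproduction explicitly enables hive.auto.convert.join. Under that assumption, turning off automatic map-join conversion for the session should make Hive plan a non-map join instead. A second, equally untested option is to carry age through the join only once and duplicate it in the outer projection. That way the join output has no duplicated value columns on the right side. Either change on its own is intended to avoid the bug.

{code:sql}
-- Option 1 (assumption, untested): disable automatic conversion to
-- MapJoinOperator for the rest of this session.
set hive.auto.convert.join=false;

-- Option 2 (assumption, untested): project age once on the right side of
-- the LEFT JOIN and duplicate it only after the join.
SELECT
  t2.member,
  t1.age AS age_1,
  t1.age AS age_2
FROM test_2 t2
LEFT JOIN (
  SELECT member, age
  FROM test_1
) t1
ON t2.member = t1.member;
{code}

Option 1 gives up map joins, and their performance benefit, for every join in the session. Option 2 changes only the shape of the affected query.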



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to