[ https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-15493:
-------------------------------------------
    Attachment: HIVE-15493.01.patch

Uploading the same patch to trigger ptests again, as they last ran almost a week ago.

[~pxiong], could you review it? Thanks

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -------------------------------------------------------------
>
>                 Key: HIVE-15493
>                 URL: https://issues.apache.org/jira/browse/HIVE-15493
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-15493.01.patch, HIVE-15493.patch
>
>
> To reproduce, we can run in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
> DROP TABLE IF EXISTS test_1;
> CREATE TABLE test_1
> (
>   member BIGINT
>   , age VARCHAR (100)
> )
> STORED AS TEXTFILE
> ;
> DROP TABLE IF EXISTS test_2;
> CREATE TABLE test_2
> (
>   member BIGINT
> )
> STORED AS TEXTFILE
> ;
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40');
> INSERT INTO test_2 VALUES (1), (2), (3);
> SELECT
>   t2.member
>   , t1.age_1
>   , t1.age_2
> FROM
>   test_2 t2
>   LEFT JOIN (
>     SELECT
>       member
>       , age as age_1
>       , age as age_2
>     FROM
>       test_1
>   ) t1
>   ON t2.member = t1.member
> ;
> {code}
> Result is:
> {noformat}
> 1 20 NULL
> 3 40 NULL
> 2 30 NULL
> {noformat}
> Correct result is:
> {noformat}
> 1 20 20
> 3 40 40
> 2 30 30
> {noformat}
> Bug was introduced by HIVE-10582. Though the fix in HIVE-10582 does not
> contain tests, it does look legit. In fact, the problem seems to be in the
> MapJoinOperator itself. It only happens for LEFT outer join (not with RIGHT
> outer or FULL outer). Although I am still trying to understand part of the
> MapJoinOperator code path, the bug could be in the initialization of the
> operator. It only happens when we have duplicate values in the right part of
> the output.
> Till we have more time to study the problem in detail and fix the
> MapJoinOperator, I will submit a fix that removes the code in
> SemanticAnalyzer that reuses duplicated value expressions from RS to create
> multiple columns in the join output (this is equivalent to reverting
> HIVE-10582).
> Once this is pushed, I will create a follow-up issue to bring this code back
> and tackle the problem in the MapJoinOperator.
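
As a cross-check of the expected output quoted in the description, the same statements can be run on an independent SQL engine. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the reference engine. It is not part of Hive or of the attached patch, the function name run_reference_query is hypothetical, and the Hive-specific STORED AS TEXTFILE clause and the hive.auto.convert.join setting are left out because SQLite does not know them. Rows are sorted because join output order is not guaranteed.

```python
import sqlite3


def run_reference_query():
    """Run the repro query from the description on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Same tables and data as in the description, without Hive storage clauses.
    cur.executescript(
        """
        CREATE TABLE test_1 (member BIGINT, age VARCHAR(100));
        CREATE TABLE test_2 (member BIGINT);
        INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40');
        INSERT INTO test_2 VALUES (1), (2), (3);
        """
    )

    # LEFT JOIN against a subquery that exposes the same column twice,
    # which is the shape that triggers the wrong result in the report.
    rows = cur.execute(
        """
        SELECT t2.member, t1.age_1, t1.age_2
        FROM test_2 t2
        LEFT JOIN (
            SELECT member, age AS age_1, age AS age_2
            FROM test_1
        ) t1
        ON t2.member = t1.member
        """
    ).fetchall()

    conn.close()
    return sorted(rows)


if __name__ == "__main__":
    for row in run_reference_query():
        print(*row)
```

Both age columns carry the same non-NULL value for every member here, which matches the "Correct result" block in the description rather than the NULLs seen with the MapJoinOperator.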