[ https://issues.apache.org/jira/browse/HIVE-21322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777632#comment-16777632 ]
Vineet Garg commented on HIVE-21322: ------------------------------------ This looks like an issue with nway merge join. [~norvellj] Can you check if disabling nway merge {{set hive.merge.nway.joins=false}} fixes this? > Multiple table LEFT OUTER JOIN results are incorrect when 'is not null' used > in WHERE clause. > --------------------------------------------------------------------------------------------- > > Key: HIVE-21322 > URL: https://issues.apache.org/jira/browse/HIVE-21322 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 2.3.4 > Environment: Hive 2.3.4 (emr-5.21.0) > Reporter: James Norvell > Priority: Minor > Attachments: explain-plans.txt > > > Reproduction: > Create tables: > {code:java} > create table a (id string); insert into a values (1),(2),(3); > create table b (id string, name string); insert into b values > (1,'a'),(2,'b'),(3,null); > create table c (id string); insert into c values (11),(22),(33); > {code} > When joining a -> b -> c on id, the following query is correct: > {code:java} > select a.id, b.name from a > left outer join b on a.id = b.id > left outer join c on a.id = c.id > where b.name is not null; > OK > 1 a > 2 b > Time taken: 10.231 seconds, Fetched: 2 row(s) > {code} > Switching the join order from a -> c -> b results in incorrect results: > {code:java} > select a.id, b.name from a > left outer join c on a.id = c.id > left outer join b on a.id = b.id > where b.name is not null; > OK > 2 b > Time taken: 10.321 seconds, Fetched: 1 row(s) > {code} > Disabling hive.cbo.enable or changing execution engine to mr avoids the issue: > {code:java} > set hive.cbo.enable=false; > select a.id, b.name from a left outer join c on a.id = c.id left outer join b > on a.id = b.id where b.name is not null; > OK > 1 a > 2 b > Time taken: 9.614 seconds, Fetched: 2 row(s) > set hive.cbo.enable=true; > set hive.execution.engine=mr; > select a.id, b.name from a left outer join c on a.id = c.id left outer join b > on a.id = b.id where b.name is not null; > OK > 1 a > 2 b > Time taken: 29.377 seconds, Fetched: 2 row(s) > {code} > Issue doesn't occur when using 'is null': > {code:java} > select a.id, b.name from a left outer join c on a.id = c.id left outer join b > on a.id = b.id where b.name is null; > OK > 3 NULL > Time taken: 9.673 seconds, Fetched: 1 row(s) > {code} > Explain plans for queries attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)