[ https://issues.apache.org/jira/browse/IMPALA-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Behm resolved IMPALA-5856. ------------------------------------ Resolution: Fixed Fix Version/s: Impala 2.11.0 commit 032dee28de2fe95f80d9284eb41b64fd12a56c86 Author: Alex Behm <alex.b...@cloudera.com> Date: Mon Sep 11 10:33:39 2017 -0700 IMPALA-5856: Fix outer join predicate assignment. Fixes incorrect assignment of join predicates with the following properties: - from the On-clause of a left outer join - only references the left-hand side tuples (not the right hand side tuple) - references full-outer joined tuples; the full outer join appears on the left Testing: - a core/hdfs run passed - added new regression test Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb Reviewed-on: http://gerrit.cloudera.org:8080/8039 Reviewed-by: Alex Behm <alex.b...@cloudera.com> Tested-by: Impala Public Jenkins > Queries with full outer and left join miss result rows > ------------------------------------------------------ > > Key: IMPALA-5856 > URL: https://issues.apache.org/jira/browse/IMPALA-5856 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0 > Environment: CDH 5.10.0 > Reporter: Julian Eberius > Assignee: Alexander Behm > Priority: Blocker > Labels: correctness > Fix For: Impala 2.11.0 > > > When combining a full outer join with a left join, some of the left join > predicates seem to be treated as general WHERE-clauses, which leads to > missing rows. Minimal working example: > {code:sql} > create table A (a int, av int); > create table B (a int, bv int); > create table C (a int, cv int); > insert into A values (1,1), (2,2), (3,3); > insert into B values (2,22),(4,44); > insert into C values (2,222); > -- all results are returned as expected > select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, > b.a)=c.a); > -- only one row is returned, as if the last clause was a WHERE clause > select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, > b.a)=c.a and coalesce(a.av,b.bv)=2); > -- no rows are returned at all, even though only the columns of C should be > affected > select * from A full outer join B on a.a=b.a left join C on (coalesce(a.a, > b.a)=c.a and coalesce(a.av,b.bv)=100); > -- removing the full outer join leads to the expected result > select * from A left join C on (coalesce(a.a)=c.a and coalesce(a.av)=100); > {code} > Running the exact same SQL in PostgreSQL, only the columns of C are ever > affected by the left join ON condition, the number of rows never changes. As > far as we understand, this should be the expected behaviour. -- This message was sent by Atlassian JIRA (v6.4.14#64029)