[ https://issues.apache.org/jira/browse/IMPALA-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992862#comment-16992862 ]
ASF subversion and git services commented on IMPALA-9162: --------------------------------------------------------- Commit df5c4061456abb947cec8add81b361b60c5d3ad8 in impala's branch refs/heads/master from Aman Sinha [ https://gitbox.apache.org/repos/asf?p=impala.git;h=df5c406 ] IMPALA-9162: Do not apply inferred predicate to outer joins When the planner migrates predicates to inline views, it also creates equivalent predicates based on the value transfer graph which is built by transitive relationships among join conditions. These newly inferred predicates are placed typically as 'other predicates' of an inner or outer join. However, for outer joins, this has the effect of adding extra predicates in the WHERE clause which is incorrect since it may filter NULL values. Since the original query did not have null filtering conditions in the WHERE clause, we should not add new ones. In this fix we do the following: during the migration of conjuncts to inline views, analyze the predicate of type A <op> B and if it is an inferred predicate AND either the left or right slots reference the output tuple of an outer join, the inferred predicate is ignored. Note that simple queries with combination of inner and outer joins may not reproduce the problem. Due to the nature of predicate inferencing, some combination of subqueries, inner joins, outer joins is needed. For the query pattern, please see the example in the JIRA. Tests: - Added plan tests with left and right outer joins to inline-view.test - One baseline plan in inline-view.test had to be updated - Manually ran few queries on impala shell to verify result correctness: by checking that NULL values are being produced for outer joins. - Ran regression tests on jenkins Change-Id: Ie9521bd768c4b333069c34d5c1e11b10ea535827 Reviewed-on: http://gerrit.cloudera.org:8080/14813 Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com> > Incorrect redundant predicate applied to outer join > --------------------------------------------------- > > Key: IMPALA-9162 > URL: https://issues.apache.org/jira/browse/IMPALA-9162 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Reporter: Aman Sinha > Assignee: Aman Sinha > Priority: Major > Attachments: create.sql.txt > > > Run the attached create.sql script to create the tables and view. The > following query shows an incorrect redundant predicate applied to the outer > join. This seems another variant of past issues such as IMPALA-7957 and > IMPALA-8386. > {noformat} > // Has a redundant predicate as 'Other predicates' on Outer Join > Query: explain select x.* from (select v1.c3, v1.max_c3 from v.t2 left join > v.v1 on t2.c2=v1.c3) as x > > > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] > hash predicates: c3 = t2.c2 > other predicates: c3 = max(c3) . <-- WRONG > runtime filters: RF000 <- t2.c2 > row-size=20B cardinality=397 > > --13:EXCHANGE [HASH(t2.c2)] > > 00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > > > 12:EXCHANGE [HASH(c3)] > > > > 05:HASH JOIN [INNER JOIN, BROADCAST] > > hash predicates: c3 = max(c3) > > runtime filters: RF002 <- max(c3) > > row-size=16B cardinality=207 > {noformat} > > By comparison, the following query which does not have the v1.max_c3 column > in the SELECT list produces the correct plan: > {noformat} > // Does not have the redundant predicate > Query: explain select x.* from (select v1.c3 from v.t2 left join v.v1 on > t2.c2=v1.c3) as x > 06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] > hash predicates: c3 = t2.c2 > runtime filters: RF000 <- t2.c2 > row-size=20B cardinality=397 > --13:EXCHANGE [HASH(t2.c2)] > 00:SCAN HDFS [v.t2] > HDFS partitions=1/1 files=1 size=639B > row-size=4B cardinality=397 > 12:EXCHANGE [HASH(c3)] > 05:HASH JOIN [INNER JOIN, BROADCAST] > hash predicates: c3 = max(c3) > runtime filters: RF002 <- max(c3) > row-size=16B cardinality=207 > {noformat} > Due the redundant predicate, the first query produces wrong results. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org