[ 
https://issues.apache.org/jira/browse/IMPALA-9162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992862#comment-16992862
 ] 

ASF subversion and git services commented on IMPALA-9162:
---------------------------------------------------------

Commit df5c4061456abb947cec8add81b361b60c5d3ad8 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=df5c406 ]

IMPALA-9162: Do not apply inferred predicate to outer joins

When the planner migrates predicates to inline views, it also creates
equivalent predicates based on the value transfer graph which is built
by transitive relationships among join conditions. These newly inferred
predicates are placed typically as 'other predicates' of an inner or
outer join.

However, for outer joins, this has the effect of adding extra predicates
in the WHERE clause which is incorrect since it may filter NULL values.
Since the original query did not have null filtering conditions in
the WHERE clause, we should not add new ones. In this fix we do the
following: during the migration of conjuncts to inline views, analyze
the predicate of type A <op> B and if it is an inferred predicate AND
either the left or right slots reference the output tuple of an outer
join, the inferred predicate is ignored.

Note that simple queries with combination of inner and outer joins may
not reproduce the problem.  Due to the nature of predicate inferencing,
some combination of subqueries, inner joins, outer joins is needed.  For
the query pattern, please see the example in the JIRA.

Tests:
  - Added plan tests with left and right outer joins to inline-view.test
  - One baseline plan in inline-view.test had to be updated
  - Manually ran few queries on impala shell to verify result
correctness: by checking that NULL values are being produced for outer
joins.
  - Ran regression tests on jenkins

Change-Id: Ie9521bd768c4b333069c34d5c1e11b10ea535827
Reviewed-on: http://gerrit.cloudera.org:8080/14813
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Incorrect redundant predicate applied to outer join
> ---------------------------------------------------
>
>                 Key: IMPALA-9162
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9162
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>         Attachments: create.sql.txt
>
>
> Run the attached create.sql script to create the tables and view.  The 
> following query shows an incorrect redundant predicate applied to the outer 
> join.  This seems another variant of past issues such as IMPALA-7957 and 
> IMPALA-8386.  
> {noformat}
> // Has a redundant predicate as 'Other predicates' on Outer Join
> Query: explain select x.* from (select v1.c3, v1.max_c3 from v.t2 left join 
> v.v1 on  t2.c2=v1.c3) as x
>                                                                               
>     
>  06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
>    hash predicates: c3 = t2.c2
>    other predicates: c3 = max(c3) .    <-- WRONG
>    runtime filters: RF000 <- t2.c2
>    row-size=20B cardinality=397
>                    
>  --13:EXCHANGE [HASH(t2.c2)]                          
>                                                                     
>    00:SCAN HDFS [v.t2]                                            
>       HDFS partitions=1/1 files=1 size=639B
>       row-size=4B cardinality=397                               
>                                                                               
>        
>  12:EXCHANGE [HASH(c3)]                                                       
>        
>                                                                               
>        
>  05:HASH JOIN [INNER JOIN, BROADCAST]                                         
>        
>    hash predicates: c3 = max(c3)                                              
>        
>    runtime filters: RF002 <- max(c3)                                          
>        
>    row-size=16B cardinality=207       
> {noformat}
>          
> By comparison, the following query which does not have the v1.max_c3 column 
> in the SELECT list produces the correct plan:
> {noformat}
> // Does not have the redundant predicate
> Query: explain select x.* from (select v1.c3 from v.t2 left join v.v1 on  
> t2.c2=v1.c3) as x
>  06:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
>    hash predicates: c3 = t2.c2
>    runtime filters: RF000 <- t2.c2
>    row-size=20B cardinality=397
>  --13:EXCHANGE [HASH(t2.c2)]
>    00:SCAN HDFS [v.t2]
>       HDFS partitions=1/1 files=1 size=639B
>       row-size=4B cardinality=397
>  12:EXCHANGE [HASH(c3)]
>  05:HASH JOIN [INNER JOIN, BROADCAST]
>    hash predicates: c3 = max(c3)
>    runtime filters: RF002 <- max(c3)
>    row-size=16B cardinality=207
> {noformat}
> Due the redundant predicate, the first query produces wrong results. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to