liukun4515 removed a comment on pull request #1618:
URL: 
https://github.com/apache/arrow-datafusion/pull/1618#issuecomment-1018260246


   > # Which issue does this PR close?
   > Closes #1586
   > 
   > # What changes are included in this PR?
   > This change does a couple of things:
   > 
   > 1. Does not attempt to push down any predicates for the nullable 
(non-preserved) side of a given join
   > 2. Bails on the optimization to duplicate post-join filters on a column 
involved in the join equality to the other side of the join, if the other side 
of the join is nullable.
   > 
   > I'll elaborate on point 2 here as I think the above is a bit difficult to 
parse. Consider the following query:
   > 
   > ```sql
   > SELECT *
   > FROM t1 JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1
   > ```
   > 
   > Right now this is being rewritten as:
   > 
   > ```sql
   > SELECT *
   > FROM t1 JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1 AND t2.uid > 1 -- Duplicate filter to t2.uid, so it can be 
pushed down
   > ```
   > 
   > This is correct in the case of certain joins. However, I believe that in 
cases where the other join side is non-preserved, we should not apply this 
optimization. For example, if we change the above join to a `LEFT` join, the 
query rewrite becomes:
   > 
   > ```sql
   > SELECT *
   > FROM t1 LEFT JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1 AND t2.uid > 1
   > ```
   > 
   > The additional filter on `t2.uid` is wrong - as it omits null rows that 
would have been present in the original query.
   
   great fix


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to