Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20816#discussion_r174302419
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
    @@ -1071,6 +1072,66 @@ object PushPredicateThroughJoin extends 
Rule[LogicalPlan] with PredicateHelper {
       }
     }
     
    +/**
    + * Infer and transit predicate from the preserved side to the 
null-supplying side
    + * of an outer join. The predicate is inferred from the preserved side 
based on the
    + * join condition and will be pushed over to the null-supplying side. For 
example,
    + * if the preserved side has constraints of the form 'a > 5' and the join 
condition
    + * is 'a = b', in which 'b' is an attribute from the null-supplying side, 
a [[Filter]]
    + * operator of 'b > 5' will be applied to the null-supplying side.
    --- End diff --
    
    To infer filters from constraints for joins, we need the constraints from 
preserved side *and* the join condition. This is already supported for inner 
joins, but not for left or right outer joins. Since the constraints for 
left/right outer joins are returned as `left.constraints` and 
`right.constraints`. Yet we cannot include the join condition into the 
constraints for outer joins since they do not hold true in the join output 
(left.joinkey does not equal right.joinkey since either side can have null 
values).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to