Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/20816#discussion_r174302419 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1071,6 +1072,66 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { } } +/** + * Infer and transit predicate from the preserved side to the null-supplying side + * of an outer join. The predicate is inferred from the preserved side based on the + * join condition and will be pushed over to the null-supplying side. For example, + * if the preserved side has constraints of the form 'a > 5' and the join condition + * is 'a = b', in which 'b' is an attribute from the null-supplying side, a [[Filter]] + * operator of 'b > 5' will be applied to the null-supplying side. --- End diff -- To infer filters from constraints for joins, we need the constraints from preserved side *and* the join condition. This is already supported for inner joins, but not for left or right outer joins. Since the constraints for left/right outer joins are returned as `left.constraints` and `right.constraints`. Yet we cannot include the join condition into the constraints for outer joins since they do not hold true in the join output (left.joinkey does not equal right.joinkey since either side can have null values).
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org