GitHub user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22326

Some thoughts:

1. This rule is a little tricky, as it only handles a Python UDF that accesses attributes from both sides of the join. If the UDF accesses only one side, we assume it can be pushed down later. In general, an analyzer rule should not depend on optimizer rules. My proposal: move this rule to the optimizer, as the last batch (but before the `UpdateAttributeReferences` batch). Since the rule would then run after filter pushdown, we can simply pull out any Python UDF in the join condition. Also add this rule to `Optimizer.nonExcludableRules`, since it is a special optimizer rule that can't be turned off.

2. About cross join: I think we don't need to take care of it. My only concern is that we have to keep the behavior the same as before.
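
To make point 1 concrete, here is a toy sketch in plain Python of what "pulling out" a Python UDF from a join condition could look like. It is not Catalyst code: `Predicate`, `Join`, `Filter`, and `pull_out_python_udf` are hypothetical names invented for illustration. It assumes the join condition is already split into conjuncts, and it only rewrites inner joins, leaving every other join type untouched.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Predicate:
    """One conjunct of a join condition (hypothetical toy type)."""
    text: str
    has_python_udf: bool = False


@dataclass(frozen=True)
class Join:
    """Toy join node; `condition` is a tuple of AND-ed conjuncts."""
    left: str
    right: str
    join_type: str
    condition: Tuple[Predicate, ...]


@dataclass(frozen=True)
class Filter:
    """Toy filter node placed above a join."""
    condition: Tuple[Predicate, ...]
    child: Join


def pull_out_python_udf(plan: Join) -> Union[Join, Filter]:
    """Move conjuncts that contain a Python UDF into a Filter above the join.

    Assumption of this sketch: only inner joins are rewritten, because
    filtering after the join preserves the result only for inner joins.
    """
    udf_preds = tuple(p for p in plan.condition if p.has_python_udf)
    if not udf_preds or plan.join_type != "inner":
        return plan
    rest = tuple(p for p in plan.condition if not p.has_python_udf)
    return Filter(udf_preds, Join(plan.left, plan.right, plan.join_type, rest))


plan = Join(
    "l", "r", "inner",
    (Predicate("l.a = r.a"), Predicate("py_udf(l.b, r.c)", has_python_udf=True)),
)
rewritten = pull_out_python_udf(plan)
```

In this sketch, if every conjunct contains a Python UDF, the remaining join condition is empty, so the join degenerates into a cross join followed by a filter. That appears to be the situation point 2 is about.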