GitHub user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22326
  
    Some thoughts:
    1. This rule is a little tricky because it only handles a Python UDF that
    accesses attributes from both sides of the join. If the UDF accesses only
    one side, we assume it can be pushed down later. In general, an analyzer
    rule should not depend on optimizer rules. My proposal: move this rule to
    the optimizer, as the last batch (but before the
    `UpdateAttributeReferences` batch). Because the rule would then run after
    filter pushdown, we can simply pull out any Python UDF that remains in the
    join condition. Also add this rule to `Optimizer.nonExcludableRules`,
    since it is a special optimizer rule that must not be turned off.
    2. About cross joins: I think we don't need to handle them specially. My
    only concern is that we keep the behavior the same as before.
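
To make the proposal in item 1 concrete, here is a minimal sketch using a toy plan model in Python rather than Catalyst's Scala API. `Predicate`, `Join`, `Filter`, and `pull_out_python_udf` are hypothetical names for illustration, not Spark classes, and the sketch assumes only inner joins are rewritten. It also assumes filter pushdown has already run, so any Python UDF still in the join condition can be pulled out without checking which side it references.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Predicate:
    """One conjunct of a join condition (toy stand-in for an expression)."""
    text: str
    has_python_udf: bool = False


@dataclass(frozen=True)
class Join:
    left: str
    right: str
    join_type: str
    condition: Tuple[Predicate, ...]


@dataclass(frozen=True)
class Filter:
    condition: Tuple[Predicate, ...]
    child: Join


def pull_out_python_udf(join: Join) -> Union[Join, Filter]:
    """Move every conjunct containing a Python UDF into a Filter above the join.

    Assumption: this runs after filter pushdown, so no per-side check is needed.
    Assumption: only inner joins are rewritten; other join types are returned
    unchanged in this sketch.
    """
    udf_preds = tuple(p for p in join.condition if p.has_python_udf)
    if not udf_preds or join.join_type != "inner":
        return join
    rest = tuple(p for p in join.condition if not p.has_python_udf)
    # The join type is kept as-is, even if `rest` ends up empty.
    return Filter(udf_preds, Join(join.left, join.right, join.join_type, rest))


# Usage: one plain equality conjunct and one conjunct with a Python UDF.
join = Join(
    "t1",
    "t2",
    "inner",
    (
        Predicate("t1.id = t2.id"),
        Predicate("py_udf(t1.a, t2.b)", has_python_udf=True),
    ),
)
plan = pull_out_python_udf(join)
```

In this toy model the equality stays in the join condition and the UDF conjunct is evaluated afterwards by the `Filter`, which mirrors the "pull out any Python UDF in the join condition" step described above.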

