GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r220516732
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -152,3 +153,51 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
           if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
       }
     }
    +
    +/**
    + * Correctly handle PythonUDF which need access both side of join side by changing the new join
    + * type to Cross.
    + */
    +object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    +  def hasPythonUDF(expression: Expression): Boolean = {
    +    expression.collectFirst { case udf: PythonUDF => udf }.isDefined
    --- End diff --
    
    This doesn't matter. We can't evaluate a Python UDF in the join condition, so we need to pull it out; that's all.
    
    A Python UDF that accesses attributes from only one side would normally be pushed down by other rules. If it isn't (e.g. the user disables the filter pushdown rule), we need to pull it out here, too. Either way, that's orthogonal to this rule.
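
A minimal sketch of the point above, using a toy expression model rather than Spark's Catalyst classes. Every type and name here (`Expr`, `PyUdf`, `PullOutSketch`, the string-valued `joinType`) is hypothetical, and the rule in this PR may differ in detail. The sketch only asks whether a conjunct contains a Python UDF, not which side of the join the UDF reads from. Falling back to a cross join when no condition remains is an assumption based on the doc comment in the diff.

```scala
// Toy model only: these are NOT Spark's Catalyst classes; all names are hypothetical.
sealed trait Expr {
  def children: Seq[Expr]
  // True if this node or any descendant is a Python UDF call.
  def hasPythonUDF: Boolean = this match {
    case _: PyUdf => true
    case other    => other.children.exists(_.hasPythonUDF)
  }
}
case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class PyUdf(name: String, children: Seq[Expr]) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
case class And(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

sealed trait Plan
case class Relation(name: String) extends Plan
case class Join(left: Plan, right: Plan, joinType: String, condition: Option[Expr]) extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan

object PullOutSketch {
  // Flatten nested ANDs into a list of conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Move every conjunct containing a Python UDF into a Filter above the join.
  // Which side(s) the UDF references is deliberately not inspected.
  def pullOut(plan: Plan): Plan = plan match {
    case j @ Join(_, _, "inner", Some(cond)) if cond.hasPythonUDF =>
      val (udf, rest) = splitConjuncts(cond).partition(_.hasPythonUDF)
      val newJoin =
        if (rest.isEmpty) j.copy(joinType = "cross", condition = None) // assumption: no condition left
        else j.copy(condition = Some(rest.reduceLeft((a, b) => And(a, b))))
      Filter(udf.reduceLeft((a, b) => And(a, b)), newJoin)
    case other => other
  }
}
```

In this toy model, a condition such as `a.id = b.id AND f(a.x)` is split the same way as `a.id = b.id AND f(a.x, b.y)`: the equality stays on the join and the UDF conjunct becomes a `Filter` on top of it.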


---
