GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23153#discussion_r237944306
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
    @@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
     
     /**
    - * PythonUDF in join condition can not be evaluated, this rule will detect the PythonUDF
    - * and pull them out from join condition. For python udf accessing attributes from only one side,
    - * they are pushed down by operation push down rules. If not (e.g. user disables filter push
    - * down rules), we need to pull them out in this rule too.
    + * PythonUDF in join condition can't be evaluated if it refers to attributes from both join sides.
    + * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable PythonUDF and pull them
    + * out from join condition.
      */
     object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with PredicateHelper {
    -  def hasPythonUDF(expression: Expression): Boolean = {
    -    expression.collectFirst { case udf: PythonUDF => udf }.isDefined
    +
    +  private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean = {
    +    expr.find { e =>
    +      PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && !canEvaluate(e, j.right)
    --- End diff --
    
    We might need a comment to explain why we only pull out the Scalar `PythonUDF`.
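
For reference, the condition in the diff can be illustrated with a small standalone sketch. It does not use Spark's Catalyst classes: `Expr`, `Attr`, `ScalarUdf`, `And`, and the set-based `canEvaluate` below are hypothetical stand-ins, and the sketch assumes "can be evaluated on one side" means "reads only attributes produced by that side". It shows only the both-sides check, not the reason for restricting it to scalar UDFs.

```scala
// Hypothetical stand-ins for Catalyst types; this is not Spark's API.
object JoinUdfSketch {
  sealed trait Expr {
    def references: Set[String] // names of the attributes this expression reads
    def children: Seq[Expr]
  }

  final case class Attr(name: String) extends Expr {
    val references: Set[String] = Set(name)
    val children: Seq[Expr] = Nil
  }

  final case class ScalarUdf(args: Seq[Expr]) extends Expr {
    val references: Set[String] = args.flatMap(_.references).toSet
    val children: Seq[Expr] = args
  }

  final case class And(left: Expr, right: Expr) extends Expr {
    val references: Set[String] = left.references ++ right.references
    val children: Seq[Expr] = Seq(left, right)
  }

  // Assumption: an expression is evaluable on a join side when every
  // attribute it reads is produced by that side.
  def canEvaluate(e: Expr, sideOutput: Set[String]): Boolean =
    e.references.subsetOf(sideOutput)

  // Depth-first search for the first sub-expression matching the predicate.
  def find(e: Expr)(p: Expr => Boolean): Option[Expr] =
    if (p(e)) Some(e)
    else e.children.iterator.map(c => find(c)(p)).collectFirst { case Some(x) => x }

  // True when some scalar UDF in the condition needs attributes from both sides,
  // i.e. it can be evaluated on neither the left nor the right input alone.
  def hasUnevaluableUdf(cond: Expr, left: Set[String], right: Set[String]): Boolean =
    find(cond) {
      case u: ScalarUdf => !canEvaluate(u, left) && !canEvaluate(u, right)
      case _            => false
    }.isDefined
}
```

In this sketch a UDF over attributes `a` (left) and `b` (right) is flagged, while a UDF over `a` alone is not, since it could be evaluated on the left side.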

