Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22326#discussion_r220418201
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -1304,10 +1307,27 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
         }
       }
     
    +  /**
    +   * Check if a join contains PythonUDF in join condition.
    +   */
    +  def hasPythonUDFInJoinCondition(join: Join): Boolean = {
    +    val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
    +    conditions.exists(HandlePythonUDFInJoinCondition.hasPythonUDF)
    +  }
    +
       def apply(plan: LogicalPlan): LogicalPlan =
         if (SQLConf.get.crossJoinEnabled) {
           plan
         } else plan transform {
    +      case j @ Join(_, _, _, _) if hasPythonUDFInJoinCondition(j) =>
    --- End diff --
    
    Maybe not; I think we should keep the current logic, as shown by the test below:
    
    ![image](https://user-images.githubusercontent.com/4833765/46055860-866c0180-c180-11e8-94e4-1f86af04b42a.png)
    
    In this join condition there is only one Python UDF, but we still need this AnalysisException. If the logic here changed to `havePythonUDFInAllConditions`, you would instead get a runtime exception of `requires attributes from more than one child.`, like:
    
    ![image](https://user-images.githubusercontent.com/4833765/46055852-7d7b3000-c180-11e8-867c-f522ca175920.png)
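    
    For reference, here is a minimal sketch (with stand-in types and an assumed `hasPythonUDF` helper, not the real Catalyst expression classes) of the `exists`-vs-`forall` difference being discussed:
    
        // Minimal sketch: stand-in types, not Spark's actual Expression classes.
        object ExistsVsForallSketch {
          sealed trait Predicate
          case class PythonUDFPred(name: String) extends Predicate
          case class EqualTo(left: String, right: String) extends Predicate
        
          // Plays the role of HandlePythonUDFInJoinCondition.hasPythonUDF.
          def hasPythonUDF(p: Predicate): Boolean = p.isInstanceOf[PythonUDFPred]
        
          def main(args: Array[String]): Unit = {
            // Mimics splitConjunctivePredicates on: pythonUDF(l.a, r.b) AND l.x = r.y
            val conjuncts: Seq[Predicate] = Seq(PythonUDFPred("udf"), EqualTo("l.x", "r.y"))
        
            // Current logic: one PythonUDF conjunct is enough, so the rule matches
            // and the user gets the clear AnalysisException from CheckCartesianProducts.
            println(conjuncts.exists(hasPythonUDF))  // true
        
            // Hypothetical havePythonUDFInAllConditions: the rule would not match
            // here, and the query would fail later at runtime with
            // "requires attributes from more than one child".
            println(conjuncts.forall(hasPythonUDF))  // false
          }
        }
    
    Since `splitConjunctivePredicates` only splits on `And`, a condition that mixes a single Python UDF with an equi-join predicate would make a `forall`-style check return false and silently skip the rule.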