Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220628624 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExecSuite.scala --- @@ -100,6 +104,28 @@ class BatchEvalPythonExecSuite extends SparkPlanTest with SharedSQLContext { } assert(qualifiedPlanNodes.size == 1) } + + test("SPARK-25314: Python UDF refers to the attributes from more than one child " + + "in join condition") { + def dummyPythonUDFTest(): Unit = { + val df = Seq(("Hello", 4)).toDF("a", "b") + val df2 = Seq(("Hello", 4)).toDF("c", "d") + val joinDF = df.join(df2, + dummyPythonUDF(col("a"), col("c")) === dummyPythonUDF(col("d"), col("c"))) + val qualifiedPlanNodes = joinDF.queryExecution.executedPlan.collect { + case b: BatchEvalPythonExec => b + } + assert(qualifiedPlanNodes.size == 1) + } + // Test without spark.sql.crossJoin.enabled set + val errMsg = intercept[AnalysisException] { + dummyPythonUDFTest() + } + assert(errMsg.getMessage.startsWith("Detected implicit cartesian product")) + // Test with spark.sql.crossJoin.enabled=true + spark.conf.set("spark.sql.crossJoin.enabled", "true") --- End diff -- Thanks, done in 7f66954. ``` So I'd prefer having one or 2 end-to-end tests and create a new suite testing only the rule and the plan transformation, both for having lower testing time and finer grained tests checking that the output plan is indeed the expected one (not only checking the result of the query). ``` Make sense, will add a plan test for this rule.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org