[GitHub] spark pull request #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20333

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162807813

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     checkAnswer(innerJoin, Row(1) :: Nil)
   }
+
+  test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
+    "is false or null") {
+    val df = spark.range(10)
--- End diff --

shouldn't it be false?
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162794983

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     checkAnswer(innerJoin, Row(1) :: Nil)
   }
+
+  test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
+    "is false or null") {
+    val df = spark.range(10)
--- End diff --

> `withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {`
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162794939

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
    */
   def isCartesianProduct(join: Join): Boolean = {
     val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
-    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
-      && refs.exists(join.right.outputSet.contains))
+
+    conditions match {
+      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
+      case _ => !conditions.map(_.references).exists(refs =>
+        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
+    }
   }

   def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.crossJoinEnabled) {
     plan
   } else plan transform {
-    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
--- End diff --

Yeah. For outer joins, it makes sense to remove this check.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162793942

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
    */
   def isCartesianProduct(join: Join): Boolean = {
     val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
-    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
-      && refs.exists(join.right.outputSet.contains))
+
+    conditions match {
+      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
+      case _ => !conditions.map(_.references).exists(refs =>
+        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
+    }
   }

   def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.crossJoinEnabled) {
     plan
   } else plan transform {
-    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
--- End diff --

Why are you saying that the size of the result set is the same? Suppose you have a relation A (of size n, let's say 1M rows) in an outer join with a relation B (of size m, let's say 1M rows). If the condition is true, the output relation has 1M * 1M rows (i.e. n * m); if the condition is false, the result has 1M rows (n) for a left join, 1M rows (m) for a right join, and 1M + 1M rows (n + m) for a full outer join. Therefore the size is not the same at all. But maybe you meant something different, am I missing something?
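The size argument above can be checked with a minimal sketch in plain Scala (no Spark classes; the naive nested-loop `joinSize` helper and its names are hypothetical, written only to illustrate the counts):

```scala
object JoinSizeSketch {
  // Naive nested-loop join: counts output rows for each join type,
  // given a join predicate `cond` over two input relations.
  def joinSize(left: Seq[Int], right: Seq[Int],
               cond: (Int, Int) => Boolean,
               joinType: String): Int = {
    val matched   = (for (l <- left; r <- right if cond(l, r)) yield (l, r)).size
    val leftOnly  = left.count(l => !right.exists(r => cond(l, r)))   // null-padded left rows
    val rightOnly = right.count(r => !left.exists(l => cond(l, r)))   // null-padded right rows
    joinType match {
      case "inner" => matched
      case "left"  => matched + leftOnly
      case "right" => matched + rightOnly
      case "full"  => matched + leftOnly + rightOnly
    }
  }

  def main(args: Array[String]): Unit = {
    val a = 1 to 4                          // n = 4
    val b = 1 to 3                          // m = 3
    val never  = (_: Int, _: Int) => false  // always-false join condition
    val always = (_: Int, _: Int) => true   // true condition: cartesian product
    println(joinSize(a, b, never, "inner")) // 0
    println(joinSize(a, b, never, "left"))  // 4  (n)
    println(joinSize(a, b, never, "right")) // 3  (m)
    println(joinSize(a, b, never, "full"))  // 7  (n + m)
    println(joinSize(a, b, always, "full")) // 12 (n * m)
  }
}
```

With a false condition the outputs are n, m, and n + m rows respectively, nowhere near the n * m of a cartesian product, which is the point being made above.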
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162759088

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1108,15 +1108,19 @@ object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper {
    */
   def isCartesianProduct(join: Join): Boolean = {
     val conditions = join.condition.map(splitConjunctivePredicates).getOrElse(Nil)
-    !conditions.map(_.references).exists(refs => refs.exists(join.left.outputSet.contains)
-      && refs.exists(join.right.outputSet.contains))
+
+    conditions match {
+      case Seq(Literal.FalseLiteral) | Seq(Literal(null, BooleanType)) => false
+      case _ => !conditions.map(_.references).exists(refs =>
+        refs.exists(join.left.outputSet.contains) && refs.exists(join.right.outputSet.contains))
+    }
   }

   def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.crossJoinEnabled) {
     plan
   } else plan transform {
-    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)
--- End diff --

For inner joins, we will not hit this, because the join is already optimized to an empty relation. For the other outer join types, we face exactly the same issue as when the condition is true. That is, the size of the join result set is still the same.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20333#discussion_r162758553

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     checkAnswer(innerJoin, Row(1) :: Nil)
   }
+
+  test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
+    "is false or null") {
+    val df = spark.range(10)
+    val dfNull = spark.range(10).select(lit(null).as("b"))
+    val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
+
+    spark.sessionState.executePlan(planNull).optimizedPlan
+
+    val dfOne = df.select(lit(1).as("a"))
+    val dfTwo = spark.range(10).select(lit(2).as("a"))
--- End diff --

`a` -> `b`
GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/20333

    [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null

## What changes were proposed in this pull request?

CheckCartesianProduct raises an AnalysisException even when the join condition is always false/null. In this case it should not be raised, since the result will not be a cartesian product.

## How was this patch tested?

Added UT.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-23087

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20333.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20333

commit 9c88781dcd4cd301373927bfbe7f3530c80f4f05
Author: Marco Gaido
Date: 2018-01-19T20:45:29Z

    [SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null
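The shape of the fix can be sketched with a self-contained model in plain Scala. This is not Spark's Catalyst code: `Expr`, `Literal`, `EqualTo`, and the column-name sets are hypothetical stand-ins for Catalyst's expression and attribute classes, used only to show how a lone literal false/null condition short-circuits the cartesian-product check while the original referenced-on-both-sides rule is kept for everything else:

```scala
sealed trait Expr
case class Literal(value: Any) extends Expr                   // models Catalyst literals
case class EqualTo(leftCol: String, rightCol: String) extends Expr  // models an equi-join predicate

object CartesianCheckSketch {
  // A join is reported as a cartesian product iff no predicate references
  // columns from both sides, except when its only condition is the literal
  // false or a null literal, which cannot yield a cartesian product.
  def isCartesianProduct(conditions: Seq[Expr],
                         leftCols: Set[String],
                         rightCols: Set[String]): Boolean =
    conditions match {
      case Seq(Literal(false)) | Seq(Literal(null)) => false  // the fix
      case _ =>
        !conditions.exists {
          case EqualTo(a, b) =>
            (leftCols(a) && rightCols(b)) || (leftCols(b) && rightCols(a))
          case _ => false
        }
    }
}
```

Under this model, a join with no condition is still flagged, a join on `id === b` is not, and, after the fix, neither is a join whose only condition is literally `false` or `null`.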