[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20670 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171463022 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints, such as equality + * constraints and `isNotNull` constraints, etc. + */ + lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints --- End diff -- We still need `if (conf.constraintPropagationEnabled)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171462811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints, such as equality + * constraints and `isNotNull` constraints, etc. + */ + lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints +.union(inferAdditionalConstraints(validConstraints)) +.union(constructIsNotNullConstraints(validConstraints))) --- End diff -- Nit: indents --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171276798 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala --- @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest { comparePlans(Optimize.execute(original.analyze), correct.analyze) } + + test("SPARK-23405: left-semi equal-join should filter out null join keys on both sides") { +val x = testRelation.subquery('x) +val y = testRelation.subquery('y) +val condition = Some("x.a".attr === "y.a".attr) +val originalQuery = x.join(y, LeftSemi, condition).analyze +val left = x.where(IsNotNull('a)) +val right = y.where(IsNotNull('a)) +val correctAnswer = left.join(right, LeftSemi, condition) +.analyze --- End diff -- this doesn't need to be in a new line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171201415 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints about equality --- End diff -- The comment is not acute, we may have various kinds of constraints. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171182870 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala --- @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest { comparePlans(Optimize.execute(original.analyze), correct.analyze) } + + test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") { +val x = testRelation.subquery('x) +val y = testRelation.subquery('y) +val originalQuery = x.join(y, LeftSemi, + condition = Some("x.a".attr === "y.a".attr)).analyze --- End diff -- nit: we can create a `val condition = Some("x.a".attr === "y.a".attr)` to reduce duplicated code --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171182439 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala --- @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest { comparePlans(Optimize.execute(original.analyze), correct.analyze) } + + test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") { --- End diff -- nit: `SPARK-23405: left-semi equa-join should filter out null join keys on both sides` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171182102 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints about equality + * constraints and `isNotNull` constraints. + */ + lazy val allConstraints: ExpressionSet = { +if (conf.constraintPropagationEnabled) { + ExpressionSet(validConstraints +.union(inferAdditionalConstraints(validConstraints)) +.union(constructIsNotNullConstraints(validConstraints))) +} else { + ExpressionSet(Set.empty) +} + } + /** * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For * example, if this set contains the expression `a = 2` then that expression is guaranteed to * evaluate to `true` for all rows produced. */ lazy val constraints: ExpressionSet = { if (conf.constraintPropagationEnabled) { --- End diff -- now we don't need this if. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171102033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -23,20 +23,23 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => /** - * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For - * example, if this set contains the expression `a = 2` then that expression is guaranteed to - * evaluate to `true` for all rows produced. - */ +* An [[ExpressionSet]] that contains an additional set of constraints about equality +* constraints and `isNotNull` constraints. +*/ + lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints --- End diff -- This should also be guarded by `constraintPropagationEnabled` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r170840898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -27,16 +27,15 @@ trait QueryPlanConstraints { self: LogicalPlan => * example, if this set contains the expression `a = 2` then that expression is guaranteed to * evaluate to `true` for all rows produced. --- End diff -- The comment belongs to `constraints` not `allConstraints` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org