[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16067 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90177599

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)",
       expectedNonNullableColumns = Seq("_1", "_2"))
   }
+  test("SPARK-17897: Fixed IsNotNull Constraint Inference Rule") {
+    val data = Seq[java.lang.Integer](1, null).toDF("key")
+    checkAnswer(data.filter("not key is not null"), Row(null))
--- End diff --

sure.
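The regression test filters on `not key is not null`, i.e. `Not(IsNotNull(key))`, and expects the null row back. The following minimal sketch shows how IsNotNull inference can break that query. It uses hypothetical toy classes (`Expr`, `Attr`, `Not`, `IsNotNull`, a `NullIntolerant` marker), not Catalyst's real ones. `scanOld` is an assumed, simplified illustration of a scan that also descends through `IsNotNull`. `scanFixed` stops at any node that is not null-intolerant, which is an assumption about the fixed helper, since its body does not appear in the diff.

```scala
// Hypothetical toy stand-ins for Catalyst expressions (not Spark's real classes).
trait NullIntolerant
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
// IsNotNull never returns null, so it is NOT null-intolerant.
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
// NOT(null) is null, so Not is null-intolerant.
final case class Not(child: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(child)
}

object NotIsNotNullSketch {
  // Assumed, simplified illustration of the bug: the recursion also
  // descends through IsNotNull nodes.
  def scanOld(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant | IsNotNull(_) => expr.children.flatMap(scanOld)
    case _ => Seq.empty
  }

  // Assumed fixed shape: stop at any node that is not null-intolerant.
  def scanFixed(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanFixed)
    case _ => Seq.empty
  }

  // The regression test's filter: NOT (key IS NOT NULL).
  val constraint: Expr = Not(IsNotNull(Attr("key")))

  // Wrongly concludes that key is non-null.
  val inferredOld: Seq[Expr] = scanOld(constraint).map(IsNotNull(_))
  // Infers nothing, which is correct for this constraint.
  val inferredFixed: Seq[Expr] = scanFixed(constraint).map(IsNotNull(_))
}
```

Here `inferredOld` is `Seq(IsNotNull(Attr("key")))`. That contradicts the filter itself and would drop the null row that `checkAnswer(..., Row(null))` expects. `inferredFixed` is empty.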
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90177515

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
--- End diff --

Yeah, my original idea was to stop early (a fast path). On reflection, removing this case might be fine.
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90177567

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
+      // When the root is IsNotNull, we can push IsNotNull through the child null intolerant
+      // expressions
+      case IsNotNull(expr) => scanNullIntolerantExpr(expr).map(IsNotNull(_))
+      // Constraints always return true for all the inputs. That means, null will never be returned.
+      // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child
+      // null intolerant expressions.
+      case _ => scanNullIntolerantExpr(constraint).map(IsNotNull(_))
+    }
+
+  /**
    * Recursively explores the expressions which are null intolerant and returns all attributes
    * in these expressions.
    */
   private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
--- End diff --

Sure
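The three cases of `inferIsNotNullConstraints` can be tried outside Spark with a toy model. The sketch below makes two assumptions. First, it uses hypothetical stand-in classes (`Attr`, `Lit`, `Add`, `EqualTo`, `IsNotNull`, a `NullIntolerant` marker). Second, it supplies its own body for the attribute scan, named `scanNullIntolerantAttribute` after the rename cloud-fan proposes, because the diff shows only that helper's signature and doc comment.

```scala
// Hypothetical toy stand-ins for Catalyst expressions (not Spark's real classes).
trait NullIntolerant
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
// A null input makes these return null, so they are null-intolerant.
final case class Add(left: Expr, right: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(left, right)
}
final case class EqualTo(left: Expr, right: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(left, right)
}

object InferIsNotNullSketch {
  // Assumed body: collect attributes reachable through null-intolerant nodes only.
  def scanNullIntolerantAttribute(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantAttribute)
    case _ => Seq.empty
  }

  // Same three cases as inferIsNotNullConstraints in the diff.
  def inferIsNotNullConstraints(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(_: Attr) => Seq(constraint)
    // Root is IsNotNull: push it through the null-intolerant child.
    case IsNotNull(expr) => scanNullIntolerantAttribute(expr).map(IsNotNull(_))
    // A constraint always evaluates to true (never null), so every attribute
    // under a chain of null-intolerant nodes must be non-null.
    case _ => scanNullIntolerantAttribute(constraint).map(IsNotNull(_))
  }
}
```

For `EqualTo(Add(a, b), Lit(1))` the default case applies and yields `IsNotNull(a)` and `IsNotNull(b)`. `IsNotNull(Add(a, b))` yields the same pair through the second case.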
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90177275

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)",
       expectedNonNullableColumns = Seq("_1", "_2"))
   }
+  test("SPARK-17897: Fixed IsNotNull Constraint Inference Rule") {
+    val data = Seq[java.lang.Integer](1, null).toDF("key")
+    checkAnswer(data.filter("not key is not null"), Row(null))
--- End diff --

Shall we use the DataFrame API, i.e. `data.filter(!$"key".isNotNull)`? The string version looks weird...
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90176972

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
+      // When the root is IsNotNull, we can push IsNotNull through the child null intolerant
+      // expressions
+      case IsNotNull(expr) => scanNullIntolerantExpr(expr).map(IsNotNull(_))
+      // Constraints always return true for all the inputs. That means, null will never be returned.
+      // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child
+      // null intolerant expressions.
+      case _ => scanNullIntolerantExpr(constraint).map(IsNotNull(_))
+    }
+
+  /**
    * Recursively explores the expressions which are null intolerant and returns all attributes
    * in these expressions.
    */
   private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
--- End diff --

Shall we rename it to `scanNullIntolerantAttribute`?
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90176867

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
+    constraint match {
+      case IsNotNull(_: Attribute) => constraint :: Nil
--- End diff --

We don't need this case; I think it can be covered by the next case?
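cloud-fan's point can be checked on a toy model. If the scan returns `Seq(a)` for a bare attribute `a`, then the general `IsNotNull(expr)` case already maps `IsNotNull(a)` back to `IsNotNull(a)`. In the sketch below, the classes and the scan body are hypothetical simplifications. `withFastPath` keeps the dedicated `IsNotNull(_: Attr)` case, and `withoutFastPath` drops it.

```scala
// Hypothetical toy stand-ins for Catalyst expressions (not Spark's real classes).
trait NullIntolerant
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
final case class EqualTo(left: Expr, right: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(left, right)
}

object FastPathSketch {
  // Assumed scan: a bare attribute maps to itself.
  def scan(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scan)
    case _ => Seq.empty
  }

  // Keeps the dedicated IsNotNull(attribute) case as an early exit.
  def withFastPath(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(_: Attr) => Seq(constraint)
    case IsNotNull(e) => scan(e).map(IsNotNull(_))
    case _ => scan(constraint).map(IsNotNull(_))
  }

  // Drops it: IsNotNull(a) falls through to IsNotNull(e), and scan(a) == Seq(a).
  def withoutFastPath(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(e) => scan(e).map(IsNotNull(_))
    case _ => scan(constraint).map(IsNotNull(_))
  }
}
```

In this model both variants return equal results for every input. The dedicated case only skips one trivial recursion, so it acts as an early exit rather than a change in semantics.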
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90169798

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Yeah.
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90168753

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Yes. After this PR, we do not support it. This is a pretty rare case, right?
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90167277

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

This change simply ignores all `IsNotNull`s that are not the top expression. The above case works because `Filter` splits it. But if the constraint looks like `Cast(IsNotNull(a), Integer) == 1`, we won't infer `IsNotNull(a)` from it, right?
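viirya's counterexample can be reproduced in a toy model. The sketch assumes hypothetical stand-in classes in which `Cast` and `EqualTo` are null-intolerant and `IsNotNull` is not. It also assumes the same root-only inference shape as the diff. Under these assumptions the scan stops at the nested `IsNotNull`, so nothing is inferred from `cast(isnotnull(a) as int) = 1`, even though that predicate holds only when `a` is non-null.

```scala
// Hypothetical toy stand-ins for Catalyst expressions (not Spark's real classes).
trait NullIntolerant
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
// Assumed null-intolerant in this model: a null input yields a null result.
final case class Cast(child: Expr, to: String) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(child)
}
final case class EqualTo(left: Expr, right: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(left, right)
}

object NestedIsNotNullSketch {
  def scan(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scan)
    case _ => Seq.empty // IsNotNull is not null-intolerant, so the scan stops here
  }

  // Root-only inference, same shape as the diff's inferIsNotNullConstraints.
  def infer(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(_: Attr) => Seq(constraint)
    case IsNotNull(e) => scan(e).map(IsNotNull(_))
    case _ => scan(constraint).map(IsNotNull(_))
  }

  val a = Attr("a")
  // cast(isnotnull(a) as int) = 1 is true only when a is non-null...
  val nested: Expr = EqualTo(Cast(IsNotNull(a), "int"), Lit(1))
  // ...yet the root-only rule infers nothing from it.
  val inferredFromNested: Seq[Expr] = infer(nested)
  // A top-level IsNotNull(a), such as one conjunct of a split Filter condition, still works.
  val inferredFromTopLevel: Seq[Expr] = infer(IsNotNull(a))
}
```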
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90164208

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala ---
@@ -351,6 +351,15 @@ class ConstraintPropagationSuite extends SparkFunSuite {
         IsNotNull(IsNotNull(resolveColumn(tr, "b"))),
         IsNotNull(resolveColumn(tr, "a")),
         IsNotNull(resolveColumn(tr, "c")
+
+    verifyConstraints(
+      tr.where('a.attr === 1 && IsNotNull(resolveColumn(tr, "b")) &&
+        IsNotNull(resolveColumn(tr, "c"))).analyze.constraints,
+      ExpressionSet(Seq(
+        resolveColumn(tr, "a") === 1,
+        IsNotNull(resolveColumn(tr, "c")),
+        IsNotNull(resolveColumn(tr, "a")),
+        IsNotNull(resolveColumn(tr, "b")
--- End diff --

The test case has been added.
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90154852

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -58,13 +57,28 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
   }
   /**
+   * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions
+   * of constraints.
+   */
+  private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] =
--- End diff --

Can this infer `IsNotNull(a)`, `IsNotNull(b)` from `IsNotNull(a) && IsNotNull(b)`?
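Whether `IsNotNull(a) && IsNotNull(b)` yields both facts depends on splitting the conjunction first. As noted in the review, `Filter` does this. The toy sketch below uses hypothetical classes, and `splitConjuncts` is a simplified stand-in in the spirit of Catalyst's `splitConjunctivePredicates`. It uses the same condition as the test added to `ConstraintPropagationSuite`: `a === 1 && IsNotNull(b) && IsNotNull(c)`.

```scala
// Hypothetical toy stand-ins for Catalyst expressions (not Spark's real classes).
trait NullIntolerant
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
final case class EqualTo(left: Expr, right: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(left, right)
}
// AND is not null-intolerant: null AND false evaluates to false, not null.
final case class And(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object ConjunctionSketch {
  def scan(expr: Expr): Seq[Attr] = expr match {
    case a: Attr => Seq(a)
    case _: NullIntolerant => expr.children.flatMap(scan)
    case _ => Seq.empty
  }

  // Root-only inference, same shape as the diff's inferIsNotNullConstraints.
  def infer(constraint: Expr): Seq[Expr] = constraint match {
    case IsNotNull(_: Attr) => Seq(constraint)
    case IsNotNull(e) => scan(e).map(IsNotNull(_))
    case _ => scan(constraint).map(IsNotNull(_))
  }

  // Simplified stand-in for splitting a condition into its AND-ed conjuncts.
  def splitConjuncts(expr: Expr): Seq[Expr] = expr match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other => Seq(other)
  }

  // A filter's constraints: its conjuncts plus what is inferred from each one.
  def filterConstraints(condition: Expr): Set[Expr] = {
    val conjuncts = splitConjuncts(condition)
    conjuncts.toSet[Expr] ++ conjuncts.flatMap(infer)
  }

  val (a, b, c) = (Attr("a"), Attr("b"), Attr("c"))
  // Same shape as the new test: a === 1 && IsNotNull(b) && IsNotNull(c)
  val condition: Expr = And(And(EqualTo(a, Lit(1)), IsNotNull(b)), IsNotNull(c))
}
```

On the unsplit `And`, the root-only rule infers nothing, because `And` is not null-intolerant. After splitting, the constraint set is `{a = 1, IsNotNull(a), IsNotNull(b), IsNotNull(c)}`. That matches the expected `ExpressionSet` in the test.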
[GitHub] spark pull request #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint In...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16067#discussion_r90143591

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1697,6 +1697,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       expr = "cast((_1 + _2) as boolean)",
       expectedNonNullableColumns = Seq("_1", "_2"))
   }
+  test("SPARK-17897: Attribute is not NullIntolerant") {
--- End diff --

New test case name?