[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16055 The above fix does not cover all the cases. I found the root cause.

The `constraints` of an operator are the expressions that evaluate to `true` for all the rows it produces. That means each expression's result is neither `false` nor `unknown` (NULL). Thus, we can conclude that `IsNotNull` holds for all the constraints, whether they are generated by the operator's own predicates or propagated from its children. A constraint can be a complex expression. To make better use of these constraints, we try to push `IsNotNull` down to the lowest-level expressions. `IsNotNull` can be pushed through an expression when that expression is null-intolerant (when any input is NULL, a null-intolerant expression always evaluates to NULL). Below is the code we have for `IsNotNull` pushdown:

```Scala
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
  // Reached an attribute: collect it.
  case a: Attribute => Seq(a)
  // Recurse into null-intolerant expressions, and also into IsNotNull over one.
  case _: NullIntolerant | IsNotNull(_: NullIntolerant) =>
    expr.children.flatMap(scanNullIntolerantExpr)
  // Any other expression stops the scan.
  case _ => Seq.empty[Attribute]
}
```

`IsNotNull` is not null-intolerant: it converts `null` to `false`. If the constraint contains no `Not`-like expression, the function works; otherwise, it can produce wrong results. The above function needs to be corrected to:

```Scala
private def scanNullIntolerantExpr(expr: Expression): Seq[Attribute] = expr match {
  // Reached an attribute: collect it.
  case a: Attribute => Seq(a)
  // Recurse only into null-intolerant expressions; IsNotNull is no longer traversed.
  case _: NullIntolerant => expr.children.flatMap(scanNullIntolerantExpr)
  // Any other expression stops the scan.
  case _ => Seq.empty[Attribute]
}
```

This fixes the problem, but we need a smarter fix to avoid regressions. I am now working on a better fix.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
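To make the failure mode concrete, here is a minimal sketch in plain Scala, without Spark. `Expr`, `Attr`, `Not`, `IsNotNull`, `NullIntolerant`, `eval`, `scanBuggy` and `scanFixed` are hypothetical stand-ins that only mimic the Catalyst shapes in the comment above, and SQL NULL is modeled as `None`. The constraint `NOT ((NOT a) IS NOT NULL)` is true exactly when `a` is NULL, yet the original scan walks through `IsNotNull` and reports `a`, so `a IS NOT NULL` would be inferred; the corrected scan reports nothing.

```scala
// Hypothetical model of Catalyst-style expressions (NOT Spark's real classes).
trait NullIntolerant // marker: any NULL input yields a NULL output

trait Expr { def children: Seq[Expr] }

// A boolean column; deliberately NOT marked NullIntolerant in this sketch.
case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}

// Logical negation: NULL in, NULL out, so it is null-intolerant.
case class Not(child: Expr) extends Expr with NullIntolerant {
  def children: Seq[Expr] = Seq(child)
}

// NULL in, false out, so it is NOT null-intolerant.
case class IsNotNull(child: Expr) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

// Three-valued evaluation; None plays the role of SQL NULL.
def eval(e: Expr, row: Map[String, Option[Boolean]]): Option[Boolean] = e match {
  case Attr(n)      => row(n)
  case Not(c)       => eval(c, row).map(b => !b)
  case IsNotNull(c) => Some(eval(c, row).isDefined)
}

// Mirrors the original scan: it also walks through IsNotNull(nullIntolerantChild).
def scanBuggy(e: Expr): Seq[Attr] = e match {
  case a: Attr                      => Seq(a)
  case _: NullIntolerant            => e.children.flatMap(scanBuggy)
  case IsNotNull(_: NullIntolerant) => e.children.flatMap(scanBuggy)
  case _                            => Seq.empty
}

// Mirrors the corrected scan: only null-intolerant expressions are traversed.
def scanFixed(e: Expr): Seq[Attr] = e match {
  case a: Attr           => Seq(a)
  case _: NullIntolerant => e.children.flatMap(scanFixed)
  case _                 => Seq.empty
}

// NOT ((NOT a) IS NOT NULL) is true exactly when a is NULL.
val a = Attr("a")
val constraint = Not(IsNotNull(Not(a)))

println(eval(constraint, Map("a" -> None))) // Some(true): a NULL row satisfies it
println(scanBuggy(constraint))              // List(Attr(a)): wrongly implies a IS NOT NULL
println(scanFixed(constraint))              // List(): nothing is inferred
```

Because `Attr` is not marked `NullIntolerant` here, the wrong inference only appears through the nested `Not`, in line with the observation that the original scan is safe when no `Not`-like expression is present.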
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16055 This does not resolve all the cases. Will submit a better fix today. : )
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16055 Merged build finished. Test FAILed.
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16055 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69310/ Test FAILed.
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16055 **[Test build #69310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69310/consoleFull)** for PR 16055 at commit [`33c10a0`](https://github.com/apache/spark/commit/33c10a0994c9802df901f211e1f28c52e34df27f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class Attribute extends LeafExpression with NamedExpression`
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16055 Sure, will update the PR description tomorrow. Thanks!
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16055 **[Test build #69310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69310/consoleFull)** for PR 16055 at commit [`33c10a0`](https://github.com/apache/spark/commit/33c10a0994c9802df901f211e1f28c52e34df27f).
[GitHub] spark issue #16055: [SPARK-17897] [SQL] Attribute is not NullIntolerant
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16055 Can you explain how `NullIntolerant` affected this case?