[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22701

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224658264

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -2150,8 +2150,10 @@ class Analyzer(
         // TODO: skip null handling for not-nullable primitive inputs after we can completely
         // trust the `nullable` information.
+        val needsNullCheck = (nullable: Boolean, expr: Expression) =>
```

Yes, that's because `nullableType` is flipped around here. `nullableType` should really be named `cantBeNull` or `doesntNeedNullCheck`. I'll change this in another PR.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224620279

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -2150,8 +2150,10 @@ class Analyzer(
         // TODO: skip null handling for not-nullable primitive inputs after we can completely
         // trust the `nullable` information.
+        val needsNullCheck = (nullable: Boolean, expr: Expression) =>
```

Should this parameter be named something like `cantBeNull`? It receives `!nullableType` as its argument but is called `nullable`.
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224606888

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

```
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
     val a = testRelation.output(0)
     val func = (x: Int, y: Int) => x + y
-    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
```

OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224602234

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

```
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
     val a = testRelation.output(0)
     val func = (x: Int, y: Int) => x + y
-    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
```

They are two separate issues. If `nullableTypes` is not added here, `HandleNullInputsForUDF` will do nothing, which means null checks will be missed. That is a problem in itself, and one a user can potentially trigger. As for the test: if the rule is not doing anything, the "doing something infinitely" bug cannot be reproduced. But the infinite-application issue is theoretical and quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.
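To illustrate the first issue, here is a minimal sketch (local toy code, not Spark's actual `Analyzer` internals; the helper name is hypothetical): the rule pairs `nullableTypes` with the UDF inputs via `zip`, so when `nullableTypes` is left empty, the zip produces no pairs and the rule silently does nothing.

```scala
object NullCheckSketch {
  // Illustrative stand-in for the analyzer's pairing step: the name
  // inputsNeedingNullCheck is invented for this sketch.
  def inputsNeedingNullCheck(nullableTypes: Seq[Boolean], inputs: Seq[String]): Seq[String] =
    nullableTypes.zip(inputs)
      .filter { case (nullable, _) => !nullable } // only non-nullable inputs need a check
      .map(_._2)

  def main(args: Array[String]): Unit = {
    val inputs = Seq("a", "a")
    // With nullableTypes supplied, both non-nullable inputs get a null check:
    println(inputsNeedingNullCheck(Seq(false, false), inputs)) // List(a, a)
    // With nullableTypes left empty (the bug), zip yields no pairs,
    // so nothing is selected for null checking:
    println(inputsNeedingNullCheck(Nil, inputs)) // List()
  }
}
```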
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224596662

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
@@ -2151,7 +2151,7 @@ class Analyzer(
         // TODO: skip null handling for not-nullable primitive inputs after we can completely
         // trust the `nullable` information.
         val inputsNullCheck = nullableTypes.zip(inputs)
-          .filter { case (nullable, _) => !nullable }
+          .filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] }
```

Let us use the original way, as in PR https://github.com/apache/spark/pull/21851:

```Scala
val needsNullCheck = ...
```
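The combined predicate being discussed can be sketched as follows (toy types only; `Expression` and `KnownNotNull` here are local stand-ins for Catalyst's classes, and the surrounding object is invented for the sketch): an input needs a null check only if its type is not statically non-nullable and it has not already been wrapped in `KnownNotNull`.

```scala
object NeedsNullCheckSketch {
  // Local stand-ins, not Catalyst's real classes.
  sealed trait Expression
  case class AttributeRef(name: String) extends Expression
  case class KnownNotNull(child: Expression) extends Expression

  // Naming the predicate keeps the intent clear at the call site.
  val needsNullCheck: (Boolean, Expression) => Boolean =
    (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull]

  def main(args: Array[String]): Unit = {
    val inputs = Seq[Expression](AttributeRef("a"), KnownNotNull(AttributeRef("b")))
    val nullableTypes = Seq(false, false)
    // Only inputs that still need a check survive the filter:
    val toCheck = nullableTypes.zip(inputs).collect {
      case (nullable, expr) if needsNullCheck(nullable, expr) => expr
    }
    println(toCheck) // List(AttributeRef(a)) — "b" is already known non-null
  }
}
```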
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22701#discussion_r224547656

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala

```
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
     val a = testRelation.output(0)
     val func = (x: Int, y: Int) => x + y
-    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
```

So clearly we should make both of these changes, and this change fixes the test here. But is the change above, involving `KnownNotNull`, important for correctness? That is, can some user code trigger the infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or an important change that could wait for 2.4.1.
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/22701

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely

## What changes were proposed in this pull request?

The `HandleNullInputsForUDF` rule can generate new `If` nodes infinitely, causing problems such as SQL cache matches being missed. This was fixed in SPARK-24891 and then broken again by SPARK-25044. The unit test added to `AnalysisSuite` in SPARK-24891 should have failed but did not, because it was not properly updated after the `ScalaUDF` constructor signature change. This PR therefore also updates the test to use the new `ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT, which is justified because the original UT became invalid after SPARK-25044.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-25690

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22701.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22701

commit 736625b3f280dcd6caed83b3cec82b72d4f0c0fc
Author: maryannxue
Date: 2018-10-11T16:45:28Z

    [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely
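The non-stabilizing behavior described above can be illustrated with a toy model (these are simplified local types, not Spark's Catalyst classes, and `NullGuard` stands in for the `If(IsNull(...), null, ...)` wrapper the real rule inserts): a rule that always wraps a UDF grows the tree on every analyzer pass, while a rule that marks inputs `KnownNotNull` and skips already-marked UDFs reaches a fixpoint after one pass.

```scala
object FixpointSketch {
  // Toy expression tree, not Catalyst.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class KnownNotNull(child: Expr) extends Expr
  case class Udf(inputs: Seq[Expr]) extends Expr
  case class NullGuard(child: Expr) extends Expr // stands in for If(IsNull(...), null, ...)

  // One analyzer pass: rewrite every Udf found in this toy tree.
  def pass(e: Expr, rewrite: Udf => Expr): Expr = e match {
    case u: Udf       => rewrite(u)
    case NullGuard(c) => NullGuard(pass(c, rewrite))
    case other        => other
  }

  // Buggy shape of the rule: it always wraps, so every pass adds a guard.
  def buggy(u: Udf): Expr = NullGuard(u)

  // Fixed shape: inputs get marked KnownNotNull, and an already-marked
  // Udf is left alone, so a second pass changes nothing.
  def fixed(u: Udf): Expr =
    if (u.inputs.forall(_.isInstanceOf[KnownNotNull])) u
    else NullGuard(Udf(u.inputs.map(KnownNotNull(_))))

  def main(args: Array[String]): Unit = {
    val udf: Expr = Udf(Seq(Attr("a")))
    // Guards keep nesting under the buggy rule: no fixpoint.
    println(pass(pass(udf, buggy), buggy)) // NullGuard(NullGuard(Udf(...)))
    // The fixed rule stabilizes after one application:
    val once = pass(udf, fixed)
    println(pass(once, fixed) == once) // true
  }
}
```

The same idempotence argument is why the `!expr.isInstanceOf[KnownNotNull]` guard discussed earlier in the thread matters: it is what makes a second application of the rule a no-op.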