[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22701


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224658264
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2150,8 +2150,10 @@ class Analyzer(
 
 // TODO: skip null handling for not-nullable primitive inputs 
after we can completely
 // trust the `nullable` information.
+val needsNullCheck = (nullable: Boolean, expr: Expression) =>
--- End diff --

Yes, that's because "nullableType" is flipped around here. "nullableType" 
should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in 
other PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224620279
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2150,8 +2150,10 @@ class Analyzer(
 
 // TODO: skip null handling for not-nullable primitive inputs 
after we can completely
 // trust the `nullable` information.
+val needsNullCheck = (nullable: Boolean, expr: Expression) =>
--- End diff --

Should this param be something like `cantBeNull` or something? this 
receives `!nullableType` as its arg but is called `nullable`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224606888
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
 ---
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
 val a = testRelation.output(0)
 val func = (x: Int, y: Int) => x + y
-val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = 
false :: false :: Nil)
--- End diff --

OK, it sounds like we will have another 2.4 RC anyway, so we should get all 
of these changes in.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224602234
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
 ---
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
 val a = testRelation.output(0)
 val func = (x: Int, y: Int) => x + y
-val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = 
false :: false :: Nil)
--- End diff --

It's two separate issues.
If `nullableTypes` is not added here, the `HandleNullInputsForUDF` will do 
nothing, which means null checks will be missed. So it is itself a problem, 
which can be potentially triggered by a user.
As to test, if the rule is not doing anything, the "doing something 
infinitely" bug cannot be reproduced. But the infinite issue is one on a 
theoretical level and is quite unlikely to have any end-user impact, thanks to 
@rxin's fix for SPARK-24865.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224596662
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2151,7 +2151,7 @@ class Analyzer(
 // TODO: skip null handling for not-nullable primitive inputs 
after we can completely
 // trust the `nullable` information.
 val inputsNullCheck = nullableTypes.zip(inputs)
-  .filter { case (nullable, _) => !nullable }
+  .filter { case (nullable, expr) => !nullable && 
!expr.isInstanceOf[KnownNotNull] }
--- End diff --

let us use the original way? in the PR 
https://github.com/apache/spark/pull/21851
```Scala
val needsNullCheck = ... 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22701#discussion_r224547656
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
 ---
@@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
   test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
 val a = testRelation.output(0)
 val func = (x: Int, y: Int) => x + y
-val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
-val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
+val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = 
false :: false :: Nil)
--- End diff --

So clearly we should make both of these changes. This change fixes the test 
here. But is the change above, involving `KnownNotNull`, important for 
correctness? that is can some user code trigger this infinite loop you mention 
in SPARK-24891? I'm trying to figure out whether the change here is absolutely 
required for 2.4, or an important change that could happen in 2.4.1.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

2018-10-11 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/22701

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize 
and can be applied infinitely

## What changes were proposed in this pull request?

The HandleNullInputsForUDF rule can generate new If node infinitely, thus 
causing problems like match of SQL cache missed.
This was fixed in SPARK-24891 and was then broken by SPARK-25044.
The unit test in `AnalysisSuite` added in SPARK-24891 should have failed 
but didn't because it wasn't properly updated after the `ScalaUDF` constructor 
signature change. So this PR also updates the test accordingly based on the new 
`ScalaUDF` constructor.

## How was this patch tested?

Updated the original UT. This should be justified as the original UT became 
invalid after SPARK-25044.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark spark-25690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22701


commit 736625b3f280dcd6caed83b3cec82b72d4f0c0fc
Author: maryannxue 
Date:   2018-10-11T16:45:28Z

[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize 
and can be applied infinitely




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org