Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22259#discussion_r224295469

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -47,7 +48,8 @@ case class ScalaUDF(
     inputTypes: Seq[DataType] = Nil,
     udfName: Option[String] = None,
     nullable: Boolean = true,
-    udfDeterministic: Boolean = true)
+    udfDeterministic: Boolean = true,
+    nullableTypes: Seq[Boolean] = Nil)
--- End diff --

Yes, the test should not pass after removing the `isInstanceOf[KnownNotNull]` condition from the `needsNullCheck` check (https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2160). The idea was to add a `KnownNotNull` node on top of the original node to mark it as null-checked, so the rule won't add redundant null checks even if it is accidentally applied again. I'm not sure of the exact reason why you removed the `isInstanceOf[KnownNotNull]` condition in this PR, but I think it should be kept alongside the new nullable-type check. After adding the `nullableTypes` parameter in the test, the issue can be reproduced:

```scala
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  val a = testRelation.output(0)
  val func = (x: Int, y: Int) => x + y
  val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil,
    nullableTypes = false :: false :: Nil)
  val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil,
    nullableTypes = false :: false :: Nil)
  val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
  comparePlans(plan.analyze, plan.analyze.analyze)
}
```

BTW, I'm just curious: `nullableTypes` seems to indicate the opposite of "nullable" as used in a schema. I would assume that when `nullableTypes` is `Seq(false)`, it means the type is not nullable and we need not add the null check, and vice versa. Did I miss something here?
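To make the idempotency point concrete, here is a minimal standalone sketch (not Spark's actual implementation; `Expr`, `Input`, and the simplified `needsNullCheck` signature are stand-ins invented for illustration) of the semantics described above: a null check is needed only when the declared type cannot hold null AND the expression has not already been wrapped in a `KnownNotNull` marker, which is what keeps a second application of the rule from adding a redundant check.

```scala
// Standalone sketch, NOT Spark code: simplified stand-ins for the rule's logic.
object NullCheckSketch {
  // Stand-in for Catalyst expressions: a leaf input and a marker node.
  sealed trait Expr
  case class Input(name: String) extends Expr
  // Marker wrapped around an expression once a null check has been added,
  // so re-applying the rule does not add a redundant check.
  case class KnownNotNull(child: Expr) extends Expr

  // A null check is needed only when the declared type cannot hold null
  // (nullableType == false, e.g. a primitive Int) and the expression has
  // not already been marked as null-checked.
  def needsNullCheck(nullableType: Boolean, expr: Expr): Boolean =
    !nullableType && !expr.isInstanceOf[KnownNotNull]

  def main(args: Array[String]): Unit = {
    val a = Input("a")
    println(needsNullCheck(nullableType = false, a))               // true: primitive, unchecked
    println(needsNullCheck(nullableType = false, KnownNotNull(a))) // false: already checked
    println(needsNullCheck(nullableType = true, a))                // false: type accepts null
  }
}
```

With the `isInstanceOf[KnownNotNull]` condition dropped, the second call above would return `true` again, which is exactly the repeated-application issue the reproduced test catches.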