Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218686614 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { + val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Sorry, you're right, I have not been precise in my previous statements so they are confusing. I'll try to elaborate more clearly. IIUC what you are proposing now is for detecting the case when the output datatype of an expression is not nullable and the expected output instead contains a null. The way you're trying to achieve this is to make always nullable the expected output, so that we get a null as output both for codegen on and codegen off (while before this change for codegen on we would get as expected output a wrong value, which is the default value for that data type). The problems I see on the approach above are: - if you test only codegen on or codegen off, you might still be missing to cover some cases (eg. see https://github.com/apache/spark/pull/22375#discussion_r218085246, with codegen off we can have null as output for a non-nullable expression...); - The failure you get is that the expected output is different from the the evaluated one, so this case is treated as any other failure in the expression evaluation. With something like this: ``` assert(containsNull(expected) && isNullable(expression.dataType)) ``` I think we could solve the 2 problems above as: - this check would fail in both modes; - we can differentiate between a test failing because of a wrong output and this new case which covers: a bad written UT; an expression returning wrongly a non-nullable datatype instead of a nullable one when returning null. Hope now it is more clear. Thanks.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org