[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22375 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r224660195 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -69,11 +69,22 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa /** * Check the equality between result of expression and expected value, it will handle - * Array[Byte], Spread[Double], MapData and Row. + * Array[Byte], Spread[Double], MapData and Row. Also check whether nullable in expression is + * true if result is null */ - protected def checkResult(result: Any, expected: Any, exprDataType: DataType): Boolean = { + protected def checkResult(result: Any, expected: Any, expression: Expression): Boolean = { +checkResult(result, expected, expression.dataType, expression.nullable) + } + + protected def checkResult( + result: Any, + expected: Any, + exprDataType: DataType, + exprNullable: Boolean): Boolean = { val dataType = UserDefinedType.sqlType(exprDataType) +// The result is null for a non-nullable expression +assert(result != null || exprNullable, "exprNullable should be true if result is null") --- End diff -- nit: how about "result cannot be null since it's not nullable." --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r224414454 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -69,11 +69,17 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa /** * Check the equality between result of expression and expected value, it will handle - * Array[Byte], Spread[Double], MapData and Row. + * Array[Byte], Spread[Double], MapData and Row. Also check whether exprNullable is true + * if result of expression is null */ - protected def checkResult(result: Any, expected: Any, exprDataType: DataType): Boolean = { + protected def checkResult( + result: Any, + expected: Any, + exprDataType: DataType, + exprNullable: Boolean): Boolean = { val dataType = UserDefinedType.sqlType(exprDataType) +assert(result != null || exprNullable) --- End diff -- we should add message to the assert, e.g. `assert(result != null || exprNullable, "xxx")` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r224414284 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -69,11 +69,22 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa /** * Check the equality between result of expression and expected value, it will handle - * Array[Byte], Spread[Double], MapData and Row. + * Array[Byte], Spread[Double], MapData and Row. Also check whether exprNullable is true --- End diff -- the comment doesn't match the method now --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223783598 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- I think `checkResult` already validates `expression` according to the nullability of the given expression at 1. Thus, if the `expected` is not correct, 2. will detect an incorrect point. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223245508 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -113,7 +113,7 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { assert(actual.length == 1) val expected = UTF8String.fromString("abc") -if (!checkResult(actual.head, expected, expressions.head.dataType)) { +if (!checkResult(actual.head, expected, expressions.head.dataType, expressions.head.nullable)) { --- End diff -- maybe we should provide an overload of `checkResult` that takes `Expression`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223214388 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- > does not perform checks recursively good point, I was not considering it. Then, do we need the check at https://github.com/apache/spark/pull/22375/files/9ef335d6e43a6ef7d253d0ed3564f95bd0278f71#diff-41747ec3f56901eb7bfb95d2a217e94dL231? Isn't it performed in `checkResult`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223214130 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- 1. sees only `expected`. 2. sees `expected` and `expression`. Thus, we are doing different. At 1, as we discussed, we need to check the consistency recursively. IIUC, `unsafeRow.get(0, dataType) != null || expression.nullable` does not perform checks recursively. Do I make a misunderstanding? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223205359 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- yes, I just meant that here we are checking the result and we are doing the same after too. Shouldn't we just add an assert for `unsafeRow.get(0, dataType) != null || expression.nullable` here instead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223169695 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- We check different properties in these two `if` statements. 1. Line 231 checks consistency between value and `nullable` in `expected` 1. Line 245 checks bit-wise value between `expected` and `expression` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r223169637 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -113,7 +113,7 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { assert(actual.length == 1) val expected = UTF8String.fromString("abc") -if (!checkResult(actual.head, expected, expressions.head.dataType)) { +if (!checkResult(actual.head, expected, expressions.head.dataType, expressions.head.nullable)) { --- End diff -- That is another option that I thought. On the other hand, to set default has a risk to overlook a possible incosistency between value and `nullable` at top level of `expected`. Do we use the default value at the all of callers of `checkResult`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222954573 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- mmmh, I am not sure about this. Do we then still need the code below? Seems to me we are checking the same thing twice, please correct me if I am wrong. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222920691 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -113,7 +113,7 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper { assert(actual.length == 1) val expected = UTF8String.fromString("abc") -if (!checkResult(actual.head, expected, expressions.head.dataType)) { +if (!checkResult(actual.head, expected, expressions.head.dataType, expressions.head.nullable)) { --- End diff -- It's a little weird to ask the caller to provide both `expected` and `exprNullable `, and then use `exprNullable ` to validate `expected.` Can we set a default value for `exprNullable` in `checkResult`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222741323 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- This is because this statement checks consistency between `expression` and its `nullable`, as you proposed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222740882 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -69,11 +69,17 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa /** * Check the equality between result of expression and expected value, it will handle - * Array[Byte], Spread[Double], MapData and Row. + * Array[Byte], Spread[Double], MapData and Row. Also check whether exprNullable is true + * if result of expression is null */ - protected def checkResult(result: Any, expected: Any, exprDataType: DataType): Boolean = { + protected def checkResult( + result: Any, + expected: Any, + exprDataType: DataType, + exprNullable: Boolean): Boolean = { val dataType = UserDefinedType.sqlType(exprDataType) +assert(result != null || exprNullable) --- End diff -- Sure, I will add the description --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222604522 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -221,6 +227,12 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa val unsafeRow = evaluateWithUnsafeProjection(expression, inputRow) val input = if (inputRow == EmptyRow) "" else s", input: $inputRow" +val dataType = expression.dataType +if (!checkResult(unsafeRow.get(0, dataType), expected, dataType, expression.nullable)) { --- End diff -- why did you add this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r222604455 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -69,11 +69,17 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa /** * Check the equality between result of expression and expected value, it will handle - * Array[Byte], Spread[Double], MapData and Row. + * Array[Byte], Spread[Double], MapData and Row. Also check whether exprNullable is true + * if result of expression is null */ - protected def checkResult(result: Any, expected: Any, exprDataType: DataType): Boolean = { + protected def checkResult( + result: Any, + expected: Any, + exprDataType: DataType, + exprNullable: Boolean): Boolean = { val dataType = UserDefinedType.sqlType(exprDataType) +assert(result != null || exprNullable) --- End diff -- Can we add a description which is more clear about the issue? Something like: `The result is null for a non-nullable expression`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r221194592 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- With some hints from @ueshin, this PR implemented the check of `null` value with nullable bit in `checkResult()`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219454650 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- The motivations are the 2 mentioned above. Basically, I am proposing the same suggestion @cloud-fan has just commented [here](https://github.com/apache/spark/pull/22375#discussion_r219452615) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219452615 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -223,8 +223,16 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa } } else { val lit = InternalRow(expected, expected) --- End diff -- I think a more straightforward approach is, validate the `expected` according to the nullability of the given expression. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219448432 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- I may not still understand your motivation correctly. What is the motivation to introduce this assertion? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219435279 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Yes, I said that the suggestion above is wrong and needs to be rewritten in a recursive way. Sorry for the bad suggestion, I just meant to show my idea. So it should be something like: ``` assert(!containsNullWhereNotNullable(expected, expression.dataType)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219432959 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Even if we make it checking recursively, I think that this case cannot be detected. This is because the mismatch occurs in the different recursive path. Would it be possible to share the case where we distingished a wrong output from a bad written UT in other places, as you proposed? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219406170 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Yes, you're right, my suggestion doesn't work in a case like that, sorry. We would need to make it checking recursively. But I think you got the idea of what I am proposing here. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219397495 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- The your first is correct since this patch addresses only codegen-on case. We can add another code to address codegen-off case. Regarding the your second point, have we ever distingished a wrong output from a bad written UT when we defect the difference between `expression` and `expected`. I think that the distinguishment is nice to have, but not mandatory to have. I have one question about your approach: ``` assert(containsNull(expected) && isNullable(expression.dataType)) ``` Since the above two conditions evaluates `expected` and `expression` independently, how this works for the following case? I think that the assertion would be passed ``` expression: dataType = StructType(ArrayType(IntegerType, false), ArrayType(IntegerType, true)) Struct(Array(0, null), Array(1, 0)) expected: Struct(Array(0, 0), Array(1, null)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218686614 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Sorry, you're right, I have not been precise in my previous statements so they are confusing. I'll try to elaborate more clearly. IIUC what you are proposing now is for detecting the case when the output datatype of an expression is not nullable and the expected output instead contains a null. The way you're trying to achieve this is to make always nullable the expected output, so that we get a null as output both for codegen on and codegen off (while before this change for codegen on we would get as expected output a wrong value, which is the default value for that data type). The problems I see on the approach above are: - if you test only codegen on or codegen off, you might still be missing to cover some cases (eg. see https://github.com/apache/spark/pull/22375#discussion_r218085246, with codegen off we can have null as output for a non-nullable expression...); - The failure you get is that the expected output is different from the the evaluated one, so this case is treated as any other failure in the expression evaluation. With something like this: ``` assert(containsNull(expected) && isNullable(expression.dataType)) ``` I think we could solve the 2 problems above as: - this check would fail in both modes; - we can differentiate between a test failing because of a wrong output and this new case which covers: a bad written UT; an expression returning wrongly a non-nullable datatype instead of a nullable one when returning null. Hope now it is more clear. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218629945 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- I meant how your proposed assert can distinguish the two cases that you want to distinguish at [here](https://github.com/apache/spark/pull/22375#discussion_r217495874). 1. a failure caused by a bad test, ie. a test written wrongly 2. a UT failure caused by a bug in what we are testing As you said [here](https://github.com/apache/spark/pull/22375#discussion_r218362156), the assert may detect both cases 1. and 2.. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218362156 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Well, actually I was not clear about this, sorry for that. I meant we can separate from: - a failure caused by a wrong result returned; - a failure caused by a wrong result type returned (ie. the expected values contains NULL while the expression return data type is not nullable - this case can happen both in case the UT is bad and in case the return type of the expression is wrong); --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218106457 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Would it be possible to let us know how can this assertion distinguish the following two cases? * a failure caused by a bad test, ie. a test written wrongly * a UT failure caused by a bug in what we are testing --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218094296 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- yes, thanks for the summary, it states more clearly what I thought. My point is that this fix works properly only when we test both codegen on and off, but it would fail to detect the error condition it claims to fix if only one of them (for any reason) is tested. So I am wondering if it is possible to perform a check on the expected value, instead of this fix. Something like: ``` assert(containsNull(expected) && isNullable(expression.dataType)) ``` where `containsNull` and `isNullable` have to be defined properly. In this way we should fail properly independently from whether codegen is on or not. And we can also give a more clear hint in the error message about the problem being most likely a bad UT. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r218085246 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Here is summary. * `Map(3 -> 7, 6 -> null)` passes in codegen off and fails in codegen on with this fix * `Map(3 -> 7, 6 -> -1)` fails in codegen off and fails in codegen on Would it be possible to share examples of two cases that you think we would be able to distinguish? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r217495874 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- I see, so the example above passes in codegen off and fails with codegen on with this fix, while using `Map(3 -> 7, 6 -> -1)` passes codegen on and fails codegen off, am I right? What I am thinking about (but I have not yet found a working implementation) is: since the problem arise when we say we expect `null` in a non-nullable datatype, can we add such a check? I mean, instead of pretending the expected value to be nullable, can't we add a check in case it is not nullable for being sure that it does not contain `null`? I think it would be better, because we would be able to distinguish a failure caused by a bad test, ie. a test written wrongly, from a UT failure caused by a bug in what we are testing. What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r217454434 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Here is an output. This is because the test correctly detects a failure even in codegen-off mode, too. ``` "Incorrect evaluation (codegen off): mapincorrectdatatypeexpression(), actual: keys: [3,6], values: [7,null], expected: keys: [3,6], values: [7,-1]" did not contain "Incorrect evaluation in unsafe mode" ScalaTestFailureLocation: org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelperSuite$$anonfun$4 at (ExpressionEvalHelperSuite.scala:44) org.scalatest.exceptions.TestFailedException: "Incorrect evaluation (codegen off): mapincorrectdatatypeexpression(), actual: keys: [3,6], values: [7,null], expected: keys: [3,6], values: [7,-1]" did not contain "Incorrect evaluation in unsafe mode" at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) ... ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r217433074 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -223,8 +223,8 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa } } else { val lit = InternalRow(expected, expected) - val expectedRow = -UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit) + val dtAsNullable = expression.dataType.asNullable + val expectedRow = UnsafeProjection.create(Array(dtAsNullable, dtAsNullable)).apply(lit) --- End diff -- oh, yes. this is going to detect such failure. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r217381027 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- what happens if here you put `Map(3 -> 7, 6 -> -1)`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org