Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22375#discussion_r218686614
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala
 ---
    @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite 
with ExpressionEvalHelper
         val e = intercept[RuntimeException] { 
checkEvaluation(BadCodegenExpression(), 10) }
         assert(e.getMessage.contains("some_variable"))
       }
    +
    +  test("SPARK-25388: checkEvaluation should fail if nullable in DataType 
is incorrect") {
    +    val e = intercept[RuntimeException] {
    +      checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> 
null))
    --- End diff --
    
    Sorry, you're right, I have not been precise in my previous statements so 
they are confusing. I'll try to elaborate more clearly.
    
    IIUC what you are proposing now is for detecting the case when the output 
datatype of an expression is not nullable and the expected output instead 
contains a null. The way you're trying to achieve this is to make always 
nullable the expected output, so that we get a null as output both for codegen 
on and codegen off (while before this change for codegen on we would get as 
expected output a wrong value, which is the default value for that data type).
    
    The problems I see on the approach above are:
     - if you test only codegen on or codegen off, you might still be missing 
to cover some cases (eg. see 
https://github.com/apache/spark/pull/22375#discussion_r218085246, with codegen 
off we can have null as output for a non-nullable expression...);
     - The failure you get is that the expected output is different from the 
the evaluated one, so this case is treated as any other failure in the 
expression evaluation.
    
    With something like this:
    ```
    assert(containsNull(expected) && isNullable(expression.dataType))
    ```
    I think we could solve the 2 problems above as:
     - this check would fail in both modes;
     - we can differentiate between a test failing because of a wrong output 
and this new case which covers: a bad written UT; an expression returning 
wrongly a non-nullable datatype instead of a nullable one when returning null.
    
    Hope now it is more clear. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to