GitHub user mn-mikke opened a pull request: https://github.com/apache/spark/pull/21747
[SPARK-24165][SQL][branch-2.3] Fixing conditional expressions to handle nullability of nested types

## What changes were proposed in this pull request?

This PR proposes a fix for the output data type of the `If` and `CaseWhen` expressions. Until now, the implementation of these expressions ignored the nullability of nested types coming from the different execution branches and simply returned the type of the first branch. This could lead to an unexpected `NullPointerException` in other expressions that depend on an `If`/`CaseWhen` expression, because a nested field could be marked as non-nullable even though another branch may produce `null` for it.

Example:

```
import java.util

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{lit, struct, when}
import org.apache.spark.sql.types._
import spark.implicits._ // enables the 'cond / 's symbol-to-column syntax

val rows = new util.ArrayList[Row]()
rows.add(Row(true, ("a", 1)))
rows.add(Row(false, (null, 2)))

val schema = StructType(Seq(
  StructField("cond", BooleanType, false),
  StructField("s", StructType(Seq(
    StructField("val1", StringType, true),
    StructField("val2", IntegerType, false)
  )), false)
))

val df = spark.createDataFrame(rows, schema)

df
  .select(when('cond, struct(lit("x").as("val1"), lit(10).as("val2"))).otherwise('s) as "res")
  .select('res.getField("val1"))
  .show()
```

Exception:

```
Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44)
    at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44)
    ...
```

Output schema (note that `res.val1` is reported as non-nullable even though `val1` is `null` in the second row):

```
root
 |-- res.val1: string (nullable = false)
```

## How was this patch tested?
New test cases were added to:
- DataFrameSuite.scala
- conditionalExpressions.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mn-mikke/spark SPARK-24165-branch-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21747.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21747

----
commit a2fe63e1d48b0291feaa9fcd008654da051d1f1b
Author: Marek Novotny <mn.mikke@...>
Date:   2018-07-11T04:21:03Z

    [SPARK-24165][SQL][branch-2.3] Fixing conditional expressions to handle nullability of nested types

----
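The core idea of the fix is to derive the output type of `If`/`CaseWhen` from all branches rather than only the first. It can be sketched as a recursive merge of nullability flags. This is a minimal sketch and not the code from this PR. It makes the following assumptions:

- Both branch types are already known to be equal apart from nullability, which the analyzer requires for conditional branches.
- `NullabilityMergeSketch` and `mergeNullability` are hypothetical names.
- Spark's `org.apache.spark.sql.types` package is on the classpath.

```scala
import org.apache.spark.sql.types._

object NullabilityMergeSketch {

  /** Hypothetical helper (not a Spark API): merges two types that are equal
    * ignoring nullability, keeping the more permissive (nullable) flag at
    * every nesting level. Field names are taken from the first type. */
  def mergeNullability(a: DataType, b: DataType): DataType = (a, b) match {
    case (StructType(fieldsA), StructType(fieldsB)) if fieldsA.length == fieldsB.length =>
      StructType(fieldsA.zip(fieldsB).map { case (fa, fb) =>
        fa.copy(
          dataType = mergeNullability(fa.dataType, fb.dataType),
          nullable = fa.nullable || fb.nullable)
      })
    case (ArrayType(elemA, containsNullA), ArrayType(elemB, containsNullB)) =>
      ArrayType(mergeNullability(elemA, elemB), containsNullA || containsNullB)
    case (MapType(keyA, valA, valNullA), MapType(keyB, valB, valNullB)) =>
      MapType(mergeNullability(keyA, keyB), mergeNullability(valA, valB), valNullA || valNullB)
    case _ if a == b =>
      a // identical atomic types such as StringType or IntegerType
    case _ =>
      throw new IllegalArgumentException(s"Types differ in more than nullability: $a vs $b")
  }

  def main(args: Array[String]): Unit = {
    // Type of the first branch: struct(lit("x").as("val1"), lit(10).as("val2"))
    val thenType = StructType(Seq(
      StructField("val1", StringType, nullable = false),
      StructField("val2", IntegerType, nullable = false)))
    // Type of the otherwise('s) branch, taken from the example schema
    val elseType = StructType(Seq(
      StructField("val1", StringType, nullable = true),
      StructField("val2", IntegerType, nullable = false)))

    val merged = mergeNullability(thenType, elseType).asInstanceOf[StructType]
    println(merged.treeString)
  }
}
```

For the example in the description, the merged type keeps `val1` nullable and `val2` non-nullable. A type derived this way no longer claims that `res.val1` can never be `null`, which is the behavior the fix aims for.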