[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527556#comment-16527556 ]
Marek Novotny edited comment on SPARK-24165 at 6/29/18 12:33 PM:
-----------------------------------------------------------------
It seems that Spark is not able to resolve nullability for nested types correctly.

{code:scala}
val rows = new util.ArrayList[Row]()
rows.add(Row(true, ("1", 1)))
rows.add(Row(false, (null, 2)))
val schema = StructType(Seq(
  StructField("cond", BooleanType, false),
  StructField("s", StructType(Seq(
    StructField("val1", StringType, true),
    StructField("val2", IntegerType, false)
  )))
))
val df = spark.createDataFrame(rows, schema)
df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()
{code}

Result:
{code}
root
 |-- result: struct (nullable = true)
 |    |-- val1: string (nullable = false)
 |    |-- val2: integer (nullable = false)
{code}

Note that {{val1}} is reported as {{nullable = false}} even though the {{otherwise}} branch ({{'s}}) declares it nullable and the data actually contains a null. I will take a look at the problem.
> UDF within when().otherwise() raises NullPointerException
> ---------------------------------------------------------
>
>                 Key: SPARK-24165
>                 URL: https://issues.apache.org/jira/browse/SPARK-24165
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jingxuan Wang
>            Priority: Major
>
> I have a UDF which takes java.sql.Timestamp and String as input column types and returns an Array of (Seq[case class], Double) as output. Since some of the values in the input columns can be null, I put the UDF inside a when($input.isNull, null).otherwise(UDF) guard. The function works well when I test it in spark-shell, but when it runs as a Scala jar via spark-submit in yarn cluster mode, it raises a NullPointerException that points to the UDF. If I remove the when().otherwise() condition and instead put the null check inside the UDF, the function works without issue in spark-submit.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
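The workaround the reporter describes can be sketched in plain Scala. This is only an illustration, not the reporter's actual code: the function name {{score}}, its inputs, and the sentinel return value are assumptions. The point is that the null guard moves into the function body itself, so the column no longer needs the {{when($"col".isNull, null).otherwise(...)}} wrapper that triggers the NPE.

```scala
import java.sql.Timestamp

// Hypothetical stand-in for the reporter's UDF body (names and logic are
// assumptions). The null check happens inside the function, so callers do
// not need to pre-filter null inputs with when().otherwise().
def score(ts: Timestamp, label: String): Double =
  if (ts == null || label == null) -1.0 // sentinel for missing input
  else label.length.toDouble            // placeholder computation

// In a Spark job this would then be registered directly, e.g.:
//   val scoreUdf = udf(score _)
//   df.select(scoreUdf($"ts", $"label") as "score")
```

Because the function handles nulls itself, every row can go through the UDF unconditionally, sidestepping the nullability resolution of the when/otherwise branches.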