[ https://issues.apache.org/jira/browse/SPARK-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527556#comment-16527556 ]

Marek Novotny edited comment on SPARK-24165 at 6/29/18 12:33 PM:
-----------------------------------------------------------------

It seems that Spark is not able to resolve nullability for nested types correctly.

{{val rows = new util.ArrayList[Row]()}}
 {{rows.add(Row(true, ("1", 1)))}}
 {{rows.add(Row(false, (null, 2)))}}
 {{val schema = StructType(Seq(}}
 {{  StructField("cond", BooleanType, false),}}
 {{  StructField("s", StructType(Seq(}}
 {{    StructField("val1", StringType, true),}}
 {{    StructField("val2", IntegerType, false)}}
 {{  )))}}
 {{))}}

{{val df = spark.createDataFrame(rows, schema)}}

{{df.select(when('cond, expr("struct('x' as val1, 10 as val2)")).otherwise('s) as "result").printSchema()}}

Result:

{{root}}
 {{|-- result: struct (nullable = true)}}
 {{| |-- val1: string (nullable = *{color:#ff0000}false{color}*)}}
 {{| |-- val2: integer (nullable = false)}}
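The red {{nullable = false}} for {{val1}} looks wrong: the {{otherwise}} branch returns {{'s}}, whose {{val1}} field is nullable, so the merged result field should be nullable as well. As a minimal sketch of the expected merge rule in plain Scala (the {{Field}} type and {{mergeFields}} helper are hypothetical illustrations, not Spark internals):

```scala
// Hypothetical sketch of how branch nullabilities should be combined
// when merging the struct types of the two CaseWhen branches.
case class Field(name: String, nullable: Boolean)

def mergeFields(a: Seq[Field], b: Seq[Field]): Seq[Field] =
  a.zip(b).map { case (fa, fb) =>
    // A field of the result can be null if it can be null in either branch.
    Field(fa.name, fa.nullable || fb.nullable)
  }

// The when() branch: struct('x' as val1, 10 as val2) -- both non-nullable.
val whenBranch  = Seq(Field("val1", nullable = false), Field("val2", nullable = false))
// The otherwise() branch: column 's, where val1 is nullable.
val otherBranch = Seq(Field("val1", nullable = true),  Field("val2", nullable = false))

val merged = mergeFields(whenBranch, otherBranch)
// merged: Seq(Field(val1,true), Field(val2,false))
```

Under this rule {{val1}} would come out {{nullable = true}}, because it is nullable in the {{otherwise}} branch, while {{val2}} stays {{nullable = false}} since it is non-nullable in both branches.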

 

I will take a look at the problem.

 



> UDF within when().otherwise() raises NullPointerException
> ---------------------------------------------------------
>
>                 Key: SPARK-24165
>                 URL: https://issues.apache.org/jira/browse/SPARK-24165
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jingxuan Wang
>            Priority: Major
>
> I have a UDF that takes java.sql.Timestamp and String as input column types 
> and returns an Array of (Seq[case class], Double) as output. Since some of the 
> values in the input columns can be nullable, I put the UDF inside a 
> when($input.isNull, null).otherwise(UDF) filter. The function works well 
> when I test it in the Spark shell. But when run as a Scala jar via spark-submit 
> in yarn cluster mode, it raises a NullPointerException that points to the UDF 
> function. If I remove the when().otherwise() condition and instead put the null 
> check inside the UDF, the function works without issue in spark-submit.


