Jeff Zhang created SPARK-11725: ---------------------------------- Summary: Let UDF to handle null value Key: SPARK-11725 URL: https://issues.apache.org/jira/browse/SPARK-11725 Project: Spark Issue Type: Improvement Components: SQL Reporter: Jeff Zhang
I notice that currently spark will take the long field as -1 if it is null. Here's the sample code. {code} sqlContext.udf.register("f", (x:Int)=>x+1) df.withColumn("age2", expr("f(age)")).show() //////////////// Output /////////////////////// +----+-------+----+ | age| name|age2| +----+-------+----+ |null|Michael| 0| | 30| Andy| 31| | 19| Justin| 20| +----+-------+----+ {code} I think for the null value we have 3 options * Use a special value to represent it (what spark does now) * Always return null if the udf input has null value argument * Let udf itself to handle null I would prefer the third option -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org