[jira] [Assigned] (SPARK-11725) Let UDF to handle null value
[ https://issues.apache.org/jira/browse/SPARK-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11725:
------------------------------------

    Assignee: Apache Spark

> Let UDF to handle null value
> ----------------------------
>
>                 Key: SPARK-11725
>                 URL: https://issues.apache.org/jira/browse/SPARK-11725
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Jeff Zhang
>            Assignee: Apache Spark
>            Priority: Blocker
>
> I notice that Spark currently treats a long field as -1 if it is null. Here's the sample code:
> {code}
> sqlContext.udf.register("f", (x: Int) => x + 1)
> df.withColumn("age2", expr("f(age)")).show()
>
> // Output:
> +----+-------+----+
> | age|   name|age2|
> +----+-------+----+
> |null|Michael|   0|
> |  30|   Andy|  31|
> |  19| Justin|  20|
> +----+-------+----+
> {code}
> I think for a null value we have 3 options:
> * Use a special value to represent it (what Spark does now)
> * Always return null if any UDF input argument is null
> * Let the UDF itself handle null
>
> I would prefer the third option.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
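Until the UDF framework itself changes, the second option (null in, null out) can be emulated at the call site. A minimal sketch, not from the issue itself: it assumes the same `sqlContext` and `df` as the snippet above, and guards the UDF call with the public `when`/`callUDF` DataFrame functions so the special-value result never surfaces:

```scala
import org.apache.spark.sql.functions.{callUDF, col, lit, when}

// Same UDF as in the report: a null input is silently seen as -1,
// so age2 would come out as 0 for the null-age row.
sqlContext.udf.register("f", (x: Int) => x + 1)

// Workaround sketch for option 2: only invoke the UDF when the input
// is non-null; otherwise emit null directly.
val age2 = when(col("age").isNull, lit(null))
  .otherwise(callUDF("f", col("age")))

df.withColumn("age2", age2).show()
```

This keeps the UDF signature unchanged, at the cost of repeating the null guard at every call site; option 3 would instead move that decision into the UDF body.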
[jira] [Assigned] (SPARK-11725) Let UDF to handle null value
[ https://issues.apache.org/jira/browse/SPARK-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11725:
------------------------------------

    Assignee: (was: Apache Spark)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org