[ https://issues.apache.org/jira/browse/SPARK-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714589#comment-16714589 ]
Wenchen Fan commented on SPARK-26308:
-------------------------------------

`ScalaUDF.inputTypes` is only used to check input types. I think we should put a general `DecimalType` there, instead of a specific one like `DecimalType(38, 18)`.

> Large BigDecimal value is converted to null when passed into a UDF
> ------------------------------------------------------------------
>
>                 Key: SPARK-26308
>                 URL: https://issues.apache.org/jira/browse/SPARK-26308
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Jay Pranavamurthi
>            Priority: Major
>
> We are loading a Hive table into a Spark DataFrame. The Hive table has a
> decimal(30, 0) column with values greater than Long.MAX_VALUE. The DataFrame
> loads correctly.
> We then use a UDF to convert the decimal type to a String value. For decimal
> values < Long.MAX_VALUE this works fine, but when the decimal value is
> > Long.MAX_VALUE, the input to the UDF is *null*.
> Hive table schema and data:
> {code:java}
> create table decimal_test (col1 decimal(30, 0), col2 decimal(10, 0), col3 int, col4 string);
> insert into decimal_test values(2011000000000002456556, 123456789, 10, 'test1');
> {code}
>
> Execution in spark-shell:
> _(Note that the first column in the final output is null; it should have been "2011000000000002456556")_
> {code:java}
> scala> val df1 = spark.sqlContext.sql("select * from decimal_test")
> df1: org.apache.spark.sql.DataFrame = [col1: decimal(30,0), col2: decimal(10,0) ... 2 more fields]
>
> scala> df1.show
> +--------------------+---------+----+-----+
> |                col1|     col2|col3| col4|
> +--------------------+---------+----+-----+
> |20110000000000024...|123456789|  10|test1|
> +--------------------+---------+----+-----+
>
> scala> val decimalToString = (value: java.math.BigDecimal) => if (value == null) null else { value.toBigInteger().toString }
> decimalToString: java.math.BigDecimal => String = <function1>
>
> scala> val udf1 = org.apache.spark.sql.functions.udf(decimalToString)
> udf1: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(DecimalType(38,18))))
>
> scala> val df2 = df1.withColumn("col1", udf1(df1.col("col1")))
> df2: org.apache.spark.sql.DataFrame = [col1: string, col2: decimal(10,0) ... 2 more fields]
>
> scala> df2.show
> +----+---------+----+-----+
> |col1|     col2|col3| col4|
> +----+---------+----+-----+
> |null|123456789|  10|test1|
> +----+---------+----+-----+
> {code}
> Oddly, this works if we change the "decimalToString" UDF to take an "Any" instead of a "java.math.BigDecimal":
> {code:java}
> scala> val decimalToString = (value: Any) => if (value == null) null else { if (value.isInstanceOf[java.math.BigDecimal]) value.asInstanceOf[java.math.BigDecimal].toBigInteger().toString else null }
> decimalToString: Any => String = <function1>
>
> scala> val udf1 = org.apache.spark.sql.functions.udf(decimalToString)
> udf1: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,None)
>
> scala> val df2 = df1.withColumn("col1", udf1(df1.col("col1")))
> df2: org.apache.spark.sql.DataFrame = [col1: string, col2: decimal(10,0) ... 2 more fields]
>
> scala> df2.show
> +--------------------+---------+----+-----+
> |                col1|     col2|col3| col4|
> +--------------------+---------+----+-----+
> |20110000000000024...|123456789|  10|test1|
> +--------------------+---------+----+-----+
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
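A minimal sketch of the arithmetic behind the null, needing no Spark at all (this follows the `DecimalType(38, 18)` visible in the UDF's inferred input types above; the exact null-on-overflow path inside Spark is an assumption based on this report):

```scala
// DecimalType(38, 18) means precision 38 with scale 18, which leaves
// 38 - 18 = 20 digits before the decimal point. The reported value has
// 22 integer digits, so it cannot be represented at that type, and the
// cast Spark inserts for the typed UDF yields null instead of the value.
val value = new java.math.BigDecimal("2011000000000002456556")
val integerDigits = value.precision - value.scale // integer digits in the value (22 here)
val maxIntegerDigits = 38 - 18                    // integer digits DecimalType(38, 18) can hold
println(integerDigits <= maxIntegerDigits)        // false: the value does not fit
```

The `Any` workaround in the report avoids this because no input type is inferred (`None` instead of `Some(List(DecimalType(38,18)))`), so no narrowing cast is applied before the function runs.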