[ 
https://issues.apache.org/jira/browse/SPARK-35841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35841.
---------------------------------
    Fix Version/s: 3.1.3
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 33011
[https://github.com/apache/spark/pull/33011]

> Casting string to decimal type doesn't work if the sum of the digits is 
> greater than 38
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-35841
>                 URL: https://issues.apache.org/jira/browse/SPARK-35841
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1, 3.1.2
>         Environment: Tested in a Kubernetes Cluster with Spark 3.1.1 and 
> Spark 3.1.2 images
> (Hadoop 3.2.1, Python 3.9, Scala 2.12.13)
>            Reporter: Roberto Gelsi
>            Assignee: dgd_contributor
>            Priority: Major
>             Fix For: 3.2.0, 3.1.3
>
>
> Since Spark 3.1.1, casting a string with many decimal places to a decimal 
> type returns NULL. If the total number of digits before and after the 
> decimal point is at most 38, a value is returned. From 39 digits onward, 
> NULL is returned.
> This worked in Spark 3.0.x and earlier.
> Code to reproduce:
> * A string with 2 digits before the decimal point and 37 digits after it 
> returns null
> {code:python}
> # Assumes an active SparkSession named `spark`, as in the pyspark shell
> from pyspark.sql.functions import col
> from pyspark.sql.types import StringType
> data = ['28.9259999999999983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +-----+
> |value|
> +-----+
> |null |
> +-----+
>  
> * A string with 2 digits before the decimal point and 36 digits after it 
> returns the number as a decimal
> {code:python}
> data = ['28.925999999999998379962562466971576213']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +--------+
> |value   |
> +--------+
> |28.92600|
> +--------+
> * A string with 1 digit before the decimal point and 37 digits after it 
> returns the number as a decimal
> {code:python}
> data = ['2.9259999999999983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +-------+
> |value  |
> +-------+
> |2.92600|
> +-------+
>  
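
The digit counts in the three reproduction cases above can be checked without a Spark cluster. The following is a minimal sketch in plain Python using the standard `decimal` module, not Spark's actual cast implementation. The helper names `total_digits` and `expected_cast` are hypothetical, and half-up rounding is an assumption about how the cast to decimal(10, 5) rounds. The limit of 38 is Spark's maximum decimal precision.

```python
from decimal import Decimal, ROUND_HALF_UP

# Spark's maximum decimal precision (total number of digits).
MAX_PRECISION = 38

# The three input strings from the reproduction steps.
samples = [
    '28.9259999999999983799625624669715762138',
    '28.925999999999998379962562466971576213',
    '2.9259999999999983799625624669715762138',
]


def total_digits(text):
    """Count digits before and after the decimal point (hypothetical helper).

    Leading zeros are not counted, which does not matter for these inputs.
    """
    return len(Decimal(text).as_tuple().digits)


def expected_cast(text, scale=5):
    """Round to `scale` fractional digits.

    Half-up rounding is an assumption, not taken from the Spark source.
    """
    quantum = Decimal(1).scaleb(-scale)  # e.g. 1E-5 for scale=5
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


for s in samples:
    digits = total_digits(s)
    over_limit = digits > MAX_PRECISION
    print(digits, over_limit, expected_cast(s))
```

Only the first string exceeds 38 digits in total, which is the case that returns null on the affected versions, even though its value fits comfortably in decimal(10, 5) once rounded.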



--
This message was sent by Atlassian Jira
(v8.3.4#803005)