[ https://issues.apache.org/jira/browse/SPARK-35841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-35841.
---------------------------------
    Fix Version/s: 3.1.3
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 33011
[https://github.com/apache/spark/pull/33011]

> Casting string to decimal type doesn't work if the sum of the digits is greater than 38
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-35841
>                 URL: https://issues.apache.org/jira/browse/SPARK-35841
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1, 3.1.2
>         Environment: Tested in a Kubernetes Cluster with Spark 3.1.1 and Spark 3.1.2 images
>                      (Hadoop 3.2.1, Python 3.9, Scala 2.12.13)
>            Reporter: Roberto Gelsi
>            Assignee: dgd_contributor
>            Priority: Major
>             Fix For: 3.2.0, 3.1.3
>
>
> Since Spark 3.1.1, NULL is returned when casting a string with many decimal
> places to a decimal type. If the total number of digits before and after the
> decimal point is less than 39, a value is returned. With 39 or more digits,
> however, NULL is returned.
> This worked in Spark 3.0.x and earlier.
> Code to reproduce:
> * A string with 2 digits before the decimal point and 37 digits after it
> returns null
> {code:python}
> from pyspark.sql.functions import col
> from pyspark.sql.types import StringType
> # `spark` is an existing SparkSession, e.g. the one provided by the pyspark shell
> data = ['28.9259999999999983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +-----+
> |value|
> +-----+
> |null |
> +-----+
>
> * A string with 2 digits before the decimal point and 36 digits after it
> returns the number as a decimal
> {code:python}
> # Reuses the imports from the first snippet
> data = ['28.925999999999998379962562466971576213']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +--------+
> |value   |
> +--------+
> |28.92600|
> +--------+
>
> * A string with 1 digit before the decimal point and 37 digits after it
> returns the number as a decimal
> {code:python}
> # Reuses the imports from the first snippet
> data = ['2.9259999999999983799625624669715762138']
> dfs = spark.createDataFrame(data, StringType())
> dfd = dfs.withColumn('value', col('value').cast('decimal(10, 5)'))
> dfd.show(truncate=False)
> {code}
> +-------+
> |value  |
> +-------+
> |2.92600|
> +-------+
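The digit counts behind the three cases can be checked without a cluster. The following is a minimal sketch in plain Python, not Spark code: `count_digits`, `expected_cast`, `SAMPLES` and `MAX_PRECISION` are hypothetical names introduced here for illustration. It assumes the 38 in the issue title refers to the maximum precision of Spark's decimal type, and it assumes half-up rounding for the expected value (for these inputs any round-to-nearest mode gives the same result).

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 38 is the maximum precision of Spark's decimal type.
MAX_PRECISION = 38

# The three input strings from the report, in the same order.
SAMPLES = [
    "28.9259999999999983799625624669715762138",
    "28.925999999999998379962562466971576213",
    "2.9259999999999983799625624669715762138",
]


def count_digits(text: str) -> int:
    """Count the decimal digits in a plain numeric string (the point is ignored)."""
    return sum(ch.isdigit() for ch in text)


def expected_cast(text: str, scale: int = 5) -> Decimal:
    """Round the string to `scale` fractional digits, as decimal(10, 5) should."""
    quantum = Decimal(1).scaleb(-scale)  # 1E-5 for scale=5
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


for sample in SAMPLES:
    digits = count_digits(sample)
    # Columns: total digits, whether the count exceeds 38, expected decimal value.
    print(digits, digits > MAX_PRECISION, expected_cast(sample))
```

Only the first string has more than 38 digits in total, which matches the one case that returns null in the report. All three strings fit comfortably in decimal(10, 5) once rounded, so a null result is not explained by the target type being too small.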