[ https://issues.apache.org/jira/browse/SPARK-26693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-26693.
-------------------------------
    Resolution: Cannot Reproduce

Assuming the Zeppelin interpretation is correct for now: the truncation happens in Zeppelin's rendering of the result, not in Spark SQL itself.
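
For reference, a minimal sketch of that interpretation (an assumption, not a confirmed diagnosis): the truncated values reported below are exactly what these 19-digit integers look like after a round trip through an IEEE-754 double, the number representation a JSON/JavaScript front end such as a notebook table renderer would use. A double carries only about 15-17 significant decimal digits.

{code:python}
# Hedged sketch: round-trip the reported IDs through a 64-bit double and
# compare against the truncated values shown in the %sql paragraph.
ids = ["4065453307562594031", "7659957277770523059", "1614560078712787995"]
for s in ids:
    as_double = float(s)  # forces the value into IEEE-754 double precision
    print(s, "->", repr(as_double))

# Output:
# 4065453307562594031 -> 4.065453307562594e+18  (i.e. ...594000)
# 7659957277770523059 -> 7.659957277770523e+18  (i.e. ...523000)
# 1614560078712787995 -> 1.614560078712788e+18  (i.e. ...788000)
{code}

Each result matches the truncated value reported below, which is consistent with the display layer, not the SQL engine, doing the rounding.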

> Large Numbers Truncated 
> ------------------------
>
>                 Key: SPARK-26693
>                 URL: https://issues.apache.org/jira/browse/SPARK-26693
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Code was run in Zeppelin using Spark 2.4.
>            Reporter: Jason Blahovec
>            Priority: Major
>
> We have a process that takes a file dumped from an external API and formats 
> it for use in other processes.  These API dumps are brought into Spark with 
> all fields read in as strings.  One of the fields is a 19-digit visitor ID.  
> Since moving to Spark 2.4 a few weeks ago, we have noticed that dataframes 
> read the 19 digits correctly, but anything run through SQL appears to zero 
> out the last few digits of the value.  
> Our process converts these numbers to bigint, which worked before Spark 2.4.  
> We looked into data types, including changing to a "long" type, with no 
> luck.  We then tried using the string value as is, with the same result.  
> The code below should replicate the issue with a few 19-digit test cases and 
> demonstrates the type conversions I tried.
> Results for the code below are shown here:
> dfTestExpanded.show():
> +-------------------+-------------------+-------------------+
> |         idAsString|         idAsBigint|           idAsLong|
> +-------------------+-------------------+-------------------+
> |4065453307562594031|4065453307562594031|4065453307562594031|
> |7659957277770523059|7659957277770523059|7659957277770523059|
> |1614560078712787995|1614560078712787995|1614560078712787995|
> +-------------------+-------------------+-------------------+
> Run this query in a paragraph:
> %sql
> select * from global_temp.testTable
> and see these results (all 3 columns):
> 4065453307562594000
> 7659957277770523000
> 1614560078712788000
>  
> Another notable observation: this issue does not appear to affect joins on 
> the affected fields. We see issues when the fields are used in where clauses 
> or as part of a select list.
>  
>  
> {code:python}
> %pyspark
> from pyspark.sql.types import StructField, StringType, StructType
>
> # Build a one-column schema holding the raw 19-digit IDs as strings.
> sfTestValue = StructField("testValue", StringType(), True)
> schemaTest = StructType([sfTestValue])
> listTestValues = []
> listTestValues.append(("4065453307562594031",))
> listTestValues.append(("7659957277770523059",))
> listTestValues.append(("1614560078712787995",))
> dfTest = spark.createDataFrame(listTestValues, schemaTest)
> # Cast the string ID to bigint and long alongside the original value.
> dfTestExpanded = dfTest.selectExpr(
>     "testValue as idAsString",
>     "cast(testValue as bigint) as idAsBigint",
>     "cast(testValue as long) as idAsLong")
> dfTestExpanded.show()  # This shows all three columns of data correctly.
> # When this view is queried in a %sql paragraph, the truncated values are shown.
> dfTestExpanded.createOrReplaceGlobalTempView('testTable')
> {code}
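
A quick way to check where the truncation happens (a suggested follow-up, not part of the original report): collect the rows from the global temp view directly in PySpark, bypassing the %sql table renderer. If the values are intact here, Spark SQL is preserving them and the rounding happens in the front end.

{code:python}
# Hedged sketch: query the same global temp view through spark.sql() and
# print the raw rows instead of rendering them in a %sql paragraph.
rows = spark.sql("select * from global_temp.testTable").collect()
for row in rows:
    # All three columns should still show the full 19 digits here.
    print(row.idAsString, row.idAsBigint, row.idAsLong)
{code}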



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
