[ https://issues.apache.org/jira/browse/SPARK-26693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-26693.
-------------------------------
    Resolution: Cannot Reproduce

Assuming the Zeppelin interpretation is correct for now.

> Large Numbers Truncated
> ------------------------
>
>                 Key: SPARK-26693
>                 URL: https://issues.apache.org/jira/browse/SPARK-26693
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Code was run in Zeppelin using Spark 2.4.
>            Reporter: Jason Blahovec
>            Priority: Major
>
> We have a process that takes a file dumped from an external API and formats
> it for use in other processes. These API dumps are brought into Spark with
> all fields read in as strings. One of the fields is a 19-digit visitor ID.
> Since implementing Spark 2.4 a few weeks ago, we have noticed that dataframes
> read the 19 digits correctly, but any function in SQL appears to truncate the
> last two digits and replace them with "00".
> Our process is set up to convert these numbers to bigint, which worked before
> Spark 2.4. We looked into data types, including changing to a "long" type,
> with no luck. At that point we tried bringing in the string value as is, with
> the same result. I've added code that should replicate the issue with a few
> 19-digit test cases, demonstrating the type conversions I tried.
> Results for the code below are shown here:
> dfTestExpanded.show():
> +-------------------+-------------------+-------------------+
> |         idAsString|         idAsBigint|           idAsLong|
> +-------------------+-------------------+-------------------+
> |4065453307562594031|4065453307562594031|4065453307562594031|
> |7659957277770523059|7659957277770523059|7659957277770523059|
> |1614560078712787995|1614560078712787995|1614560078712787995|
> +-------------------+-------------------+-------------------+
>
> Run this query in a paragraph:
> %sql
> select * from global_temp.testTable
> and see these results (all 3 columns):
> 4065453307562594000
> 7659957277770523000
> 1614560078712788000
>
> Another notable observation is that this issue does not appear to affect
> joins on the affected fields - we are seeing issues when the fields are used
> in where clauses or as part of a select list.
>
> {code:python}
> %pyspark
> from pyspark.sql.types import StructField, StructType, StringType
>
> # Build a one-column DataFrame of 19-digit IDs read in as strings.
> sfTestValue = StructField("testValue", StringType(), True)
> schemaTest = StructType([sfTestValue])
> listTestValues = []
> listTestValues.append(("4065453307562594031",))
> listTestValues.append(("7659957277770523059",))
> listTestValues.append(("1614560078712787995",))
> dfTest = spark.createDataFrame(listTestValues, schemaTest)
>
> # Keep the string column and cast it to bigint and long alongside it.
> dfTestExpanded = dfTest.selectExpr(
>     "testValue as idAsString",
>     "cast(testValue as bigint) as idAsBigint",
>     "cast(testValue as long) as idAsLong")
> dfTestExpanded.show()  # This will show three columns of data correctly.
>
> # When this table is viewed in a %sql paragraph, the truncated values are shown.
> dfTestExpanded.createOrReplaceGlobalTempView('testTable')
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
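The "Zeppelin interpretation" in the resolution is consistent with the numbers reported: all three IDs exceed 2^53 - 1 (JavaScript's Number.MAX_SAFE_INTEGER), so a browser front end that parses them as IEEE-754 doubles cannot represent the last digits, and shortest-round-trip printing then shows the trailing "000" values. As a minimal standalone sketch (plain Python, no Spark or Zeppelin involved; the variable names here are illustrative), the reported display values can be reproduced by forcing each ID through a double:

```python
# Reproduce the reported Zeppelin output by round-tripping each 19-digit ID
# through an IEEE-754 double, the way a JavaScript front end would store it.
from decimal import Decimal

ids = ["4065453307562594031", "7659957277770523059", "1614560078712787995"]
MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double holds exactly

for s in ids:
    assert int(s) > MAX_SAFE_INTEGER      # so precision loss is expected
    as_double = float(s)                  # lossy conversion to a double
    # repr() gives the shortest decimal that round-trips to the same double,
    # which is the same rule JavaScript's Number-to-string conversion uses.
    displayed = int(Decimal(repr(as_double)))
    print(f"{s} -> {displayed}")
```

For these three inputs this prints exactly the values seen in the %sql paragraph (4065453307562594000, 7659957277770523000, 1614560078712788000), which supports the conclusion that Spark's bigint values are intact and the truncation happens in the display layer. A common workaround in such cases is to cast the ID column to string in the final select so the front end never treats it as a number.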