Joseph K. Bradley created SPARK-23835:
-----------------------------------------
             Summary: When Dataset.as converts a column from a nullable to a non-nullable type, null Doubles are silently converted to -1
                 Key: SPARK-23835
                 URL: https://issues.apache.org/jira/browse/SPARK-23835
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Joseph K. Bradley

I constructed a DataFrame with a nullable java.lang.Double column (and an extra Double column). I then converted it to a Dataset using ```as[(Double, Double)]```. When the Dataset is shown, it displays the null. When it is collected and printed, however, the null is silently converted to -1.0.

Code snippet to reproduce this:
{code}
val localSpark = spark
import localSpark.implicits._
val df = Seq[(java.lang.Double, Double)](
  (1.0, 2.0),
  (3.0, 4.0),
  (Double.NaN, 5.0),
  (null, 6.0)
).toDF("a", "b")
df.show()  // OUTPUT 1: has null
df.printSchema()
val data = df.as[(Double, Double)]
data.show()  // OUTPUT 2: has null
data.collect().foreach(println)  // OUTPUT 3: has -1
{code}

OUTPUT 1 and 2:
{code}
+----+---+
|   a|  b|
+----+---+
| 1.0|2.0|
| 3.0|4.0|
| NaN|5.0|
|null|6.0|
+----+---+
{code}

OUTPUT 3:
{code}
(1.0,2.0)
(3.0,4.0)
(NaN,5.0)
(-1.0,6.0)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
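(As a possible workaround, not part of the original report: encoding the nullable column as {{Option[Double]}} in the target Dataset type should preserve the null as {{None}} instead of coercing it to a primitive default. A minimal sketch, assuming the same {{spark}} session as above:)

{code}
val localSpark = spark
import localSpark.implicits._

val df = Seq[(java.lang.Double, Double)](
  (1.0, 2.0),
  (null, 6.0)
).toDF("a", "b")

// Option[Double] maps a null in the nullable column to None rather than
// silently converting it to a sentinel value like -1.0.
val data = df.as[(Option[Double], Double)]
data.collect().foreach(println)
// The null row decodes to (None,6.0) instead of (-1.0,6.0).
{code}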