[ 
https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422082#comment-16422082
 ] 

Marco Gaido commented on SPARK-23835:
-------------------------------------

Actually, this is not the first time we have seen this. Previously, we said that it 
was a user error: if the data is a nullable Double, you should convert it 
using {{.as[Option[Double]]}}.

Anyway, enforcing this would mean rejecting the conversion of a nullable value 
to Double/Int/etc. (throwing an exception during analysis), but this could break 
existing users' applications (where nulls may not actually be present). Alternatively, 
we could assert that there is no null when we try to convert to a primitive 
type (better than the previous option, I think).
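
To illustrate what I mean by the {{Option}} conversion, here is a minimal sketch. It assumes a local SparkSession created only for the example (in spark-shell, {{spark}} already exists), and the names {{safe}} and {{cleaned}} are mine, not from the report. The second variant, dropping null rows before converting to the primitive type, is just one possible user-side workaround and not the fix being discussed here.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nullable-as-sketch")
  .getOrCreate()
import spark.implicits._

// Same shape as the reporter's DataFrame: column "a" is a nullable Double.
val df = Seq[(java.lang.Double, Double)](
  (1.0, 2.0),
  (null, 6.0)
).toDF("a", "b")

// Variant 1: model the nullable column as Option[Double], so a null
// becomes None instead of being turned into some default value.
val safe = df.as[(Option[Double], Double)].collect()

// Variant 2 (assumed workaround): drop rows where "a" is null, then
// convert to the primitive type, which is safe once no nulls remain.
val cleaned = df.na.drop(Seq("a")).as[(Double, Double)]

safe.foreach(println)
cleaned.collect().foreach(println)
```

With the first variant the null row is kept and shows up as {{None}}; with the second it is removed before the conversion, so the caller has to decide whether losing those rows is acceptable.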

> When Dataset.as converts column from nullable to non-nullable type, null 
> Doubles are converted silently to -1
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23835
>                 URL: https://issues.apache.org/jira/browse/SPARK-23835
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> I constructed a DataFrame with a nullable java.lang.Double column (and an 
> extra Double column).  I then converted it to a Dataset using ```as[(Double, 
> Double)]```.  When the Dataset is shown, it has a null.  When it is collected 
> and printed, the null is silently converted to a -1.
> Code snippet to reproduce this:
> {code}
> val localSpark = spark
> import localSpark.implicits._
> val df = Seq[(java.lang.Double, Double)](
>   (1.0, 2.0),
>   (3.0, 4.0),
>   (Double.NaN, 5.0),
>   (null, 6.0)
> ).toDF("a", "b")
> df.show()  // OUTPUT 1: has null
> df.printSchema()
> val data = df.as[(Double, Double)]
> data.show()  // OUTPUT 2: has null
> data.collect().foreach(println)  // OUTPUT 3: has -1
> {code}
> OUTPUT 1 and 2:
> {code}
> +----+---+
> |   a|  b|
> +----+---+
> | 1.0|2.0|
> | 3.0|4.0|
> | NaN|5.0|
> |null|6.0|
> +----+---+
> {code}
> OUTPUT 3:
> {code}
> (1.0,2.0)
> (3.0,4.0)
> (NaN,5.0)
> (-1.0,6.0)
> {code}


