[jira] [Commented] (SPARK-22472) Datasets generate random values for null primitive types

Vladislav Kuzemchik (JIRA) Wed, 08 Nov 2017 09:07:21 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244319#comment-16244319
 ]


Vladislav Kuzemchik commented on SPARK-22472:
---------------------------------------------

I'm using Option[Long] as a workaround, but it is scary to leave things 
as they are and hope that someone catches this in review whenever anyone 
else is using Datasets.
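
A minimal sketch of that workaround, assuming a local SparkSession (the app 
name and the `doubled`/`result` names are only illustrative). Decoding the 
column as Option[Long] lets a null cell surface as None instead of being 
forced into a primitive:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("spark-22472-workaround")
  .getOrCreate()
import spark.implicits._

// Same data as in the report: one null and two values in a bigint column.
val s = Seq[Option[Long]](None, Some(1L), Some(5L)).toDF("v")

// Decode as Option[Long] so the null row becomes None, then map inside the Option.
val doubled = s.as[Option[Long]].map(_.map(_ * 2))
val result: Array[Option[Long]] = doubled.collect()
```

The cost is that every consumer of the Dataset has to deal with the Option, 
which is exactly the kind of thing that is easy to forget.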

I think Spark should warn (or even raise an error when some config parameter 
is set) when you convert a nullable DataFrame column into a non-optional type.
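
Until something like that exists, a user-side guard can approximate it. The 
sketch below is only an illustration: `asLongStrict` is a hypothetical helper, 
not a Spark API, and it costs an extra pass over the data to look for nulls 
before converting:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not part of Spark): refuse to decode a column as a
// primitive Long when the schema marks it nullable and it really holds a null.
def asLongStrict(df: DataFrame, column: String): Dataset[Long] = {
  val field = df.schema(column) // StructField for the column; fails if the name is unknown
  val hasNull = field.nullable && df.filter(col(column).isNull).head(1).nonEmpty
  if (hasNull) {
    throw new IllegalArgumentException(
      s"Column '$column' contains null and cannot be decoded as a primitive Long")
  }
  df.select(col(column)).as[Long](Encoders.scalaLong)
}
```

With the DataFrame from the report, `asLongStrict(s, "v")` would throw instead 
of silently producing a wrong value, while a column without nulls converts as 
usual.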

Currently, if you do that with a non-primitive type, you will most likely get 
an NPE and will have to handle this case anyway.


In my opinion, the current implicit behavior causes much more harm. We are 
talking about bad results without any notification.

> Datasets generate random values for null primitive types
> --------------------------------------------------------
>
>                 Key: SPARK-22472
>                 URL: https://issues.apache.org/jira/browse/SPARK-22472
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Vladislav Kuzemchik
>
> Not sure if this was ever reported.
> {code}
> scala> val s = sc.parallelize(Seq[Option[Long]](None,Some(1L),Some(5))).toDF("v")
> s: org.apache.spark.sql.DataFrame = [v: bigint]
> scala> s.show(false)
> +----+
> |v   |
> +----+
> |null|
> |1   |
> |5   |
> +----+
> scala> s.as[Long].map(v => v*2).show(false)
> +-----+
> |value|
> +-----+
> |-2   |
> |2    |
> |10   |
> +-----+
> scala> s.select($"v"*2).show(false)
> +-------+
> |(v * 2)|
> +-------+
> |null   |
> |2      |
> |10     |
> +-------+
> {code}



