[ https://issues.apache.org/jira/browse/SPARK-22472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244319#comment-16244319 ]
Vladislav Kuzemchik commented on SPARK-22472:
---------------------------------------------

I'm using Option[Long] as a workaround, but it is somewhat scary to leave things as they are and hope that reviewers will catch the problem whenever anyone else uses Datasets. I think Spark should warn (or even fail, with some config parameter set) when converting a nullable DataFrame column into a non-optional type. Currently, if you do that with a non-primitive type, you will most likely get an NPE and have to handle that case anyway. In my opinion, the current implicit behavior causes much more harm: we are talking about bad results without any notification.

> Datasets generate random values for null primitive types
> --------------------------------------------------------
>
>                 Key: SPARK-22472
>                 URL: https://issues.apache.org/jira/browse/SPARK-22472
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Vladislav Kuzemchik
>
> Not sure if this was ever reported.
> {code}
> scala> val s = sc.parallelize(Seq[Option[Long]](None,Some(1L),Some(5))).toDF("v")
> s: org.apache.spark.sql.DataFrame = [v: bigint]
>
> scala> s.show(false)
> +----+
> |v   |
> +----+
> |null|
> |1   |
> |5   |
> +----+
>
> scala> s.as[Long].map(v => v*2).show(false)
> +-----+
> |value|
> +-----+
> |-2   |
> |2    |
> |10   |
> +-----+
>
> scala> s.select($"v"*2).show(false)
> +-------+
> |(v * 2)|
> +-------+
> |null   |
> |2      |
> |10     |
> +-------+
> {code}
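The hazard behind this report is that converting a null into a JVM primitive never fails loudly: unboxing a null boxed value silently yields the type's default. A minimal plain-Scala sketch of that behavior, and of the Option[Long] workaround mentioned in the comment (no Spark dependency; the object name and sample values are invented for illustration):

{code}
object NullUnboxDemo {
  def main(args: Array[String]): Unit = {
    // Unboxing a null java.lang.Long does NOT throw; it silently yields 0L.
    // This is why a null column decoded straight into a primitive Long can
    // produce a wrong value with no error.
    val boxed: java.lang.Long = null
    val unboxed: Long = boxed.asInstanceOf[Long]
    println(unboxed) // 0

    // Workaround from the comment: keep nullability explicit with Option[Long],
    // so null rows stay None instead of turning into garbage primitives.
    val values: Seq[Option[Long]] = Seq(None, Some(1L), Some(5L))
    val doubled = values.map(_.map(_ * 2))
    println(doubled) // List(None, Some(2), Some(10))
  }
}
{code}

Note this sketch only models the unboxing pitfall; inside Spark the garbage value (e.g. -2 above) comes from the encoder reading the value slot of a row whose null bit is set, which is even less predictable than a plain 0.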