[ https://issues.apache.org/jira/browse/SPARK-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211381#comment-15211381 ]
koert kuipers commented on SPARK-14139:
---------------------------------------

I believe the difference is in the definition of schema in Dataset. Before, it was:
{noformat}
override def schema: StructType = resolvedTEncoder.schema
{noformat}
Now it is:
{noformat}
def schema: StructType = queryExecution.analyzed.schema
{noformat}
but queryExecution.analyzed (which is a LogicalPlan) does not respect nullability in multiple places. In this particular case the loss happens in RowEncoder.extractorsFor, where the nullable flag of a StructType's fields is ignored.

> Dataset loses nullability in operations with RowEncoder
> -------------------------------------------------------
>
>                 Key: SPARK-14139
>                 URL: https://issues.apache.org/jira/browse/SPARK-14139
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> When I do
> {noformat}
> val df1 = sc.makeRDD(1 to 3).toDF
> val df2 = df1.map(row => Row(row(0).asInstanceOf[Int] + 1))(RowEncoder(df1.schema))
> println(s"schema before ${df1.schema} and after ${df2.schema}")
> {noformat}
> I get:
> {noformat}
> schema before StructType(StructField(value,IntegerType,false)) and after StructType(StructField(value,IntegerType,true))
> {noformat}
> The change in the field's nullable flag is unexpected and I consider it a bug.
> This bug was introduced in:
> [SPARK-13244][SQL] Migrates DataFrame to Dataset
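
To illustrate the mechanism described in the comment, here is a minimal toy model in plain Scala. It is not Spark code: Field, Schema, planOutputSchema, schemaFromEncoder and schemaFromPlan are hypothetical names invented for this sketch, and the assumption that the plan step simply marks every field nullable is a simplification of what the comment attributes to RowEncoder.extractorsFor.

```scala
// Toy model only: these case classes mimic, but are not, Spark's
// StructField / StructType. All names here are hypothetical.
final case class Field(name: String, dataType: String, nullable: Boolean)
final case class Schema(fields: Seq[Field])

object NullabilityDemo {
  // Assumed stand-in for a plan step that ignores the declared nullable
  // flag and reports every field as nullable.
  def planOutputSchema(declared: Schema): Schema =
    Schema(declared.fields.map(_.copy(nullable = true)))

  // Models the old definition: the schema is taken from the encoder as-is.
  def schemaFromEncoder(encoderSchema: Schema): Schema = encoderSchema

  // Models the new definition: the schema is taken from the plan output,
  // so any nullability lost inside the plan is visible to the caller.
  def schemaFromPlan(encoderSchema: Schema): Schema =
    planOutputSchema(encoderSchema)

  def main(args: Array[String]): Unit = {
    val declared = Schema(Seq(Field("value", "IntegerType", nullable = false)))
    println(s"schema from encoder: ${schemaFromEncoder(declared)}")
    println(s"schema from plan:    ${schemaFromPlan(declared)}")
  }
}
```

In this model the two definitions agree only when the plan preserves the declared flags, which is why switching the source of schema can surface a nullability change without any change to the user's code.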