[ https://issues.apache.org/jira/browse/SPARK-23025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-23025.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 2.3.0

> DataSet with scala.Null causes Exception
> ----------------------------------------
>
>                 Key: SPARK-23025
>                 URL: https://issues.apache.org/jira/browse/SPARK-23025
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Daniel Davis
>             Fix For: 2.3.0
>
>
> When creating a DataSet over a case class that contains a field of type scala.Null, an exception is thrown. As far as I can see, Spark SQL would support a schema field of {{(NullType, true)}}, but it fails inside the {{schemaFor}} function with a {{MatchError}}.
> I would expect Spark to return a DataSet with a NullType for that field.
> h5. Minimal Example
> {code}
> case class Foo(foo: Int, bar: Null)
> val ds = Seq(Foo(42, null)).toDS()
> {code}
> h5. Exception
> {code}
> scala.MatchError: scala.Null (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:713)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:704)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:703)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1$$anonfun$9.apply(ScalaReflection.scala:391)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1$$anonfun$9.apply(ScalaReflection.scala:390)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:390)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:148)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor(ScalaReflection.scala:148)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.deserializerFor(ScalaReflection.scala:136)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:72)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
>   at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
>   at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
>   ... 42 elided
> {code}
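For reference, the behaviour the reporter expects corresponds to the schema below (a minimal sketch using Spark SQL's public types API; the {{expected}} name and the fields are taken from the {{Foo}} example above, not from the actual fix):

{code}
import org.apache.spark.sql.types._

// Sketch of the expected schema once scala.Null maps to NullType:
// foo: Int is non-nullable; bar: Null becomes (NullType, nullable = true).
val expected = StructType(Seq(
  StructField("foo", IntegerType, nullable = false),
  StructField("bar", NullType, nullable = true)
))
{code}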
> h5. Background Info
> To handle our data in a type-safe fashion, we have generated AVRO schemas and corresponding Scala case classes for our domain data. As some fields only ever contain null values, this results in fields with scala.Null as their type. Since moving our pipeline to DataSets/structured streaming, case classes with Null-typed fields have started causing problems, even though NullType is known to Spark SQL.
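On the affected 2.2.x versions, one possible workaround (an untested sketch, assuming the field really only ever carries null; {{FooCompat}} is a hypothetical name) is to model such always-null fields as an {{Option}} of a concrete type, which the encoder derivation already handles:

{code}
// Hypothetical variant of the generated case class: the always-null
// field is modelled as Option[String] (any concrete element type works),
// so schemaFor never sees scala.Null and the MatchError is avoided.
case class FooCompat(foo: Int, bar: Option[String])

val ds = Seq(FooCompat(42, None)).toDS()
ds.printSchema()
// root
//  |-- foo: integer (nullable = false)
//  |-- bar: string (nullable = true)
{code}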