[ https://issues.apache.org/jira/browse/SPARK-23959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23959.
----------------------------------
    Resolution: Cannot Reproduce

I can't reproduce this in master either, so I am resolving it. It would be nicer if the JIRA that fixed this were identified and backported where applicable.

> UnresolvedException with DataSet created from Seq.empty since Spark 2.3.0
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23959
>                 URL: https://issues.apache.org/jira/browse/SPARK-23959
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Sam De Backer
>            Priority: Major
>
> The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception in Spark 2.3.0:
> {code:java}
> import sparkSession.implicits._
> import org.apache.spark.sql.functions._
>
> case class X(xid: Long, yid: Int)
> case class Y(yid: Int, zid: Long)
> case class Z(zid: Long, b: Boolean)
>
> val xs = Seq(X(1L, 10)).toDS()
> val ys = Seq(Y(10, 100L)).toDS()
> val zs = Seq.empty[Z].toDS()
>
> val j = xs
>   .join(ys, "yid")
>   .join(zs, Seq("zid"), "left")
>   .withColumn("BAM", when('b, "B").otherwise("NB"))
>
> j.show()
> {code}
> In Spark 2.2.1 it prints to the console:
> {noformat}
> +---+---+---+----+---+
> |zid|yid|xid|   b|BAM|
> +---+---+---+----+---+
> |100| 10|  1|null| NB|
> +---+---+---+----+---+
> {noformat}
> In Spark 2.3.0 it results in:
> {noformat}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
>   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
>   at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
>   at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:296)
>   at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
>   ...
> {noformat}
> The culprit really seems to be the Dataset created from the empty Seq[Z]. Changing that to something else that also produces an empty Dataset[Z] makes it work as in Spark 2.2.1, e.g.
> {code:java}
> val zs = Seq(Z(10L, true)).toDS().filter('zid < Long.MinValue)
> {code}
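For anyone re-checking this against a newer build, below is a self-contained sketch of the reporter's snippet wrapped in a runnable application; the Spark23959Repro object name and the local-mode SparkSession setup are assumptions added here and were not part of the original report.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Case classes from the report, lifted to the top level so Spark can derive encoders.
case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)

// Hypothetical wrapper object; only the body of main() comes from the report.
object Spark23959Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-23959 reproduction")
      .getOrCreate()
    import spark.implicits._

    val xs = Seq(X(1L, 10)).toDS()
    val ys = Seq(Y(10, 100L)).toDS()
    // Dataset built from an empty local Seq: the reported trigger.
    val zs = Seq.empty[Z].toDS()

    val j = xs
      .join(ys, "yid")
      .join(zs, Seq("zid"), "left")
      .withColumn("BAM", when('b, "B").otherwise("NB"))

    // Threw UnresolvedException on 2.3.0; on a build where this is fixed it should print one row.
    j.show()

    spark.stop()
  }
}
{code}

On a branch where the issue is fixed, the output should match the Spark 2.2.1 result shown above.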