In 1.6, when you created a Dataset from a DataFrame that had extra columns, the columns not in the case class were dropped from the Dataset.
For example, in 1.6 the column c4 is gone:

scala> case class F(f1: String, f2: String, f3: String)
defined class F

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> val df = Seq(("a","b","c","x"), ("d", "e", "f","y"), ("h", "i", "j","z")).toDF("f1", "f2", "f3", "c4")
df: org.apache.spark.sql.DataFrame = [f1: string, f2: string, f3: string, c4: string]

scala> val ds = df.as[F]
ds: org.apache.spark.sql.Dataset[F] = [f1: string, f2: string, f3: string]

scala> ds.show
+---+---+---+
| f1| f2| f3|
+---+---+---+
|  a|  b|  c|
|  d|  e|  f|
|  h|  i|  j|
+---+---+---+

This seems to have changed in Spark 2.0 and also 2.1.

Spark 2.1.0:

scala> case class F(f1: String, f2: String, f3: String)
defined class F

scala> import spark.implicits._
import spark.implicits._

scala> val df = Seq(("a","b","c","x"), ("d", "e", "f","y"), ("h", "i", "j","z")).toDF("f1", "f2", "f3", "c4")
df: org.apache.spark.sql.DataFrame = [f1: string, f2: string ... 2 more fields]

scala> val ds = df.as[F]
ds: org.apache.spark.sql.Dataset[F] = [f1: string, f2: string ... 2 more fields]

scala> ds.show
+---+---+---+---+
| f1| f2| f3| c4|
+---+---+---+---+
|  a|  b|  c|  x|
|  d|  e|  f|  y|
|  h|  i|  j|  z|
+---+---+---+---+

Is there a way to get a Dataset that conforms to the case class in Spark 2.1.0? Basically, I'm attempting to use the case class to define an output schema, and these extra columns are getting in the way.

Thanks.

-Don

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
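P.S. A possible workaround sketch, assuming the goal is simply to drop the extra columns rather than preserve 1.6 semantics: select only the case-class fields before the `.as[F]` conversion. The app skeleton below (class and object names are illustrative, not from the thread) shows the idea:

```scala
import org.apache.spark.sql.SparkSession

// Case class defining the desired output schema; must be declared at
// top level (not inside a method) so Spark can derive an encoder for it.
case class F(f1: String, f2: String, f3: String)

object DropExtraColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-extra-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "b", "c", "x"), ("d", "e", "f", "y"), ("h", "i", "j", "z"))
      .toDF("f1", "f2", "f3", "c4")

    // Selecting exactly the case-class columns first means the resulting
    // Dataset[F] carries only f1, f2, f3 -- c4 is gone from the schema.
    val ds = df.select("f1", "f2", "f3").as[F]
    ds.printSchema()
    ds.show()

    spark.stop()
  }
}
```

Another option that may work is forcing a pass through the encoder, e.g. `df.as[F].map(identity)`, since `map` re-shapes each row to the fields of `F`; the explicit `select` is cheaper, though, as it stays in the untyped/Catalyst world.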