We observed the following bug on Spark 2.4.0:

scala> spark.createDataset(Seq((1, 2))).write.partitionBy("_1").parquet("foo.parquet")

scala> val schema = StructType(Seq(StructField("_1", IntegerType), StructField("_2", IntegerType)))

scala> spark.read.schema(schema).parquet("foo.parquet").as[(Int, Int)].show
+---+---+
| _2| _1|
+---+---+
|  2|  1|
+---+---+

That is, when reading column-partitioned Parquet files, the explicitly specified schema is not adhered to; instead, the partitioning columns are appended to the end of the column list. This is a quite severe issue, as some operations, such as union, fail if the columns of the two datasets are in a different order. Thus we have to work around the issue with a select:

val columnNames = schema.fields.map(_.name)
ds.select(columnNames.head, columnNames.tail: _*)

Thanks,
David Szakallas
Data Engineer | Whitepages, Inc.
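For reference, the workaround can be spelled out end-to-end. This is only a sketch, assuming a live SparkSession bound to `spark` and the same foo.parquet written above (it will not run outside a Spark environment):

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// The schema we want the resulting Dataset to follow, in this order.
val schema = StructType(Seq(
  StructField("_1", IntegerType),
  StructField("_2", IntegerType)))

// The reader moves the partition column "_1" to the end of the column
// list, so we reorder explicitly by name before converting to a Dataset.
val df = spark.read.schema(schema).parquet("foo.parquet")
val columnNames = schema.fields.map(_.name)
val ds = df.select(columnNames.head, columnNames.tail: _*).as[(Int, Int)]
```

With the select in place, the columns come back in the declared order, so operations like union against another Dataset[(Int, Int)] line up correctly.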