Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20521#discussion_r166413015 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -493,9 +510,23 @@ case class DataSource( dataSource.createRelation( sparkSession.sqlContext, mode, caseInsensitiveOptions, Dataset.ofRows(sparkSession, data)) case format: FileFormat => - sparkSession.sessionState.executePlan(planForWritingFileFormat(format, mode, data)).toRdd + val cmd = planForWritingFileFormat(format, mode, data) + val resolvedPartCols = cmd.partitionColumns.map { col => + // The partition columns created in `planForWritingFileFormat` should always be + // `UnresolvedAttribute` with a single name part. + assert(col.isInstanceOf[UnresolvedAttribute]) + val unresolved = col.asInstanceOf[UnresolvedAttribute] + assert(unresolved.nameParts.length == 1) + val name = unresolved.nameParts.head + outputColumns.find(a => equality(a.name, name)).getOrElse { + throw new AnalysisException( + s"Unable to resolve $name given [${data.output.map(_.name).mkString(", ")}]") + } + } + val resolved = cmd.copy(partitionColumns = resolvedPartCols, outputColumns = outputColumns) --- End diff -- > Why does the physical plan not match the command that is produced It matches! The only problem is, they are 2 different JVM objects. The UI keeps the physical plan object and displays them. An alternative solution is to swap the new physical plan into the UI part, but that's hard to do with the current UI framework. If we run `sparkSession.sessionState.executePlan(planForWritingFileFormat(format, mode, data)).toRdd`, we are executing the new physical plan, so no metrics will be reported to the passed-in physical plan and shown in the UI.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org