Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20521#discussion_r166386827

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -482,9 +484,24 @@ case class DataSource(
       /**
        * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
        * the following reading.
    +   *
    +   * @param mode The save mode for this writing.
    +   * @param data The input query plan that produces the data to be written. Note that this plan
    +   *             is analyzed and optimized.
    +   * @param outputColumns The original output columns of the input query plan. The optimizer may not
    +   *                      preserve the output column's names' case, so we need this parameter
    +   *                      instead of `data.output`.
    +   * @param physicalPlan The physical plan of the input query plan. We should run the writing
    +   *                     command with this physical plan instead of creating a new physical plan,
    +   *                     so that the metrics can be correctly linked to the given physical plan and
    +   *                     shown in the web UI.
    --- End diff --

    Generally, I think it's hacky to analyze/optimize/plan/execute a query during the execution of another query. This affects not only CTAS but also other commands such as `CreateView` and `CacheTable`. This is a surgical fix for Spark 2.3, so I didn't change this part and am leaving it for 2.4.