Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20521#discussion_r166386827
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -482,9 +484,24 @@ case class DataSource(
       /**
        * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
        * the following reading.
    +   *
    +   * @param mode The save mode for this writing.
    +   * @param data The input query plan that produces the data to be written. Note that this plan
    +   *             is analyzed and optimized.
    +   * @param outputColumns The original output columns of the input query plan. The optimizer may not
    +   *                      preserve the output column's names' case, so we need this parameter
    +   *                      instead of `data.output`.
    +   * @param physicalPlan The physical plan of the input query plan. We should run the writing
    +   *                     command with this physical plan instead of creating a new physical plan,
    +   *                     so that the metrics can be correctly linked to the given physical plan and
    +   *                     shown in the web UI.
    --- End diff --
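    To make the `outputColumns` note in the diff concrete, here is a minimal, Spark-free sketch. `Attr`, `restoreNames`, and the positional matching are hypothetical simplifications for illustration, not Spark's API and not necessarily what this PR does. The only point is that the user-visible names come from the pre-optimization output instead of being read back from `data.output`.

```scala
object RestoreNamesDemo {
  // Hypothetical, simplified column reference: a display name plus a stable id.
  final case class Attr(name: String, id: Long)

  // Hypothetical helper: re-apply the original names, matched by position
  // (an assumption of this sketch), to the output of the optimized plan.
  def restoreNames(optimizedOutput: Seq[Attr], originalNames: Seq[String]): Seq[Attr] = {
    require(optimizedOutput.size == originalNames.size, "column count mismatch")
    optimizedOutput.zip(originalNames).map { case (attr, name) => attr.copy(name = name) }
  }

  // Columns as the user wrote them in the query.
  val original = Seq(Attr("userId", 1L), Attr("Total", 2L))

  // Suppose (hypothetically) optimization left the same columns with different letter case.
  val optimized = Seq(Attr("userid", 1L), Attr("total", 2L))

  // Names come from the original output; ids still come from the optimized plan.
  val restored = restoreNames(optimized, original.map(_.name))
}
```

    Here `RestoreNamesDemo.restored` carries the names `userId` and `Total` with ids 1 and 2, so a writer that used it would create columns with the case the user asked for.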
    
    Generally, I think it's hacky to analyze/optimize/plan/execute a query during the execution of another query. This issue is not limited to CTAS: other commands such as `CreateView` and `CacheTable` have it too. This is a surgical fix for Spark 2.3, so I didn't change this part and will leave it for 2.4.
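    The metrics argument for passing `physicalPlan` into the write can be sketched the same way. `PlanNode`, `plan`, and `numOutputRows` below are hypothetical stand-ins, not Spark classes. The sketch only assumes what the diff's doc comment describes: metrics live on a specific physical plan instance, so re-planning inside the command updates a node that the web UI never displays.

```scala
object MetricsDemo {
  // Hypothetical physical plan node that owns a mutable metric.
  final class PlanNode(val name: String) {
    var numOutputRows: Long = 0L
    def execute(rows: Seq[Int]): Seq[Int] = {
      numOutputRows += rows.size // metric is recorded on this instance only
      rows
    }
  }

  // Hypothetical planner: every call returns a fresh node instance.
  def plan(name: String): PlanNode = new PlanNode(name)

  // The node the UI shows, created when the outer query was planned.
  val shownInUi: PlanNode = plan("scan")

  // Re-planning inside the write command: rows are counted on a new, undisplayed node.
  plan("scan").execute(Seq(1, 2, 3))
  val afterReplan: Long = shownInUi.numOutputRows

  // Running with the given physical plan: rows are counted on the displayed node.
  shownInUi.execute(Seq(1, 2, 3))
  val afterReuse: Long = shownInUi.numOutputRows
}
```

    `MetricsDemo.afterReplan` stays at 0 while `MetricsDemo.afterReuse` reaches 3, which is the mismatch that reusing the given physical plan avoids.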

