GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/22320
[SPARK-25313][SQL] Fix regression in FileFormatWriter output names

## What changes were proposed in this pull request?

Consider the following example:
```
val location = "/tmp/t"
val df = spark.range(10).toDF("id")
df.write.format("parquet").saveAsTable("tbl")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
// The LOCATION clause takes a string literal, so the path must be quoted.
spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet location '$location'")
spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
// Inspect the schema actually written to disk, then read it back through the table.
println(spark.read.parquet(location).schema)
spark.table("tbl2").show()
```
The output column name in the written schema will be `id` instead of `ID`, so the last query shows nothing from `tbl2`.

By enabling the debug message we can see that the output name is changed from `ID` to `id`, and that the `outputColumns` in `InsertIntoHadoopFsRelationCommand` is then changed in `RemoveRedundantAliases`.

![wechatimg5](https://user-images.githubusercontent.com/1097932/44947871-6299f200-ae46-11e8-9c96-d45fe368206c.jpeg)

![wechatimg4](https://user-images.githubusercontent.com/1097932/44947866-56ae3000-ae46-11e8-8923-8b3bbe060075.jpeg)

**To guarantee correctness**, we should change the output columns from `Seq[Attribute]` to `Seq[String]` so that the optimizer cannot replace their names.

I will fix the project-elimination-related rules in https://github.com/apache/spark/pull/22311 after this one.

## How was this patch tested?

Unit test.
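As a rough illustration of why holding the output columns as `Seq[String]` rather than `Seq[Attribute]` protects the written names, here is a minimal, self-contained sketch. It does not use Spark's classes: `Attr`, `Relation`, `AliasProject`, and the toy `removeRedundantAliases` function are hypothetical stand-ins, and the rule only approximates the behavior described above (an alias that differs from its child only by case gets dropped).

```scala
object OutputNamesSketch {
  // Hypothetical, simplified stand-in for a Catalyst attribute (not Spark's class).
  final case class Attr(name: String, id: Long)

  sealed trait Plan { def output: Seq[Attr] }

  // Leaf node: a table whose stored column names are fixed.
  final case class Relation(output: Seq[Attr]) extends Plan

  // Projection that re-exposes each child attribute under a new name.
  final case class AliasProject(aliases: Seq[(Attr, String)], child: Plan) extends Plan {
    def output: Seq[Attr] = aliases.map { case (attr, newName) => attr.copy(name = newName) }
  }

  // Toy optimizer rule (an assumption, not Spark's RemoveRedundantAliases):
  // drop a projection whose aliases differ from the child names only by case.
  def removeRedundantAliases(plan: Plan): Plan = plan match {
    case AliasProject(aliases, child)
        if aliases.forall { case (attr, newName) => attr.name.equalsIgnoreCase(newName) } =>
      child
    case other => other
  }

  // Returns (names derived from the optimized plan, names captured before optimization).
  def demo(): (Seq[String], Seq[String]) = {
    val table = Relation(Seq(Attr("id", 1L)))
    // Analyzed plan for something like "SELECT ID FROM view1".
    val analyzed = AliasProject(Seq(table.output.head -> "ID"), table)

    // Names captured as plain strings at analysis time: the optimizer cannot touch these.
    val capturedNames = analyzed.output.map(_.name)

    // Names re-derived from attributes after the rule has run.
    val optimized = removeRedundantAliases(analyzed)
    val derivedNames = optimized.output.map(_.name)

    (derivedNames, capturedNames)
  }

  def main(args: Array[String]): Unit = {
    val (derived, captured) = demo()
    println(s"derived from optimized attributes: $derived")
    println(s"captured as strings before optimization: $captured")
  }
}
```

In this sketch the names derived from the optimized plan come out as `id`, while the names captured as strings keep the user-specified `ID`, which mirrors the reasoning for the proposed type change.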
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark fixOutputSchema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22320

----
commit bbd572c1fe542c6b2fd642212f927ba384c882e4
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-08-31T16:07:00Z

    Fix regression in FileFormatWriter output schema

----