[ https://issues.apache.org/jira/browse/SPARK-25313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25313:
------------------------------------

    Assignee: Apache Spark

> Fix regression in FileFormatWriter output schema
> ------------------------------------------------
>
>                 Key: SPARK-25313
>                 URL: https://issues.apache.org/jira/browse/SPARK-25313
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Gengliang Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> In the following example:
>         val location = "/tmp/t"
>         val df = spark.range(10).toDF("id")
>         df.write.format("parquet").saveAsTable("tbl")
>         spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
>         spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet location '$location'")
>         spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
>         println(spark.read.parquet(location).schema)
>         spark.table("tbl2").show()
> The output column name in the written schema will be id instead of ID, so
> the last query shows nothing from tbl2.
> Enabling the debug message shows that the output name is changed from ID
> to id, and that the outputColumns in InsertIntoHadoopFsRelationCommand are
> then changed by RemoveRedundantAliases.
> To guarantee correctness, we should change the output columns from
> `Seq[Attribute]` to `Seq[String]` so that their names cannot be replaced by
> the optimizer.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
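
The reasoning behind the proposed change from `Seq[Attribute]` to `Seq[String]` can be illustrated with a small toy model. This is only a sketch: `Attr`, `WriteWithAttrs`, `WriteWithNames`, and `replaceAttrs` are hypothetical stand-ins invented for illustration, not Spark's Catalyst classes or the actual RemoveRedundantAliases rule. It shows that a command holding attributes can have its column names rewritten by an attribute-replacing pass, while a command holding plain strings cannot.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Spark's Catalyst API.
final case class Attr(name: String, exprId: Int)

// A write command that remembers its output columns as attributes.
final case class WriteWithAttrs(outputColumns: Seq[Attr])

// A write command that remembers only the output column names.
final case class WriteWithNames(outputColumnNames: Seq[String])

object OutputNameDemo {
  // Stand-in for an optimizer pass that replaces an aliased attribute
  // with the underlying attribute it refers to.
  def replaceAttrs(attrs: Seq[Attr], mapping: Map[Attr, Attr]): Seq[Attr] =
    attrs.map(a => mapping.getOrElse(a, a))

  def main(args: Array[String]): Unit = {
    val aliased = Attr("ID", 2)     // the name the user asked for
    val underlying = Attr("id", 1)  // the column the alias points at
    val mapping = Map(aliased -> underlying)

    // The attribute-based command is visible to the rewriting pass,
    // so its column name follows the replacement.
    val byAttr = WriteWithAttrs(Seq(aliased))
    val rewritten = byAttr.copy(outputColumns = replaceAttrs(byAttr.outputColumns, mapping))
    println(rewritten.outputColumns.map(_.name))

    // The name-based command holds no attributes, so there is nothing to replace.
    val byName = WriteWithNames(Seq("ID"))
    println(byName.outputColumnNames)
  }
}
```

In this model the first command ends up with the name `id`, matching the symptom described in the issue, while the second keeps `ID`.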