Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22124

The root `Project` of the query should be consistent with the schema of the target table, but currently it is inconsistent.

**Before this PR**:

[dataColumns](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L84): `col1#8L,col2#9L`

[plan](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L67):
```
*(1) Project [col1#8L, col2#9L]
+- *(1) Filter (isnotnull(col1#8L) && (col1#8L > -20))
   +- *(1) FileScan parquet default.table1[col1#8L,col2#9L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```

**After this PR**:

[dataColumns](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L84): `COL1#14L,COL2#15L`

[plan](https://github.com/apache/spark/blob/e6c6f90a55241905c420afbc803dd3bd6961d66b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L67):
```
*(1) Project [col1#8L AS COL1#14L, col2#9L AS COL2#15L]
+- *(1) Filter (isnotnull(col1#8L) && (col1#8L > -20))
   +- *(1) FileScan parquet default.table1[col1#8L,col2#9L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```

Before [SPARK-22834](https://issues.apache.org/jira/browse/SPARK-22834):
[dataColumns](https://github.com/apache/spark/blob/ec122209fb35a65637df42eded64b0203e105aae/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L124): `COL1#19L,COL2#20L`

[queryExecution](https://github.com/apache/spark/blob/ec122209fb35a65637df42eded64b0203e105aae/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L104):
```
== Parsed Logical Plan ==
Project [COL1#19L, COL2#20L]
+- SubqueryAlias view1
   +- View (`default`.`view1`, [col1#19L,col2#20L])
      +- Project [col1#15L, col2#16L]
         +- Filter (col1#15L > cast(-20 as bigint))
            +- SubqueryAlias table1
               +- Relation[col1#15L,col2#16L] parquet

== Analyzed Logical Plan ==
COL1: bigint, COL2: bigint
Project [COL1#19L, COL2#20L]
+- SubqueryAlias view1
   +- View (`default`.`view1`, [col1#19L,col2#20L])
      +- Project [cast(col1#15L as bigint) AS col1#19L, cast(col2#16L as bigint) AS col2#20L]
         +- Project [col1#15L, col2#16L]
            +- Filter (col1#15L > cast(-20 as bigint))
               +- SubqueryAlias table1
                  +- Relation[col1#15L,col2#16L] parquet

== Optimized Logical Plan ==
Filter (isnotnull(col1#15L) && (col1#15L > -20))
+- Relation[col1#15L,col2#16L] parquet

== Physical Plan ==
*Project [col1#15L, col2#16L]
+- *Filter (isnotnull(col1#15L) && (col1#15L > -20))
   +- *FileScan parquet default.table1[col1#15L,col2#16L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/yumwang/spark/parquet], PartitionFilters: [], PushedFilters: [IsNotNull(col1), GreaterThan(col1,-20)], ReadSchema: struct<col1:bigint,col2:bigint>
```
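The core idea of the fix — re-aliasing the query's output columns to the target table's (case-preserving) column names, matched case-insensitively — can be sketched as a small standalone helper. This is only an illustration of the technique, not Spark's actual API; `SchemaAlign` and `alignToTable` are hypothetical names:

```scala
// Hypothetical sketch: given the query's output column names and the
// target table's schema, compute the aliases the writer should apply,
// resolving names case-insensitively so the written data columns carry
// the table's exact casing (e.g. col1 AS COL1).
object SchemaAlign {
  def alignToTable(queryCols: Seq[String], tableCols: Seq[String]): Seq[(String, String)] = {
    // index query columns by their lower-cased name for case-insensitive lookup
    val byLower = queryCols.map(c => c.toLowerCase -> c).toMap
    tableCols.map { t =>
      val from = byLower.getOrElse(t.toLowerCase,
        sys.error(s"no query column matches table column $t"))
      from -> t // emit `from AS t`, e.g. ("col1", "COL1")
    }
  }
}
```

With a query producing `col1, col2` and a table declared as `COL1, COL2`, this yields the pairs `("col1", "COL1")` and `("col2", "COL2")`, mirroring the `Project [col1#8L AS COL1#14L, col2#9L AS COL2#15L]` shown in the "after this PR" plan above.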