rdtr commented on issue #55575: URL: https://github.com/apache/spark/issues/55575#issuecomment-4537386547
I added a unit test in `HiveSQLViewSuite` (on `master`) but could not reproduce the issue. Here are the plans from the test run:

**Analyzed plan:**
```
00 WithCTE
01 :- CTERelationDef 0, false
02 :  +- SubqueryAlias cte
03 :     +- Project [id#35, name#36, (value#37 * 2) AS doubled#34]
04 :        +- SubqueryAlias spark_catalog.default.my_view
05 :           +- View (`spark_catalog`.`default`.`my_view`, [id#35, name#36, value#37])
06 :              +- Project [cast(id#26 as int) AS id#35, cast(name#27 as string) AS name#36, cast(value#28 as int) AS value#37]
07 :                 +- Union false, false
08 :                    :- Project [id#26, name#27, value#28]
09 :                    :  +- SubqueryAlias spark_catalog.default.t1
10 :                    :     +- Relation spark_catalog.default.t1[id#26,name#27,value#28] parquet
11 :                    +- Project [id#29, name#30, value#31]
12 :                       +- SubqueryAlias spark_catalog.default.t2
13 :                          +- Relation spark_catalog.default.t2[id#29,name#30,value#31] parquet
14 +- Project [id#35, name#36, doubled#34]
15    +- Filter (name#36 = foo)
16       +- SubqueryAlias cte
17          +- CTERelationRef 0, true, [id#35, name#36, doubled#34], false, false
```

**Optimized plan:**
```
00 Union false, false
01 :- Project [id#26 AS id#35, name#27 AS name#36, (value#28 * 2) AS doubled#34]
02 :  +- Filter (isnotnull(name#27) AND (name#27 = foo))
03 :     +- Relation spark_catalog.default.t1[id#26,name#27,value#28] parquet
04 +- Project [id#29, name#30, (value#31 * 2) AS doubled#41]
05    +- Filter (isnotnull(name#30) AND (name#30 = foo))
06       +- Relation spark_catalog.default.t2[id#29,name#30,value#31] parquet
```

Line 06 in the analyzed plan shows every View column is wrapped with `cast(...)` + `Alias`, creating new exprIds (#35, #36, #37). This means `name` is never a passthrough column. It's in the `aliasMap`, so `replaceAlias` correctly maps `name#36 → cast(name#27)` → `name#27` when pushing through the View's Project. The exprId mismatch described in the issue never occurs.

I also tested with `CREATE VIEW my_view WITH SCHEMA EVOLUTION ...`, which skips the cast Project entirely.
In that case the View passes the Union's output directly (`name#27` flows through unchanged), so the exprIds are already consistent and there is no mismatch either.

Both `master` and `branch-3.5` have the same `UpCast` + `Alias` wrapping in `SessionCatalog.fromCatalogTable`. Could you confirm whether this reproduces on upstream Spark 3.5.5 (not the AMZ fork)?
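
To make the substitution argument concrete, here is a toy Python model of pushing the `name#36 = foo` filter through the View's Project from line 06 of the analyzed plan. This is only a sketch, not Spark code: the classes and the `alias_map` / `replace_alias` helpers are hypothetical names invented for illustration, and they only loosely mirror what Catalyst's alias map and `replaceAlias` do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Attribute reference, identified by name and exprId (e.g. name#36)."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class Cast:
    child: object
    to: str


@dataclass(frozen=True)
class Alias:
    """`child AS name#expr_id`; the alias introduces a fresh exprId."""
    child: object
    name: str
    expr_id: int

    def to_attr(self):
        return Attr(self.name, self.expr_id)


@dataclass(frozen=True)
class EqualTo:
    left: object
    right: object


def alias_map(project_list):
    """Map each aliased output attribute to the expression it aliases (toy helper)."""
    return {e.to_attr(): e.child for e in project_list if isinstance(e, Alias)}


def replace_alias(expr, amap):
    """Rewrite attribute references in `expr` using the alias map (toy helper)."""
    if isinstance(expr, Attr):
        return amap.get(expr, expr)  # attributes not in the map pass through unchanged
    if isinstance(expr, EqualTo):
        return EqualTo(replace_alias(expr.left, amap), replace_alias(expr.right, amap))
    if isinstance(expr, Cast):
        return Cast(replace_alias(expr.child, amap), expr.to)
    return expr  # literals such as "foo"


# Project list taken from line 06 of the analyzed plan.
view_project = [
    Alias(Cast(Attr("id", 26), "int"), "id", 35),
    Alias(Cast(Attr("name", 27), "string"), "name", 36),
    Alias(Cast(Attr("value", 28), "int"), "value", 37),
]

# Filter from line 15 of the analyzed plan: name#36 = foo
predicate = EqualTo(Attr("name", 36), "foo")

# Because name#36 is an alias output, it is in the map and gets rewritten to cast(name#27).
pushed = replace_alias(predicate, alias_map(view_project))
```

In this model `pushed` is `cast(name#27 as string) = foo`, so the predicate below the Project refers only to the child's exprIds. The further step from `cast(name#27)` to plain `name#27` seen in the optimized plan is a separate simplification that the sketch does not model.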
