rdtr commented on issue #55575:
URL: https://github.com/apache/spark/issues/55575#issuecomment-4537386547

   I added a unit test in `HiveSQLViewSuite` (on `master`) but could not 
reproduce the issue.
   
   Here are the plans from the test run:
   
   **Analyzed plan:**
   ```
   00 WithCTE
   01 :- CTERelationDef 0, false
   02 :  +- SubqueryAlias cte
   03 :     +- Project [id#35, name#36, (value#37 * 2) AS doubled#34]
   04 :        +- SubqueryAlias spark_catalog.default.my_view
   05 :           +- View (`spark_catalog`.`default`.`my_view`, [id#35, name#36, value#37])
   06 :              +- Project [cast(id#26 as int) AS id#35, cast(name#27 as string) AS name#36, cast(value#28 as int) AS value#37]
   07 :                 +- Union false, false
   08 :                    :- Project [id#26, name#27, value#28]
   09 :                    :  +- SubqueryAlias spark_catalog.default.t1
   10 :                    :     +- Relation spark_catalog.default.t1[id#26,name#27,value#28] parquet
   11 :                    +- Project [id#29, name#30, value#31]
   12 :                       +- SubqueryAlias spark_catalog.default.t2
   13 :                          +- Relation spark_catalog.default.t2[id#29,name#30,value#31] parquet
   14 +- Project [id#35, name#36, doubled#34]
   15    +- Filter (name#36 = foo)
   16       +- SubqueryAlias cte
   17          +- CTERelationRef 0, true, [id#35, name#36, doubled#34], false, false
   ```
   
   **Optimized plan:**
   ```
   00 Union false, false
   01 :- Project [id#26 AS id#35, name#27 AS name#36, (value#28 * 2) AS doubled#34]
   02 :  +- Filter (isnotnull(name#27) AND (name#27 = foo))
   03 :     +- Relation spark_catalog.default.t1[id#26,name#27,value#28] parquet
   04 +- Project [id#29, name#30, (value#31 * 2) AS doubled#41]
   05    +- Filter (isnotnull(name#30) AND (name#30 = foo))
   06       +- Relation spark_catalog.default.t2[id#29,name#30,value#31] parquet
   ```
   
   Line 06 in the analyzed plan shows that every View column is wrapped in 
`cast(...)` + `Alias`, which creates new exprIds (#35, #36, #37). So `name` is 
never a passthrough column: it is in the `aliasMap`, and `replaceAlias` 
correctly maps `name#36` → `cast(name#27 as string)` → `name#27` when the 
filter is pushed through the View's Project. The exprId mismatch described in 
the issue never occurs.
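   
   To make the substitution concrete, here is a toy model in Python. It is 
plain string rewriting, not Catalyst code: the helper names `replace_alias` and 
`drop_noop_casts` are hypothetical, and the column types in `attr_types` are an 
assumption inferred from the casts disappearing in the optimized plan.

```python
import re

# Toy stand-in for the alias map built from the View's Project
# (line 06 of the analyzed plan): view output attribute -> child expression.
alias_map = {
    "id#35": "cast(id#26 as int)",
    "name#36": "cast(name#27 as string)",
    "value#37": "cast(value#28 as int)",
}

# Assumed column types of t1 (not stated in the plans above).
attr_types = {"id#26": "int", "name#27": "string", "value#28": "int"}

ATTR = re.compile(r"\b\w+#\d+\b")
CAST = re.compile(r"cast\((\w+#\d+) as (\w+)\)")


def replace_alias(predicate, alias_map):
    """Replace each aliased attribute in the predicate with its child expression."""
    return ATTR.sub(lambda m: alias_map.get(m.group(0), m.group(0)), predicate)


def drop_noop_casts(expr, attr_types):
    """Remove casts whose target type equals the attribute's existing type."""
    def simplify(m):
        attr, target = m.group(1), m.group(2)
        return attr if attr_types.get(attr) == target else m.group(0)
    return CAST.sub(simplify, expr)


pushed = replace_alias("(name#36 = foo)", alias_map)
print(pushed)                               # (cast(name#27 as string) = foo)
print(drop_noop_casts(pushed, attr_types))  # (name#27 = foo)
```

   In this model the filter only ever refers to `name#36` above the View's 
Project and to `name#27` below it, which matches the `Filter` on `name#27` in 
the optimized plan.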
   
   I also tested with `CREATE VIEW my_view WITH SCHEMA EVOLUTION ...`, which 
skips the cast Project entirely. In that case the View passes the Union's 
output through directly (`name#27` flows through unchanged), so the exprIds are 
already consistent and there is no mismatch either.
   
   Both `master` and `branch-3.5` have the same `UpCast` + `Alias` wrapping in 
`SessionCatalog.fromCatalogTable`. Could you confirm whether this reproduces on 
upstream Spark 3.5.5 (not the AMZ fork)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

