Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21184#discussion_r201961786 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -284,6 +288,80 @@ class Analyzer( } } + /** + * Replaces [[Alias]] with the same exprId but different references with [[Alias]] having + * different exprIds. This is a rare situation which can cause incorrect results. + */ + object DeduplicateAliases extends Rule[LogicalPlan] { --- End diff -- The main problem which causes the added UT to fail is that `FoldablePropagation` replaces all foldable aliases which are considered to be the same. If the same alias with same exprId is located in different part of the plan (referencing actually different things, despite they have the same id...) this can cause wrong replacement to happen. So in the added UT, the plan is: ``` == Analyzed Logical Plan == a: int, b: int, n: bigint Union :- Project [a#5, b#17, n#19L] : +- Project [a#5, b#17, n#19L, n#19L] : +- Window [count(1) windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS n#19L] : +- Project [a#5, b#6 AS b#17] : +- Project [_1#2 AS a#5, _2#3 AS b#6] : +- LocalRelation [_1#2, _2#3] +- Project [a#12, b#17, n#19L] +- Project [a#12, b#17, n#19L, n#19L] +- Window [count(1) windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS n#19L] +- Project [a#12, b#14 AS b#17] +- Project [a#12, 0 AS b#14] +- Project [value#10 AS a#12] +- LocalRelation [value#10] ``` Please note that in both the branches of the union we have the same `b#17` attribute, but they are referencing different things. As the lower one is a foldable value which evaluates to 0, all the `b#17` are replace with a literal 0, causing a wrong result. Despite we might fix this specific problem in the related Optimizer rule, I think that in general we assume that items with the same id are the same. So I proposed this solution in order to fix all the possible issues which may arise due to this situation which is not expected.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org