peter-toth opened a new pull request, #55644:
URL: https://github.com/apache/spark/pull/55644

   ### What changes were proposed in this pull request?
   
   `DynamicPruningSubquery.canonicalized` now normalizes `buildKeys` relative 
to `buildQuery.output` using `QueryPlan.normalizeExpressions` instead of 
calling `.canonicalized` on each key expression independently.
   
   ### Why are the changes needed?
   
   The previous implementation called `buildKeys.map(_.canonicalized)`, which 
canonicalized each key expression in isolation and therefore preserved the 
original `ExprId` values of attribute references. When two 
`DynamicPruningSubquery` instances referenced the same logical build query 
(e.g. different copies of a CTE branch) but with different `ExprId`s, their 
canonical `buildKeys` differed even though the queries were semantically 
identical.
   
   `QueryPlan.normalizeExpressions(key, buildQuery.output)` rewrites each 
attribute reference so that its `ExprId` becomes `ExprId(ordinal)`, where 
`ordinal` is the attribute's position in `buildQuery.output`. Two copies of the 
same CTE branch place the same attribute at the same ordinal, so the canonical 
`buildKeys` become identical regardless of the original `ExprId` values.
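
   The difference between the two approaches can be illustrated with a small, 
self-contained sketch. It does not use Spark: `ExprId`, `Attr`, and 
`NormalizeSketch` below are hypothetical toy stand-ins, and the real logic 
lives in `QueryPlan.normalizeExpressions`. The sketch only assumes the 
behavior described above, namely that isolated canonicalization keeps the 
original `ExprId` while normalization substitutes the ordinal.

```scala
// Toy stand-ins for Catalyst types; hypothetical, not Spark's real classes.
final case class ExprId(id: Long)
final case class Attr(name: String, exprId: ExprId)

object NormalizeSketch {
  // Old behavior (modelled): canonicalize each key in isolation.
  // The name is erased, but the original ExprId survives.
  def canonicalizeAlone(keys: Seq[Attr]): Seq[Attr] =
    keys.map(k => k.copy(name = "none"))

  // New behavior (modelled): replace each key's ExprId with its ordinal
  // in the build query's output. Keys not found there keep their ExprId.
  def normalize(keys: Seq[Attr], output: Seq[Attr]): Seq[Attr] =
    keys.map { k =>
      val ordinal = output.indexWhere(_.exprId == k.exprId)
      val id = if (ordinal == -1) k.exprId else ExprId(ordinal.toLong)
      Attr("none", id)
    }

  // Two copies of the same build query: same shape, fresh ExprIds.
  val outputA: Seq[Attr] = Seq(Attr("id", ExprId(101)), Attr("k", ExprId(102)))
  val outputB: Seq[Attr] = Seq(Attr("id", ExprId(201)), Attr("k", ExprId(202)))

  // Each copy uses its own second output column as the build key.
  val keysA: Seq[Attr] = Seq(outputA(1))
  val keysB: Seq[Attr] = Seq(outputB(1))
}
```

   In this toy model, `canonicalizeAlone(keysA)` and `canonicalizeAlone(keysB)` 
differ because they still carry `ExprId(102)` and `ExprId(202)`, whereas 
`normalize(keysA, outputA)` and `normalize(keysB, outputB)` both refer to 
ordinal 1 and therefore compare equal.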
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This is an internal canonicalization fix. It may improve query plans by 
enabling `PlanMerger` to deduplicate more `DynamicPruningSubquery` expressions, 
but does not change observable query results.
   
   ### How was this patch tested?
   
   Added a unit test in `DynamicPruningSubquerySuite` that constructs two 
`DynamicPruningSubquery` instances with identical build query structure but 
fresh (distinct) `ExprId`s, and asserts that their `canonicalized` forms are 
equal.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Sonnet 4.6
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

