shrirangmhalgi opened a new pull request, #56083: URL: https://github.com/apache/spark/pull/56083
### What changes were proposed in this pull request?

Normalize the CTE IDs of orphan `CTERelationRef` nodes in `NormalizeCTEIds`. Previously, only `CTERelationRef` nodes inside a `WithCTE` were normalized (via `canonicalizeCTE`); refs outside any `WithCTE` (orphans) kept their original IDs.

### Why are the changes needed?

After `InlineCTE` or `MergeSubplans` runs, some `CTERelationRef` nodes can end up outside their parent `WithCTE` node. `NormalizeCTEIds` skips these orphan refs, so their IDs are never normalized. Because `CTERelationDef` assigns IDs from a global, monotonically increasing counter, the same logical plan gets different CTE IDs across sessions, which breaks plan comparison and caching.

### Does this PR introduce _any_ user-facing change?

No. This is an internal plan-normalization fix that affects the correctness of plan caching.

### How was this patch tested?

Added `NormalizeCTEIdsSuite` with a test that constructs a plan containing a `CTERelationRef` outside `WithCTE` and verifies that all ref IDs are normalized. Without the fix, the orphan ref retains its original ID (100); with the fix, it is normalized to 0.

### Was this patch authored or co-authored using generative AI tooling?

Yes.
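The normalization idea behind this fix can be sketched on a toy plan model. This is a minimal sketch, assuming simplified, hypothetical node classes (`Def`, `Ref`, `WithCte`, `Node`) that stand in for Spark's `CTERelationDef`, `CTERelationRef`, and `WithCTE`; it is not the code in this PR and not Spark's `NormalizeCTEIds` API, and it omits other details the real rule has to handle. Every CTE ID, including the one on a ref that sits outside any `WithCte`, is remapped to 0, 1, 2, ... in order of first appearance, so two plans that differ only in counter-assigned IDs normalize to the same tree.

```scala
import scala.collection.mutable

// Hypothetical, simplified plan model for illustration only. These classes
// stand in for Spark's CTERelationDef, CTERelationRef and WithCTE; they are
// not Spark's real classes or API.
object CteIdNormalizer {
  sealed trait Plan
  final case class Def(id: Long, child: Plan) extends Plan              // ~ CTERelationDef
  final case class Ref(cteId: Long) extends Plan                        // ~ CTERelationRef
  final case class WithCte(body: Plan, defs: Seq[Def]) extends Plan     // ~ WithCTE
  final case class Node(name: String, children: Seq[Plan]) extends Plan // any other operator

  /** Remaps every CTE ID to 0, 1, 2, ... in order of first appearance.
    * Refs are remapped whether or not they sit under a WithCte, so an
    * orphan ref is normalized just like one inside its WithCte. */
  def normalize(plan: Plan): Plan = {
    // old ID -> normalized ID, filled in as IDs are first encountered
    val mapping = mutable.HashMap.empty[Long, Long]

    def newId(old: Long): Long = mapping.get(old) match {
      case Some(n) => n
      case None =>
        val n = mapping.size.toLong
        mapping(old) = n
        n
    }

    def goDef(d: Def): Def = {
      val id = newId(d.id) // assign the def's ID before visiting its body
      Def(id, go(d.child))
    }

    def go(p: Plan): Plan = p match {
      case d: Def  => goDef(d)
      case Ref(id) => Ref(newId(id)) // no enclosing WithCte required: orphans included
      case WithCte(body, defs) =>
        val newDefs = defs.map(goDef) // visit definitions first, then the body
        WithCte(go(body), newDefs)
      case Node(name, children) => Node(name, children.map(go))
    }

    go(plan)
  }
}
```

For example, normalizing `Node("Join", Seq(Ref(100), WithCte(Ref(7), Seq(Def(7, Node("Range", Nil))))))` yields `Node("Join", Seq(Ref(0), WithCte(Ref(1), Seq(Def(1, Node("Range", Nil))))))`, and the same plan built with IDs 250 and 42 normalizes to an identical tree. Assigning IDs by first appearance rather than by counter value is what makes the result independent of the global counter.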
