[PR] [SPARK-54419][SQL] Avoid expanding expensive alias chains in optimizer [spark]

via GitHub Mon, 04 May 2026 01:07:03 -0700


201573 opened a new pull request, #55666:
URL: https://github.com/apache/spark/pull/55666


   ### What changes were proposed in this pull request?
   
   This PR stops predicate pushdown from eagerly expanding expensive projection 
aliases. It also stops `CollapseProject` from force-inlining expensive aliases 
that contain Python UDFs and are referenced more than once, when the only 
purpose is to merge adjacent Python UDF projections.
   
   It also adds a regression test for the deep `withColumn` rewrite pattern 
reported in SPARK-54419.
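
The inlining guard described above can be sketched as a toy model. This is not Catalyst code: the tuple encoding, the `py_udf` name prefix used as a cost rule, and the helper names `count_refs`, `is_expensive`, and `can_inline` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Toy expression encoding (hypothetical, not Catalyst's classes):
#   ("ref", name)            reference to a column or alias
#   ("call", fn, arg, ...)   function call with sub-expressions


def count_refs(expr, counts):
    """Count how often each name is referenced inside expr."""
    if expr[0] == "ref":
        counts[expr[1]] += 1
    else:
        for arg in expr[2:]:
            count_refs(arg, counts)


def is_expensive(expr):
    """Assumed cost rule: anything that calls a Python UDF is expensive."""
    if expr[0] == "ref":
        return False
    return expr[1].startswith("py_udf") or any(is_expensive(a) for a in expr[2:])


def can_inline(consumers, producers):
    """Allow inlining only if no expensive alias is referenced more than once."""
    counts = Counter()
    for expr in consumers:
        count_refs(expr, counts)
    return all(
        counts[name] <= 1 or not is_expensive(expr)
        for name, expr in producers.items()
    )


# Lower projection defines alias x; upper projections consume it.
expensive = {"x": ("call", "py_udf_score", ("ref", "a"))}
cheap = {"x": ("call", "add", ("ref", "a"), ("ref", "b"))}
single_use = [("call", "py_udf_fmt", ("ref", "x"))]
multi_use = [("call", "add", ("ref", "x"), ("ref", "x"))]

print(can_inline(single_use, expensive))
print(can_inline(multi_use, expensive))
print(can_inline(multi_use, cheap))
```

In this sketch only the multi-use expensive case is rejected, since inlining it would duplicate the Python UDF call; single-use and cheap aliases remain free to collapse.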
   
   ### Why are the changes needed?
   
   The optimizer could blow up on deep iterative `withColumn` rewrites when a 
filter above the projection chain referenced an expensive alias. Predicate 
pushdown expanded those aliases before deciding whether the predicate could 
stay above the project, and `CollapseProject` could then still force-inline 
the expensive chain while merging Python UDF projections.
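
To illustrate why expanding an alias chain can be costly, here is a minimal sketch, assuming a simplified expression tree rather than Catalyst's real classes. `Ref`, `Call`, `inline`, and `build_chain` are hypothetical names, and the chain shape (each step referencing the previous column twice) is an assumption made for the example, not the exact pattern from SPARK-54419.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Reference to a base column or to an alias defined lower in the plan."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function call over sub-expressions."""
    fn: str
    args: tuple


def size(expr):
    """Number of nodes in an expression tree."""
    if isinstance(expr, Ref):
        return 1
    return 1 + sum(size(a) for a in expr.args)


def inline(expr, aliases):
    """Replace every alias reference with its full definition, recursively."""
    if isinstance(expr, Ref):
        if expr.name in aliases:
            return inline(aliases[expr.name], aliases)
        return expr
    return Call(expr.fn, tuple(inline(a, aliases) for a in expr.args))


def build_chain(depth):
    """Model a loop of withColumn-style steps: c_i = udf(c_{i-1}, c_{i-1})."""
    aliases = {}
    prev = "c0"
    for i in range(1, depth + 1):
        aliases[f"c{i}"] = Call("udf", (Ref(prev), Ref(prev)))
        prev = f"c{i}"
    return aliases, Ref(prev)


depth = 10
aliases, top = build_chain(depth)
kept = sum(size(e) for e in aliases.values())   # chain left as separate aliases
expanded = size(inline(top, aliases))           # chain fully inlined
print(kept, expanded)
```

With these assumptions the un-expanded chain grows linearly (3 nodes per step), while the fully inlined expression has 2^(depth+1) - 1 nodes, because every step duplicates the whole expression beneath it.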
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Verified locally:
   - `./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite org.apache.spark.sql.catalyst.optimizer.CollapseProjectSuite"`
   - `git diff --check`
   


