201573 opened a new pull request, #55666: URL: https://github.com/apache/spark/pull/55666
### What changes were proposed in this pull request?

This PR avoids eagerly expanding expensive projection aliases during predicate pushdown, and prevents `CollapseProject` from force-inlining multi-use expensive Python-UDF-containing aliases just to merge adjacent Python UDF projections. It also adds a regression test for the deep `withColumn` rewrite pattern reported in SPARK-54419.

### Why are the changes needed?

The optimizer could blow up on deep iterative `withColumn` rewrites when a filter above the projection chain referenced an expensive alias. We were expanding those aliases before deciding whether the predicate could stay above the project, and then `CollapseProject` could still force-inline the expensive chain while merging Python UDF projections.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Verified locally:

- `./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite org.apache.spark.sql.catalyst.optimizer.CollapseProjectSuite"`
- `git diff --check`
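
For reviewers who want intuition for the blow-up described under "Why are the changes needed?", here is a toy model of alias inlining. It is not Spark code: `Ref`, `Call`, `build_chain`, `inline`, and `size` are hypothetical names invented for this sketch, and it assumes each `withColumn` step references the previous column twice (once inside a stand-in for a Python UDF). Under that assumption, substituting alias definitions into a predicate roughly doubles the expression at every level, while leaving the aliases in the projection keeps the total size linear in the chain depth.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a column or alias by name (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Function application over child expressions (hypothetical toy type)."""
    fn: str
    args: Tuple["Expr", ...]


Expr = Union[Ref, Call]


def size(expr: Expr) -> int:
    """Count the nodes in an expression tree."""
    if isinstance(expr, Ref):
        return 1
    return 1 + sum(size(a) for a in expr.args)


def build_chain(depth: int) -> Dict[str, Expr]:
    """Model `depth` iterative withColumn steps.

    Assumption: every step uses the previous column twice,
    c_i = add(udf(c_{i-1}), c_{i-1}), so each alias is multi-use.
    """
    aliases: Dict[str, Expr] = {}
    for i in range(1, depth + 1):
        prev = Ref(f"c{i - 1}")
        aliases[f"c{i}"] = Call("add", (Call("udf", (prev,)), prev))
    return aliases


def inline(expr: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Eagerly replace every alias reference with its full definition."""
    if isinstance(expr, Ref):
        definition = aliases.get(expr.name)
        return expr if definition is None else inline(definition, aliases)
    return Call(expr.fn, tuple(inline(a, aliases) for a in expr.args))


depth = 10
aliases = build_chain(depth)
predicate = Call("is_not_null", (Ref(f"c{depth}"),))

# Size if the filter stays above the projection and aliases are kept.
kept = size(predicate) + sum(size(d) for d in aliases.values())
# Size of the predicate alone after eager alias expansion.
expanded = size(inline(predicate, aliases))

print("aliases kept:", kept)
print("eagerly expanded predicate:", expanded)
```

In this model the expanded size follows S(n) = 2·S(n-1) + 2, so each additional step doubles the work, which is why deferring the expansion until it is known to be needed matters for deep chains.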
