[ https://issues.apache.org/jira/browse/SPARK-33303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-33303:
------------------------------------

    Assignee: Peter Toth  (was: Apache Spark)

> Deduplicate deterministic PythonUDF calls
> -----------------------------------------
>
>                 Key: SPARK-33303
>                 URL: https://issues.apache.org/jira/browse/SPARK-33303
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Peter Toth
>            Assignee: Peter Toth
>            Priority: Major
>             Fix For: 3.1.0
>
>
> We ran into an issue where a customer created a column with an expensive
> PythonUDF call and built very complex logic on top of that column
> as new derived columns. Due to the `CollapseProject` and `ExtractPythonUDFs`
> rules, the UDF is called ~1000 times for each row, which degraded the
> performance of the query significantly.
> The `ExtractPythonUDFs` rule could deduplicate deterministic UDFs so as to
> avoid performance degradation.
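
The effect described in the issue can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation and does not use any Spark API: the names `Col`, `Add`, `Udf`, `CountingFunc` and `evaluate` are hypothetical, chosen only for this example. It assumes that once derived columns have been collapsed into a single expression, the same UDF call appears several times in the tree, and that structurally equal, deterministic calls can safely share one result per row.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CountingFunc:
    """Wraps a function and counts how often it is invoked (illustration only)."""

    def __init__(self, func: Callable):
        self.func = func
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.func(*args)


# A toy expression model. Frozen dataclasses compare and hash by structure,
# so two identical UDF calls are equal even if they are separate objects.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Udf:
    name: str
    func: Callable
    args: tuple
    deterministic: bool = True


def evaluate(expr, row: dict, cache: Optional[dict] = None):
    """Evaluate expr for one row.

    With cache=None every Udf node triggers a call. With a dict, results of
    deterministic Udf nodes are reused for structurally equal calls.
    """
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, row, cache) + evaluate(expr.right, row, cache)
    if isinstance(expr, Udf):
        if cache is not None and expr.deterministic:
            if expr not in cache:
                args = [evaluate(a, row, cache) for a in expr.args]
                cache[expr] = expr.func(*args)
            return cache[expr]
        # Non-deterministic UDFs (or no cache): always call.
        return expr.func(*[evaluate(a, row, cache) for a in expr.args])
    raise TypeError(f"unsupported expression: {expr!r}")


# One "expensive" UDF referenced three times after the projections collapse.
expensive = CountingFunc(lambda x: x * 2)
call = Udf("expensive", expensive, (Col("x"),))
collapsed = Add(Add(call, call), call)

naive_result = evaluate(collapsed, {"x": 5})             # 3 UDF calls
naive_calls = expensive.calls

expensive.calls = 0
dedup_result = evaluate(collapsed, {"x": 5}, cache={})   # 1 UDF call
dedup_calls = expensive.calls
```

In this sketch a fresh cache is passed per row, so results are never shared across rows, and a `Udf` marked `deterministic=False` is always re-evaluated, which mirrors the issue's restriction to deterministic UDFs.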