Sunitha Kambhampati created SPARK-27692:
-------------------------------------------

             Summary: Optimize evaluation of udf that is deterministic and has literal inputs
                 Key: SPARK-27692
                 URL: https://issues.apache.org/jira/browse/SPARK-27692
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Sunitha Kambhampati


A deterministic UDF is a UDF for which the following is true: given a specific input, the output of the UDF is the same no matter how many times the UDF is executed.

When all inputs to a UDF are literals and the UDF is deterministic, we can evaluate the UDF once and reuse the output instead of evaluating the UDF again for every row in the query.

This is valid only if the UDF is deterministic and all of its inputs are literals. Otherwise we should not and cannot apply this optimization.

*Testing:*
We have used this internally and have seen significant performance improvements for some very expensive UDFs (as expected). In the PR, I have added unit tests.

*Credits:*
Thanks to Guy Khazma ([https://github.com/guykhazma]) from the IBM Haifa Research Team for the idea and the original implementation.
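
To make the condition concrete, here is a minimal sketch of the folding rule in plain Python. It does not use Spark and is not the code from the PR: the `Literal`, `Column`, and `Udf` classes and the `fold_literal_udfs` and `evaluate` helpers are hypothetical names invented for this illustration. It only shows the idea that a UDF node is replaced by its result when it is flagged deterministic and every input is a literal.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple, Union


# Hypothetical, simplified expression nodes (not Spark's Catalyst classes).
@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Udf:
    func: Callable[..., Any]
    children: Tuple["Expr", ...]
    deterministic: bool = True


Expr = Union[Literal, Column, Udf]


def fold_literal_udfs(expr: Expr) -> Expr:
    """Replace a deterministic UDF whose inputs are all literals by its result."""
    if not isinstance(expr, Udf):
        return expr
    # Fold nested UDF calls first so their results can become literal inputs.
    children = tuple(fold_literal_udfs(c) for c in expr.children)
    if expr.deterministic and all(isinstance(c, Literal) for c in children):
        # Evaluate once, at planning time, instead of once per row.
        return Literal(expr.func(*(c.value for c in children)))
    return Udf(expr.func, children, expr.deterministic)


def evaluate(expr: Expr, row: dict) -> Any:
    """Evaluate an expression against a single row."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.func(*(evaluate(c, row) for c in expr.children))


# Usage: count how often the "expensive" function actually runs.
calls = {"n": 0}


def expensive(x, y):
    calls["n"] += 1
    return x * y


rows = [{"a": 1}, {"a": 2}, {"a": 3}]
plan = fold_literal_udfs(Udf(expensive, (Literal(6), Literal(7))))
results = [evaluate(plan, r) for r in rows]
```

In this sketch the function runs once during folding rather than once per row. A UDF marked non-deterministic, or one that reads a column, is left unchanged and is still evaluated per row.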