[ https://issues.apache.org/jira/browse/SPARK-31945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-31945:
------------------------------------

    Assignee: Takuya Ueshin

> Make more cache enable for Python UDFs.
> ---------------------------------------
>
>                 Key: SPARK-31945
>                 URL: https://issues.apache.org/jira/browse/SPARK-31945
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>
> Currently the cache manager doesn't use the cache for {{udf}} if the {{udf}}
> is created again, even if the function is the same.
> {code:python}
> >>> func = lambda x: x
> >>> df = spark.range(1)
> >>> df.select(udf(func)("id")).cache()
> >>> df.select(udf(func)("id")).explain()
> == Physical Plan ==
> *(2) Project [pythonUDF0#14 AS <lambda>(id)#12]
> +- BatchEvalPython [<lambda>(id#0L)], [pythonUDF0#14]
>    +- *(1) Range (0, 1, step=1, splits=12)
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
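
The behaviour reported in the issue (a re-created {{udf}} over the same function not matching the cached plan) can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: the names ToyUdf, ToyUdfByFunc and lookup are hypothetical and are not Spark APIs, and the identity-versus-equality comparison shown here is one plausible way such a cache miss can arise, not a description of how Spark's cache manager or the eventual fix actually works.

```python
from typing import Callable, Dict, Tuple


class ToyUdf:
    """Hypothetical stand-in for a UDF wrapper; compares by object identity."""

    def __init__(self, func: Callable) -> None:
        self.func = func


class ToyUdfByFunc(ToyUdf):
    """Hypothetical variant: two wrappers of the same function compare equal."""

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToyUdfByFunc) and self.func is other.func

    def __hash__(self) -> int:
        return hash(self.func)


def lookup(cache: Dict[Tuple[object, str], str], udf: ToyUdf, column: str) -> str:
    # Return the cached entry if an equal (udf, column) key exists.
    return cache.get((udf, column), "cache miss")


func = lambda x: x

# Identity-based wrappers: wrapping the same function again misses the cache.
cache_a = {(ToyUdf(func), "id"): "cache hit"}
miss = lookup(cache_a, ToyUdf(func), "id")

# Function-based equality: a re-created wrapper still matches the cached key.
cache_b = {(ToyUdfByFunc(func), "id"): "cache hit"}
hit = lookup(cache_b, ToyUdfByFunc(func), "id")

print(miss, "|", hit)
```

In this toy model the first lookup misses because each wrapper is a distinct object, while the second succeeds because equality is defined on the wrapped function.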