[ https://issues.apache.org/jira/browse/SPARK-27761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858155#comment-16858155 ]
Josh Rosen commented on SPARK-27761:
------------------------------------

FYI, I'm marking SPARK-27969 as a blocker for this issue because non-deterministic expressions can unnecessarily prevent scan-time column pruning. As a result, changing the default could lead to massive performance regressions when users upgrade to 3.0.

> Make UDF nondeterministic by default(?)
> ---------------------------------------
>
>                 Key: SPARK-27761
>                 URL: https://issues.apache.org/jira/browse/SPARK-27761
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Sunitha Kambhampati
>            Priority: Minor
>
> Opening this issue as a follow-up to a discussion/question on this PR for an
> optimization involving deterministic UDFs:
> https://github.com/apache/spark/pull/24593#pullrequestreview-237361795
> "We even should discuss whether all UDFs must be deterministic or
> non-deterministic by default."
> Today, in Spark 2.4, Scala UDFs are implicitly marked deterministic by
> default. To mark a UDF as non-deterministic, users need to call
> asNondeterministic() on it.
> The concern expressed is that users are not aware of this property and its
> implications.