srowen commented on a change in pull request #30203: URL: https://github.com/apache/spark/pull/30203#discussion_r516730762
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala ########## @@ -218,13 +218,22 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] with PredicateHelper { } } + private def canonicalizeDeterministic(u: PythonUDF) = { Review comment: Indeed, the example in the PR should have c and d equal no matter what, ideally, as it shouldn't be evaluated multiple times. But then, being deterministic doesn't matter; how would changing the default be the right fix? I don't doubt it should be 'fixed' so that even non-deterministic UDFs aren't surprisingly reevaluated. Is the point that the current handling for non-deterministic UDFs slower, even though it evaluates them just once by design? that seems weird. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org