skonto commented on a change in pull request #25215: [SPARK-28445][SQL][Python] Fix error when PythonUDF is used in both group by and aggregate expression
URL: https://github.com/apache/spark/pull/25215#discussion_r305753415
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
##########
@@ -81,6 +81,64 @@ object ExtractPythonUDFFromAggregate extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Extracts PythonUDFs in logical aggregate, which are used in grouping keys, evaluate them
+ * before aggregate.
+ * This must be executed after `ExtractPythonUDFFromAggregate` rule and before `ExtractPythonUDFs`.

Review comment:
Could you add the comment at the Spark Optimizer side? I think it would be helpful in case of a refactoring.
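
For readers following the thread, here is a toy sketch of the rewrite the doc comment describes, as I read it: a UDF call that appears in the grouping keys is evaluated once in a projection below the aggregate, and both the grouping key and any matching aggregate expression then refer to the projected column. This is plain Python, not Spark code. Every class and function name here (`Udf`, `Sum`, `Scan`, `Project`, `Aggregate`, `extract_grouping_udfs`, the `group_udf_N` alias) is a hypothetical stand-in and not part of Catalyst or of this PR.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical toy plan nodes; these are NOT Spark's Catalyst classes.
@dataclass(frozen=True)
class Udf:
    """A call to a Python UDF on a named column, e.g. f(a)."""
    name: str
    column: str


@dataclass(frozen=True)
class Sum:
    """sum(child), where child is a column name or a Udf call."""
    child: object


@dataclass(frozen=True)
class Scan:
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    """Evaluates each Udf and exposes the result under the paired alias."""
    aliases: Tuple[Tuple[str, Udf], ...]
    child: object


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[object, ...]
    aggregates: Tuple[object, ...]
    child: object


def extract_grouping_udfs(plan):
    """Move UDF calls used as grouping keys into a Project below the Aggregate."""
    if not isinstance(plan, Aggregate):
        return plan

    # Assign one alias per distinct UDF call found in the grouping keys.
    aliases = {}
    for key in plan.grouping:
        if isinstance(key, Udf) and key not in aliases:
            aliases[key] = f"group_udf_{len(aliases)}"
    if not aliases:
        return plan

    def rewrite(expr):
        # Replace an extracted UDF call with a reference to its alias.
        if isinstance(expr, Udf) and expr in aliases:
            return aliases[expr]
        if isinstance(expr, Sum):
            return Sum(rewrite(expr.child))
        return expr

    project = Project(
        tuple((alias, udf) for udf, alias in aliases.items()), plan.child
    )
    return Aggregate(
        tuple(rewrite(k) for k in plan.grouping),
        tuple(rewrite(a) for a in plan.aggregates),
        project,
    )


# Toy equivalent of: GROUP BY f(a), with sum(f(a)) as the aggregate.
plan = Aggregate((Udf("f", "a"),), (Sum(Udf("f", "a")),), Scan(("a",)))
rewritten = extract_grouping_udfs(plan)
print(rewritten)
```

In this toy model the aggregate no longer contains any UDF call after the rewrite, which is one way to picture why the position of such a rule relative to its neighbours matters and why a note next to the optimizer's rule list would help during a refactoring.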