[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899781#comment-16899781 ]
Dongjoon Hyun commented on SPARK-28422:
---------------------------------------

Thank you for reporting, [~icexelloss]. And thank you for making a PR, [~viirya].
Because this has not been supported since 2.4.0, I updated the affected versions, too.

> GROUPED_AGG pandas_udf doesn't work with spark.sql() without group by clause
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28422
>                 URL: https://issues.apache.org/jira/browse/SPARK-28422
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: Li Jin
>            Priority: Major
>
> {code:python}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
>
> @pandas_udf('double', PandasUDFType.GROUPED_AGG)
> def max_udf(v):
>     return v.max()
>
> df = spark.range(0, 100)
> spark.udf.register('max_udf', max_udf)
> df.createTempView('table')
>
> # A. This works
> df.agg(max_udf(df['id'])).show()
>
> # B. This doesn't work
> spark.sql("select max_udf(id) from table").show()
> {code}
>
> Query plan:
>
> A:
> {code:java}
> == Parsed Logical Plan ==
> 'Aggregate [max_udf('id) AS max_udf(id)#140]
> +- Range (0, 1000, step=1, splits=Some(4))
>
> == Analyzed Logical Plan ==
> max_udf(id): double
> Aggregate [max_udf(id#64L) AS max_udf(id)#140]
> +- Range (0, 1000, step=1, splits=Some(4))
>
> == Optimized Logical Plan ==
> Aggregate [max_udf(id#64L) AS max_udf(id)#140]
> +- Range (0, 1000, step=1, splits=Some(4))
>
> == Physical Plan ==
> !AggregateInPandas [max_udf(id#64L)], [max_udf(id)#138 AS max_udf(id)#140]
> +- Exchange SinglePartition
>    +- *(1) Range (0, 1000, step=1, splits=4)
> {code}
>
> B:
> {code:java}
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('max_udf('id), None)]
> +- 'UnresolvedRelation [table]
>
> == Analyzed Logical Plan ==
> max_udf(id): double
> Project [max_udf(id#0L) AS max_udf(id)#136]
> +- SubqueryAlias `table`
>    +- Range (0, 100, step=1, splits=Some(4))
>
> == Optimized Logical Plan ==
> Project [max_udf(id#0L) AS max_udf(id)#136]
> +- Range (0, 100, step=1, splits=Some(4))
>
> == Physical Plan ==
> *(1) Project [max_udf(id#0L) AS max_udf(id)#136]
> +- *(1) Range (0, 100, step=1, splits=4)
> {code}
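The visible difference between the two quoted plans is the root operator: case A is planned as an Aggregate (AggregateInPandas in the physical plan), while case B is planned as a plain Project. As a rough sketch only, the snippet below extracts that root operator from plan text in the format quoted above. The helper name `root_operator` and the constants `PLAN_A` and `PLAN_B` are hypothetical and are not part of Spark; the plan strings are copied from the issue description rather than produced by a live session.

```python
import re

# Plan excerpts copied from the issue description (cases A and B).
PLAN_A = """\
== Optimized Logical Plan ==
Aggregate [max_udf(id#64L) AS max_udf(id)#140]
+- Range (0, 1000, step=1, splits=Some(4))

== Physical Plan ==
!AggregateInPandas [max_udf(id#64L)], [max_udf(id)#138 AS max_udf(id)#140]
+- Exchange SinglePartition
   +- *(1) Range (0, 1000, step=1, splits=4)
"""

PLAN_B = """\
== Optimized Logical Plan ==
Project [max_udf(id#0L) AS max_udf(id)#136]
+- Range (0, 100, step=1, splits=Some(4))

== Physical Plan ==
*(1) Project [max_udf(id#0L) AS max_udf(id)#136]
+- *(1) Range (0, 100, step=1, splits=4)
"""


def root_operator(plan_text, section):
    """Hypothetical helper: name of the top operator in one plan section.

    Returns None if the section is missing or has no operator line.
    """
    header = "== " + section + " =="
    in_section = False
    for line in plan_text.splitlines():
        text = line.strip()
        if text == header:
            in_section = True
            continue
        if in_section and text:
            if text.startswith("=="):
                # Reached the next section without finding an operator.
                return None
            # Skip prefixes such as "!", "'" or "*(1) " before the name.
            match = re.match(r"[^A-Za-z]*([A-Za-z]+)", text)
            return match.group(1) if match else None
    return None


for name, plan in (("A", PLAN_A), ("B", PLAN_B)):
    print(name,
          root_operator(plan, "Optimized Logical Plan"),
          root_operator(plan, "Physical Plan"))
```

The Analyzed Logical Plan section is skipped here because it begins with a schema line (`max_udf(id): double`) rather than an operator, so this simple first-line approach would not apply to it without extra handling.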