[ https://issues.apache.org/jira/browse/SPARK-39962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-39962:
------------------------------------

    Assignee: Hyukjin Kwon

> Global aggregation against pandas aggregate UDF does not take the column order into account
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39962
>                 URL: https://issues.apache.org/jira/browse/SPARK-39962
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.1.3, 3.3.0, 3.2.2, 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> {code}
> import pandas as pd
> from pyspark.sql import functions as f
> @f.pandas_udf("double")
> def AVG(x: pd.Series) -> float:
>     return x.mean()
> abc = spark.createDataFrame([(1.0, 5.0, 17.0)], schema=["a", "b", "c"])
> abc.agg(AVG("a"), AVG("c")).show()
> abc.select("c", "a").agg(AVG("a"), AVG("c")).show()
> {code}
> {code}
> +------+------+
> |AVG(a)|AVG(c)|
> +------+------+
> |   1.0|  17.0|
> +------+------+
> +------+------+
> |AVG(a)|AVG(c)|
> +------+------+
> |  17.0|   1.0|
> +------+------+
> {code}
> Both have to be the same.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)