[ https://issues.apache.org/jira/browse/SPARK-41743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652934#comment-17652934 ]
Martin Grund edited comment on SPARK-41743 at 12/29/22 8:24 PM:
----------------------------------------------------------------

In the following example, I cannot reproduce this:

{code:python}
df = spark.createDataFrame([{"age":10, "name": "Martin"},{"age":11, "name": "Anton"}])
df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).sort("name").show()
{code}

produces

{noformat}
+------+--------+
|  name|min(age)|
+------+--------+
| Anton|      11|
|Martin|      10|
+------+--------+
{noformat}

vs

{code:python}
df.select(df["_1"].alias("age"), df["_2"].alias("name")).groupBy("name").agg({"age":"min"}).show()
{code}

produces

{noformat}
+------+--------+
|  name|min(age)|
+------+--------+
|Martin|      10|
| Anton|      11|
+------+--------+
{noformat}

> groupBy(...).agg(...).sort does not actually sort the output
> ------------------------------------------------------------
>
>                 Key: SPARK-41743
>                 URL: https://issues.apache.org/jira/browse/SPARK-41743
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> **********************************************************************
> File "/.../spark/python/pyspark/sql/connect/group.py", line 211, in pyspark.sql.connect.group.GroupedData.agg
> Failed example:
>     df.groupBy(df.name).agg(F.min(df.age)).sort("name").show()
> Differences (ndiff with -expected +actual):
>       +-----+--------+
>       | name|min(age)|
>       +-----+--------+
>     + |  Bob|       5|
>       |Alice|       2|
>     - |  Bob|       5|
>       +-----+--------+
>     + <BLANKLINE>
> {code}
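
For anyone reading the doctest failure quoted in the issue: the "ndiff with -expected +actual" report can be reproduced with the standard library alone, which makes it easier to see that the only differences are the position of the Bob row and a trailing blank line. The sketch below is not taken from the Spark test suite; the `expected` and `actual` lists are hand-typed assumptions based on the rows shown in the report, and doctest would print the empty trailing line as `<BLANKLINE>`.

```python
import difflib

# Expected output: rows sorted by name, as the failing doctest assumes.
expected = [
    "+-----+--------+",
    "| name|min(age)|",
    "+-----+--------+",
    "|Alice|       2|",
    "|  Bob|       5|",
    "+-----+--------+",
]

# Actual output per the report: Bob comes first, plus one trailing empty line.
actual = [
    "+-----+--------+",
    "| name|min(age)|",
    "+-----+--------+",
    "|  Bob|       5|",
    "|Alice|       2|",
    "+-----+--------+",
    "",
]

# ndiff prefixes lines only in `expected` with "- " and lines only in `actual` with "+ ".
diff = list(difflib.ndiff(expected, actual))
print("\n".join(diff))
```

A line prefixed with `+` appeared in the actual output but not at that position in the expected output, and a line prefixed with `-` is the reverse, so the paired `+`/`-` Bob lines indicate a row that moved rather than a row that changed.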