Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21291#discussion_r188878956

--- Diff: python/pyspark/sql/tests.py ---
@@ -5239,8 +5239,8 @@ def test_complex_groupby(self):
         expected2 = df.groupby().agg(sum(df.v))
         # groupby one column and one sql expression
-        result3 = df.groupby(df.id, df.v % 2).agg(sum_udf(df.v))
-        expected3 = df.groupby(df.id, df.v % 2).agg(sum(df.v))
+        result3 = df.groupby(df.id, df.v % 2).agg(sum_udf(df.v)).orderBy(df.id, df.v % 2)
--- End diff --

They are already ordered by `df.id`. This is the partial data:

```
Expected:
   id  (v % 2)  sum(v)
0   0      0.0   120.0
1   0      1.0   125.0
2   1      1.0   125.0
3   1      0.0   130.0
4   2      0.0   130.0
5   2      1.0   135.0
```

```
Result:
   id  (v % 2)  sum(v)
0   0      0.0   120.0
1   0      1.0   125.0
2   1      0.0   130.0
3   1      1.0   125.0
4   2      0.0   130.0
5   2      1.0   135.0
```
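To make the mismatch concrete, here is a minimal sketch in plain Python (not PySpark) using only the six rows quoted above. The helper `sort_rows` is a hypothetical name introduced for illustration. It assumes the comparison is positional, row by row: ordering on the `id` column alone leaves the two `id == 1` rows in different positions, while ordering on both grouping keys lines them up.

```python
# Rows copied from the partial data above: (id, v % 2, sum(v)).
expected = [
    (0, 0.0, 120.0), (0, 1.0, 125.0),
    (1, 1.0, 125.0), (1, 0.0, 130.0),
    (2, 0.0, 130.0), (2, 1.0, 135.0),
]
result = [
    (0, 0.0, 120.0), (0, 1.0, 125.0),
    (1, 0.0, 130.0), (1, 1.0, 125.0),
    (2, 0.0, 130.0), (2, 1.0, 135.0),
]


def sort_rows(rows, key_columns):
    """Hypothetical helper: sort rows by the given column positions.

    Python's sort is stable, so rows that tie on the key keep their
    original relative order.
    """
    return sorted(rows, key=lambda row: tuple(row[i] for i in key_columns))


# Ordering by id alone: rows within the same id keep their incoming order,
# so the positional comparison still fails.
matches_by_id_only = sort_rows(expected, [0]) == sort_rows(result, [0])

# Ordering by both grouping keys: the row order becomes deterministic.
matches_by_both_keys = sort_rows(expected, [0, 1]) == sort_rows(result, [0, 1])

print(matches_by_id_only, matches_by_both_keys)
```

The same reasoning would suggest ordering both sides of the comparison on `df.id` and `df.v % 2`, though whether `expected3` also needs the `orderBy` depends on the rest of the diff, which is not shown here.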