[ https://issues.apache.org/jira/browse/SPARK-41742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruifeng Zheng resolved SPARK-41742.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39298
[https://github.com/apache/spark/pull/39298]

> Support star in groupBy.agg()
> -----------------------------
>
>                 Key: SPARK-41742
>                 URL: https://issues.apache.org/jira/browse/SPARK-41742
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Doctest in {{pyspark.sql.connect.group.GroupedData.agg}} fails with the error below:
> {code}
> Failed example:
>     df.groupBy(df.name).agg({"*": "count"}).sort("name").show()
> Exception raised:
>     Traceback (most recent call last):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 1336, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.group.GroupedData.agg[4]>", line 1, in <module>
>         df.groupBy(df.name).agg({"*": "count"}).sort("name").show()
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 538, in show
>         print(self._show_string(n, truncate, vertical))
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 424, in _show_string
>         pdf = DataFrame.withPlan(
>       File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 895, in toPandas
>         return self._session.client._to_pandas(query)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 333, in _to_pandas
>         return self._execute_and_fetch(req)
>       File "/.../spark/python/pyspark/sql/connect/client.py", line 421, in _execute_and_fetch
>         for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/grpc/_channel.py", line 426, in __next__
>         return self._next()
>       File "/.../miniconda3/envs/python3.9/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _next
>         raise self
>     grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
>         status = StatusCode.UNKNOWN
>         details = "[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `*` cannot be resolved. Did you mean one of the following? [`age`, `name`];
>     'Sort ['name DESC NULLS LAST], true
>     +- 'Aggregate [name#26], [name#26, unresolvedalias('count('*), None)]
>        +- Project [0#21L AS age#25L, 1#22 AS name#26]
>           +- LocalRelation [0#21L, 1#22]
>     "
>         debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:15002 {created_time:"2022-12-28T20:55:38.30791+09:00", grpc_status:2, grpc_message:"[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `*` cannot be resolved. Did you mean one of the following? [`age`, `name`];\n\'Sort [\'name DESC NULLS LAST], true\n+- \'Aggregate [name#26], [name#26, unresolvedalias(\'count(\'*), None)]\n   +- Project [0#21L AS age#25L, 1#22 AS name#26]\n      +- LocalRelation [0#21L, 1#22]\n"}"
> {code}
> We should re-enable this doctest after fixing the issue in Spark Connect.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
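
The plan in the error shows `count('*)`, which suggests that `*` was sent as an ordinary column reference and then looked up by name against `[age, name]`. The sketch below illustrates that failure mode in plain Python, together with one possible way to avoid it. It is not the code from pull request 39298 and does not use any Spark API: the names `UnresolvedAttribute`, `UnresolvedStar`, `to_expr_naive`, `to_expr_star_aware`, `resolve`, and `plan_dict_agg` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class UnresolvedAttribute:
    """Hypothetical node: a reference to a column that must exist by name."""
    name: str


@dataclass(frozen=True)
class UnresolvedStar:
    """Hypothetical node: stands for 'all columns' and is never looked up by name."""


Expr = Union[UnresolvedAttribute, UnresolvedStar]


def to_expr_naive(name: str) -> Expr:
    # Failure mode from the report: "*" is treated as an ordinary column name.
    return UnresolvedAttribute(name)


def to_expr_star_aware(name: str) -> Expr:
    # Assumed fix for illustration: special-case "*" before building a reference.
    return UnresolvedStar() if name == "*" else UnresolvedAttribute(name)


def resolve(expr: Expr, schema: List[str]) -> str:
    """Return the argument text for a function call, or raise if the name is unknown."""
    if isinstance(expr, UnresolvedStar):
        return "*"
    if expr.name not in schema:
        raise LookupError(
            f"A column or function parameter with name `{expr.name}` cannot be resolved."
        )
    return expr.name


def plan_dict_agg(
    exprs: Dict[str, str], schema: List[str], to_expr: Callable[[str], Expr]
) -> List[str]:
    """Turn a {"column": "function"} mapping into 'function(column)' strings."""
    return [f"{func}({resolve(to_expr(col), schema)})" for col, func in exprs.items()]


if __name__ == "__main__":
    schema = ["age", "name"]
    print(plan_dict_agg({"*": "count"}, schema, to_expr_star_aware))
    try:
        plan_dict_agg({"*": "count"}, schema, to_expr_naive)
    except LookupError as err:
        print(err)
```

With the naive conversion, `{"*": "count"}` fails in the same way as the doctest, because no column is literally named `*`. With the star-aware conversion, the star never reaches the by-name lookup.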