[ https://issues.apache.org/jira/browse/SPARK-42576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun closed SPARK-42576.
---------------------------------
> Add 2nd groupBy method to Dataset
> ---------------------------------
>
> Key: SPARK-42576
> URL: https://issues.apache.org/jira/browse/SPARK-42576
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.4.1
>
>
> Dataset is missing a groupBy method:
> {code:java}
> /**
> * Groups the Dataset using the specified columns, so that we can run aggregation on them.
> * See [[RelationalGroupedDataset]] for all the available aggregate functions.
> *
> * This is a variant of groupBy that can only group by existing columns using column names
> * (i.e. cannot construct expressions).
> *
> * {{{
> * // Compute the average for all numeric columns grouped by department.
> * ds.groupBy("department").avg()
> *
> * // Compute the max age and average salary, grouped by department and gender.
> * ds.groupBy("department", "gender").agg(Map(
> *   "salary" -> "avg",
> *   "age" -> "max"
> * ))
> * }}}
> * @group untypedrel
> * @since 3.4.0
> */
> @scala.annotation.varargs
> def groupBy(col1: String, cols: String*): RelationalGroupedDataset {code}
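The signature pairs a mandatory `col1: String` with `cols: String*` instead of a single `cols: String*`. A likely motivation (our reading; the issue does not state it) is that Scala repeated parameters erase to `Seq`, so `groupBy(cols: String*)` next to the existing `groupBy(cols: Column*)` would be rejected as a double definition with the same type after erasure. The sketch below reproduces that overload pattern in plain Scala with no Spark dependency. `MiniColumn`, `MiniDataset`, and `GroupByOverloadDemo` are hypothetical stand-ins, not Spark classes, and the grouping is done eagerly over in-memory rows purely for illustration.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark classes.
final case class MiniColumn(name: String)

final class MiniDataset(rows: Seq[Map[String, String]]) {
  // Grouping key (the values of the grouped columns) -> rows in that group.
  type Groups = Map[Seq[String], Seq[Map[String, String]]]

  // Column-based overload, analogous in shape to Dataset.groupBy(cols: Column*).
  def groupBy(cols: MiniColumn*): Groups =
    rows.groupBy(row => cols.map(c => row(c.name)))

  // Name-based overload. Declaring it as groupBy(cols: String*) would not compile,
  // because both overloads would erase to groupBy(Seq). The leading String
  // parameter gives it a distinct erased signature and requires at least one name.
  // @varargs additionally emits a Java-friendly groupBy(String, String...) forwarder.
  @scala.annotation.varargs
  def groupBy(col1: String, cols: String*): Groups =
    groupBy((col1 +: cols).map(MiniColumn(_)): _*)
}

object GroupByOverloadDemo {
  def main(args: Array[String]): Unit = {
    val ds = new MiniDataset(Seq(
      Map("department" -> "eng", "gender" -> "F"),
      Map("department" -> "eng", "gender" -> "M"),
      Map("department" -> "ops", "gender" -> "F")
    ))
    println(ds.groupBy("department").size)           // 2
    println(ds.groupBy("department", "gender").size) // 3
  }
}
```

Callers never see the split: `ds.groupBy("department")` and `ds.groupBy("department", "gender")` both resolve to the name-based overload, while a call with no arguments can only resolve to the column-based one.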
--
This message was sent by Atlassian Jira
(v8.20.10#820010)