[ https://issues.apache.org/jira/browse/SPARK-42576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693406#comment-17693406 ]
Herman van Hövell commented on SPARK-42576:
-------------------------------------------

cc [~ruiwcdl]

> Add 2nd groupBy method to Dataset
> ---------------------------------
>
>                 Key: SPARK-42576
>                 URL: https://issues.apache.org/jira/browse/SPARK-42576
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Dataset is missing a groupBy method:
> {code:java}
> /**
>  * Groups the Dataset using the specified columns, so that we can run aggregation on them.
>  * See [[RelationalGroupedDataset]] for all the available aggregate functions.
>  *
>  * This is a variant of groupBy that can only group by existing columns using column names
>  * (i.e. cannot construct expressions).
>  *
>  * {{{
>  *   // Compute the average for all numeric columns grouped by department.
>  *   ds.groupBy("department").avg()
>  *
>  *   // Compute the max age and average salary, grouped by department and gender.
>  *   ds.groupBy($"department", $"gender").agg(Map(
>  *     "salary" -> "avg",
>  *     "age" -> "max"
>  *   ))
>  * }}}
>  * @group untypedrel
>  * @since 3.4.0
>  */
> @scala.annotation.varargs
> def groupBy(col1: String, cols: String*): RelationalGroupedDataset
> {code}
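For illustration, the name-based groupBy requested in this issue can be read as a thin wrapper that turns each column name into a column reference and delegates to the Column-based groupBy. The following is a minimal sketch under that assumption, not the actual patch: `Column`, `RelationalGroupedDataset`, and `Dataset` here are simplified, hypothetical stand-ins that only share names with the Spark classes, and `GroupByDemo` is an invented driver object.

```scala
// Hypothetical stand-ins: NOT the real Spark / Spark Connect classes.
// They carry just enough structure to show the delegation between overloads.
final case class Column(name: String)

final case class RelationalGroupedDataset(groupingCols: Seq[Column])

class Dataset {
  // Column-based variant: accepts column references directly.
  def groupBy(cols: Column*): RelationalGroupedDataset =
    RelationalGroupedDataset(cols.toSeq)

  // Name-based variant with the signature from the issue: requires at least
  // one name, wraps each name in a Column, then delegates to the variant above.
  def groupBy(col1: String, cols: String*): RelationalGroupedDataset =
    groupBy((col1 +: cols).map(Column(_)): _*)
}

object GroupByDemo {
  def main(args: Array[String]): Unit = {
    val ds = new Dataset
    val byName = ds.groupBy("department", "gender")
    val byColumn = ds.groupBy(Column("department"), Column("gender"))
    // Both overloads should yield the same grouping columns.
    assert(byName == byColumn)
  }
}
```

Taking `col1: String` separately from `cols: String*` means a call with no names cannot select the name-based overload, and it keeps the two overloads distinguishable when called with zero arguments.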