Herman van Hövell created SPARK-42576:
-----------------------------------------

             Summary: Add 2nd groupBy method to Dataset
                 Key: SPARK-42576
                 URL: https://issues.apache.org/jira/browse/SPARK-42576
             Project: Spark
          Issue Type: New Feature
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Herman van Hövell


Dataset is missing a groupBy method:
{code:java}
/**
 * Groups the Dataset using the specified columns, so that we can run aggregation on them.
 * See [[RelationalGroupedDataset]] for all the available aggregate functions.
 *
 * This is a variant of groupBy that can only group by existing columns using column names
 * (i.e. cannot construct expressions).
 *
 * {{{
 *   // Compute the average for all numeric columns grouped by department.
 *   ds.groupBy("department").avg()
 *
 *   // Compute the max age and average salary, grouped by department and gender.
 *   ds.groupBy($"department", $"gender").agg(Map(
 *     "salary" -> "avg",
 *     "age" -> "max"
 *   ))
 * }}}
 * @group untypedrel
 * @since 3.4.0
 */
@scala.annotation.varargs
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
{code}
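
One way the name-based variant might be added is by delegating to the existing Column-based groupBy. The sketch below is only an illustration of that delegation pattern. The Column, RelationalGroupedDataset and Dataset types in it are toy stand-ins invented for this example, not the Spark Connect classes, and the real method may be wired differently.

```scala
// Toy stand-ins for illustration only; these are NOT the Spark Connect classes.
final case class Column(name: String)
final case class RelationalGroupedDataset(groupingCols: Seq[Column])

final class Dataset {
  // Column-based variant, assumed to exist already.
  @scala.annotation.varargs
  def groupBy(cols: Column*): RelationalGroupedDataset =
    RelationalGroupedDataset(cols.toSeq)

  // Name-based variant with the signature from the issue:
  // wrap each column name in a Column and delegate to the variant above.
  @scala.annotation.varargs
  def groupBy(col1: String, cols: String*): RelationalGroupedDataset =
    groupBy((col1 +: cols).map(Column(_)): _*)
}

object GroupByDemo {
  def main(args: Array[String]): Unit = {
    val ds = new Dataset
    // Resolves to the String overload; each name becomes a grouping Column.
    println(ds.groupBy("department", "gender"))
  }
}
```

Requiring a leading col1: String parameter keeps the two overloads unambiguous: a call with no arguments can only resolve to the Column-based variant, and the two methods still differ after varargs erasure.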