[jira] [Created] (SPARK-42576) Add 2nd groupBy method to Dataset

Jira Fri, 24 Feb 2023 18:37:14 -0800

Herman van Hövell created SPARK-42576:
-----------------------------------------


             Summary: Add 2nd groupBy method to Dataset
                 Key: SPARK-42576
                 URL: https://issues.apache.org/jira/browse/SPARK-42576
             Project: Spark
          Issue Type: New Feature
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Herman van Hövell


The Connect Dataset is missing the groupBy variant that takes column names:
{code:java}
/**
 * Groups the Dataset using the specified columns, so that we can run aggregation on them.
 * See [[RelationalGroupedDataset]] for all the available aggregate functions.
 *
 * This is a variant of groupBy that can only group by existing columns using column names
 * (i.e. cannot construct expressions).
 *
 * {{{
 *   // Compute the average for all numeric columns grouped by department.
 *   ds.groupBy("department").avg()
 *
 *   // Compute the max age and average salary, grouped by department and gender.
 *   ds.groupBy("department", "gender").agg(Map(
 *     "salary" -> "avg",
 *     "age" -> "max"
 *   ))
 * }}}
 * @group untypedrel
 * @since 3.4.0
 */
@scala.annotation.varargs
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
{code}
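
A minimal sketch of how a name-based groupBy overload could delegate to a Column-based one. The Column, RelationalGroupedDataset and Dataset types below are simplified stand-ins written for this illustration, not the Spark Connect classes, and the delegation shown is an assumption about one possible implementation rather than the actual patch. Only the signature of the second groupBy comes from the issue text.

```scala
// Stand-in types for illustration only; these are NOT the Spark Connect classes.
final case class Column(name: String)
final case class RelationalGroupedDataset(groupingColumns: Seq[Column])

final class Dataset {
  // Assumed to exist already: the variant that groups by Column expressions.
  @scala.annotation.varargs
  def groupBy(cols: Column*): RelationalGroupedDataset =
    RelationalGroupedDataset(cols.toSeq)

  // The variant requested in the issue: column names only, at least one required.
  // Here it wraps each name in a Column and forwards to the overload above.
  @scala.annotation.varargs
  def groupBy(col1: String, cols: String*): RelationalGroupedDataset =
    groupBy((col1 +: cols).map(Column(_)): _*)
}

object GroupByDemo {
  def main(args: Array[String]): Unit = {
    val ds = new Dataset
    val grouped = ds.groupBy("department", "gender")
    // The names arrive as grouping columns in the order they were given.
    assert(grouped.groupingColumns == Seq(Column("department"), Column("gender")))
    println(grouped.groupingColumns.map(_.name).mkString(", "))
  }
}
```

Splitting the parameters into col1 and cols means a call with no arguments does not match the name-based overload, and the two overloads stay distinguishable by the type of their first argument.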



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
