Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/22030#discussion_r208460101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql]( * * {{{ * // Compute the sum of earnings for each year by course with each course as a separate column - * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings") + * df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings") + * }}} + * + * For pivoting by multiple columns, use the `struct` function to combine the columns and values: + * + * {{{ + * df + * .groupBy($"year") + * .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts")))) + * .agg(sum($"earnings")) * }}} * * @param pivotColumn the column to pivot. * @param values List of values that will be translated to columns in the output DataFrame. * @since 2.4.0 */ - def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { + def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = { --- End diff -- > The previous interface pivot(Column, Seq[Any]) has existed for more then multiple years. Is this based on actual feedback from users or your speculation?\ This is what @MaxGekk added in https://github.com/apache/spark/pull/21699. > This assumption of yours is not true. See my reply to your comment below. No. Seq[Any] takes literal values (objects); Seq[Column] takes `Column` expressions.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org