[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22316

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r221428797

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy("year").pivot("course").sum("earnings")
    * }}}
    *
+   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
+   * multiple columns, use the `struct` function to combine the columns and values:
+   *
+   * {{{
+   * df.groupBy($"year")

--- End diff --

We can; it's just to match the examples above except for the difference. Really not a big deal at all.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r221427994

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy("year").pivot("course").sum("earnings")
    * }}}
    *
+   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
+   * multiple columns, use the `struct` function to combine the columns and values:
+   *
+   * {{{
+   * df.groupBy($"year")

--- End diff --

Why can't the grouping be by the `Column` type?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r221427775

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy("year").pivot("course").sum("earnings")
    * }}}
    *
+   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
+   * multiple columns, use the `struct` function to combine the columns and values:
+   *
+   * {{{
+   * df.groupBy($"year")

--- End diff --

nit: `$"year"` -> `"year"`
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r219686833

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

That's true in general, but specifically: is the decimal precision more correct?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r219368724

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

Now we eventually call `Literal.create` instead of `Literal.apply`. I'm not sure if there is a behavior change, though.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r219368791

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

From a quick look, it seems `Literal.create` is more powerful and should not cause regressions.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r219368334

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -330,6 +331,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy("year").pivot("course").sum("earnings")
    * }}}
    *
+   * From Spark 3.0.0, values can be literal columns, for instance, struct. For pivoting by

--- End diff --

3.0.0 => 2.5.0
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r217476618

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

> Looks at least there's one case of a potential behaviour change about scale and precision.

Could you explain, please, why you expect a behavior change?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r217251795

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

@MaxGekk, just to be doubly, doubly sure: shall we use `Try(...).getOrElse(lit(...).expr)`? It looks like there's at least one case of a potential behaviour change about scale and precision.
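The `Try(...).getOrElse(...)` fallback suggested above can be sketched without Spark on the classpath. The two converters below are hypothetical stand-ins for the strict legacy path (`Literal.apply`, which rejects unknown types) and the permissive new path (`lit(...).expr` / `Literal.create`); they are not Spark's actual API, only an illustration of the pattern.

```scala
import scala.util.Try

// Stand-in for the legacy Literal.apply: strict, throws on unsupported types.
def legacyConvert(v: Any): String = v match {
  case i: Int    => s"int($i)"
  case s: String => s"str($s)"
  case other     => throw new RuntimeException(s"Unsupported literal type: $other")
}

// Stand-in for the permissive lit(...)/Literal.create path.
def newConvert(v: Any): String = s"lit($v)"

// The suggested fallback: keep the legacy behavior for types it already
// handles, and use the new path only where the legacy one throws.
def toExpr(v: Any): String = Try(legacyConvert(v)).getOrElse(newConvert(v))
```

With this shape, previously supported values keep their old conversion (avoiding, e.g., a decimal scale/precision change), while new value shapes such as tuples/structs take the new path.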
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r216122957

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -330,6 +331,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy("year").pivot("course").sum("earnings")
    * }}}
    *
+   * From Spark 2.4.0, values can be literal columns, for instance, struct. For pivoting by

--- End diff --

Let's target 3.0.0 @MaxGekk.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214894498

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---

@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+    val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+    val df = trainingSales
+      .groupBy($"sales.year")
+      .pivot(struct(lower($"sales.course"), $"training"), Seq(
+        struct(lit("dotnet"), lit("Experts")),
+        struct(lit("java"), lit("Dummies")))
+      ).agg(sum($"sales.earnings"))
+
+    checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+    val exception = intercept[RuntimeException] {
+      trainingSales
+        .groupBy($"sales.year")
+        .pivot(struct(lower($"sales.course"), $"training"))
+        .agg(sum($"sales.earnings"))
+        .collect()

--- End diff --

I think invalid queries basically throw `AnalysisException`. But, yeah, indeed, we'd better keep the current behaviour. Thanks!
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214842133

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---

@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+    val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+    val df = trainingSales
+      .groupBy($"sales.year")
+      .pivot(struct(lower($"sales.course"), $"training"), Seq(
+        struct(lit("dotnet"), lit("Experts")),
+        struct(lit("java"), lit("Dummies")))
+      ).agg(sum($"sales.earnings"))
+
+    checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+    val exception = intercept[RuntimeException] {
+      trainingSales
+        .groupBy($"sales.year")
+        .pivot(struct(lower($"sales.course"), $"training"))
+        .agg(sum($"sales.earnings"))
+        .collect()

--- End diff --

> I miss something?

No, you don't. The exception is definitely thrown inside `lit`, because `collect()` returns a complex value which cannot be "wrapped" by `lit`. This is exactly what is checked in the test I added to show the existing behavior.

> btw, IMHO AnalysisException is better than RuntimeException in this case?

@maropu Could you explain, please, why you think `AnalysisException` is better for an error that occurs at run time? Just in case: in this PR, I don't aim to change the behavior of the existing method `def pivot(pivotColumn: Column): RelationalGroupedDataset`. I believe that should be discussed separately, since it changes user-visible behavior. The PR aims to improve `def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset` to allow users to specify `struct` literals in particular. Please see the description.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214761811

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---

@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+    val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+    val df = trainingSales
+      .groupBy($"sales.year")
+      .pivot(struct(lower($"sales.course"), $"training"), Seq(
+        struct(lit("dotnet"), lit("Experts")),
+        struct(lit("java"), lit("Dummies")))
+      ).agg(sum($"sales.earnings"))
+
+    checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+    val exception = intercept[RuntimeException] {
+      trainingSales
+        .groupBy($"sales.year")
+        .pivot(struct(lower($"sales.course"), $"training"))
+        .agg(sum($"sales.earnings"))
+        .collect()

--- End diff --

I tried it in your branch:

```
scala> df.show
+--------+--------------------+
|training|               sales|
+--------+--------------------+
| Experts|[dotNET, 2012, 10...|
| Experts|[JAVA, 2012, 2000...|
| Dummies|[dotNet, 2012, 50...|
| Experts|[dotNET, 2013, 48...|
| Dummies|[Java, 2013, 3000...|
+--------+--------------------+

scala> df.groupBy($"sales.year").pivot(struct(lower($"sales.course"), $"training")).agg(sum($"sales.earnings"))
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [dotnet,Dummies]
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
  at scala.util.Try.getOrElse(Try.scala:79)
  at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:163)
  at org.apache.spark.sql.functions$.typedLit(functions.scala:127)
```

Am I missing something?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214754379

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

I don't see any advantage to this. It is longer and slower.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214752855

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
     new RelationalGroupedDataset(
       df,
       groupingExprs,
-      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
+      RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))

--- End diff --

What do you think about `map(lit).map(_.expr)` instead?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214722485

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---

@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+    val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+    val df = trainingSales
+      .groupBy($"sales.year")
+      .pivot(struct(lower($"sales.course"), $"training"), Seq(
+        struct(lit("dotnet"), lit("Experts")),
+        struct(lit("java"), lit("Dummies")))
+      ).agg(sum($"sales.earnings"))
+
+    checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+    val exception = intercept[RuntimeException] {
+      trainingSales
+        .groupBy($"sales.year")
+        .pivot(struct(lower($"sales.course"), $"training"))
+        .agg(sum($"sales.earnings"))
+        .collect()

--- End diff --

My changes don't throw the exception. It is thrown in `collect()`:
https://github.com/apache/spark/blob/41c2227a2318029709553a588e44dee28f106350/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala#L385

@maropu Do you propose to catch `RuntimeException` and replace it with `AnalysisException`?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214566503

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---

@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     assert(exception.getMessage.contains("aggregate functions are not allowed"))
   }
+
+  test("pivoting column list with values") {
+    val expected = Row(2012, 1.0, null) :: Row(2013, 48000.0, 3.0) :: Nil
+    val df = trainingSales
+      .groupBy($"sales.year")
+      .pivot(struct(lower($"sales.course"), $"training"), Seq(
+        struct(lit("dotnet"), lit("Experts")),
+        struct(lit("java"), lit("Dummies")))
+      ).agg(sum($"sales.earnings"))
+
+    checkAnswer(df, expected)
+  }
+
+  test("pivoting column list") {
+    val exception = intercept[RuntimeException] {
+      trainingSales
+        .groupBy($"sales.year")
+        .pivot(struct(lower($"sales.course"), $"training"))
+        .agg(sum($"sales.earnings"))
+        .collect()

--- End diff --

Don't we need this `.collect()` to catch the `RuntimeException`? Btw, IMHO, `AnalysisException` is better than `RuntimeException` in this case. Can't we use it?
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214566083

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -406,6 +407,14 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
    * }}}
    *
+   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:

--- End diff --

Since the documentation states it's an overloaded version of "the `pivot` method with `pivotColumn` of the `String` type", shall we move this content to that method? Also, I would document it as, for instance:

From Spark 2.4.0, values can be literal columns, for instance, `struct`. For pivoting by multiple columns, use the `struct` function to combine the columns and values.
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214544896

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---

@@ -406,6 +407,15 @@ class RelationalGroupedDataset protected[sql](
    * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
    * }}}
    *
+   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
+   *
+   * {{{
+   * df
+   *   .groupBy($"year")

--- End diff --

I would make this line up
[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22316

[SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java

## What changes were proposed in this pull request?

In the PR, I propose to extend the implementation of the existing method:

```
def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset
```

to support values of the struct type. This allows pivoting by multiple columns combined by `struct`:

```
trainingSales
  .groupBy($"sales.year")
  .pivot(
    pivotColumn = struct(lower($"sales.course"), $"training"),
    values = Seq(
      struct(lit("dotnet"), lit("Experts")),
      struct(lit("java"), lit("Dummies")))
  ).agg(sum($"sales.earnings"))
```

## How was this patch tested?

Added a test for values specified via `struct` in Java and Scala.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 pivoting-by-multiple-columns2

Alternatively, you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22316.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22316

commit 058072544fdd606392a57615119bb55dff5345c0
Author: Maxim Gekk
Date: 2018-09-02T12:24:20Z

    Support columns as values

commit 1221db39b75a9b9bd4fbc6144150283d9c24e9d5
Author: Maxim Gekk
Date: 2018-09-02T13:14:55Z

    Added a test for the case when values are not specified

commit a097b294854f99ec58ca307d85c19e54cd76d6b8
Author: Maxim Gekk
Date: 2018-09-02T13:19:14Z

    Added a test for Java
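The semantics the PR description sketches can be illustrated without Spark: pivoting by multiple columns is pivoting by the pair (struct) of their values. The plain-Scala sketch below mirrors `groupBy($"sales.year").pivot(struct(lower($"sales.course"), $"training"), values).agg(sum(...))` with ordinary collections; the `Sale` rows and earnings figures here are hypothetical, not the suite's `trainingSales` data.

```scala
// A Spark-free sketch of pivoting by two columns combined into a pair.
case class Sale(year: Int, course: String, training: String, earnings: Double)

val sales = Seq(
  Sale(2012, "dotNET", "Experts", 10000.0),
  Sale(2012, "dotNet", "Dummies", 5000.0),
  Sale(2013, "dotNET", "Experts", 48000.0),
  Sale(2013, "Java", "Dummies", 30000.0))

// The requested pivot values, mirroring
// Seq(struct(lit("dotnet"), lit("Experts")), struct(lit("java"), lit("Dummies"))).
val pivotValues = Seq(("dotnet", "Experts"), ("java", "Dummies"))

// groupBy(year), then one output column per requested (course, training) pair,
// aggregated with sum; None plays the role of Spark's null cell.
val pivoted: Map[Int, Seq[Option[Double]]] =
  sales.groupBy(_.year).map { case (year, rows) =>
    year -> pivotValues.map { case (course, training) =>
      val hits = rows.filter(r => r.course.toLowerCase == course && r.training == training)
      if (hits.isEmpty) None else Some(hits.map(_.earnings).sum)
    }
  }
```

Each `(course, training)` pair becomes one pivoted column, which is exactly why a single `struct` pivot column is enough to express pivoting by multiple columns.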