[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234475488
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide
 
   - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
 
+  - In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a grouped dataset with the key attribute wrongly named "value" if the key is of atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default value of `false`.
--- End diff --

Ok. More accurate.
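
For context, here is a minimal sketch of the renaming that this migration note describes, assuming a local `SparkSession`. The column names in the comments follow the note's wording rather than output captured from a particular build, and the legacy flag is left unset because its final name is still under discussion in the next message.

```
// Minimal sketch of the grouping-attribute renaming described above.
// Assumes a local SparkSession; comments reflect the migration note,
// not output from a specific Spark build.
import org.apache.spark.sql.SparkSession

object GroupByKeyNamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("groupByKey-key-naming")
      .getOrCreate()
    import spark.implicits._

    // Atomic (string) grouping key.
    val ds = Seq("a", "b", "a").toDS()
    val counts = ds.groupByKey(identity).count()

    // Spark 2.4 and earlier: the grouping column is named "value".
    // Spark 3.0 with this change: the grouping column is named "key".
    counts.printSchema()

    spark.stop()
  }
}
```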


---




[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234475321
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1594,6 +1594,15 @@ object SQLConf {
 "WHERE, which does not follow SQL standard.")
   .booleanConf
   .createWithDefault(false)
+
+  val LEGACY_ATOMIC_KEY_ATTRIBUTE_GROUP_BY_KEY =
+    buildConf("spark.sql.legacy.atomicKeyAttributeGroupByKey")
--- End diff --

`spark.sql.legacy.dataset.aliasNonStructGroupingKey`?
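
For illustration, a flag with this name would presumably be declared inside `object SQLConf` along these lines; the doc string, the `.internal()` marker, and the default value are assumptions here, not the PR's actual code, and the name that finally ships may differ.

```
// Sketch of a SQLConf entry using the name proposed above. The doc text and
// default are assumptions for illustration; the merged PR may differ.
val LEGACY_DATASET_ALIAS_NON_STRUCT_GROUPING_KEY =
  buildConf("spark.sql.legacy.dataset.aliasNonStructGroupingKey")
    .internal()
    .doc("When true, Dataset.groupByKey aliases a non-struct grouping key as " +
      "\"value\" (the Spark 2.4 and earlier behavior) instead of \"key\".")
    .booleanConf
    .createWithDefault(false)
```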


---




[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234475156
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide
 
   - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
 
+  - In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a grouped dataset with the key attribute wrongly named "value" if the key is of atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default value of `false`.
--- End diff --

I realized that only struct-type keys get the `key` alias. So here we should say: `if the key is non-struct type, e.g. int, string, array, etc.`
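
To make the struct vs. non-struct distinction concrete, here is a small spark-shell style sketch; the `Pair` case class is a made-up example, and the column names follow this discussion rather than captured output.

```
// Run in spark-shell, where spark.implicits._ is already in scope.
case class Pair(a: Int, b: String)

// Struct-type key (a case class): the grouping column is already exposed
// as "key" in Spark 2.4.
val structKeyed = Seq(Pair(1, "x"), Pair(1, "y")).toDS()
  .groupByKey(p => p)
  .count()
structKeyed.printSchema()

// Non-struct key (int, string, array, ...): named "value" in Spark 2.4 and
// earlier, renamed to "key" by this change.
val atomicKeyed = Seq(1, 1, 2).toDS()
  .groupByKey(identity)
  .count()
atomicKeyed.printSchema()
```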


---




[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234408150
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,6 +17,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
 
+  - In Spark version 2.4 and earlier, the key attribute is wrongly named as "value" for primitive key type when doing typed aggregation on Dataset. This attribute is now named as "key" since Spark 3.0 like complex key type.
--- End diff --

Updated as suggested. Thanks.


---




[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23054#discussion_r234401319
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,6 +17,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
 
+  - In Spark version 2.4 and earlier, the key attribute is wrongly named as "value" for primitive key type when doing typed aggregation on Dataset. This attribute is now named as "key" since Spark 3.0 like complex key type.
--- End diff --

```
In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a grouped dataset with the key attribute wrongly named "value", if the `Dataset` element is of atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute "key".
```


---




[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

2018-11-15 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/23054

[SPARK-26085][SQL] Key attribute of primitive type under typed aggregation should be named as "key" too

## What changes were proposed in this pull request?

When doing typed aggregation on a Dataset, the key attribute is named "key" for complex key types, but for primitive key types it is named "value". This attribute should also be named "key" for primitive key types.

## How was this patch tested?

Added a test.
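
A check in the spirit of that test might look like the sketch below; this is a guess at the assertion's shape, assuming `spark.implicits._` is in scope, and the PR's actual test may be written differently.

```
// Hypothetical assertion illustrating the proposed change; the PR's real
// test may differ. Assumes spark.implicits._ is in scope.
val agg = Seq(1, 2, 1).toDS().groupByKey(identity).count()

// Before this change the first column was named "value"; after it, "key".
assert(agg.schema.fieldNames.head == "key")
```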

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-26085

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23054


commit c7bbe91519aec116ae2c2f449f518f59cc49c7c0
Author: Liang-Chi Hsieh 
Date:   2018-11-16T01:52:12Z

Named key attribute for primitive type as "key".




---
