[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234475488

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide

       - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

    +  - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a grouped dataset with key attribute wrongly named as "value", if the key is atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute to "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default value of `false`.
    --- End diff --

    Ok. More accurate.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234475321

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1594,6 +1594,15 @@ object SQLConf {
             "WHERE, which does not follow SQL standard.")
           .booleanConf
           .createWithDefault(false)
    +
    +  val LEGACY_ATOMIC_KEY_ATTRIBUTE_GROUP_BY_KEY =
    +    buildConf("spark.sql.legacy.atomicKeyAttributeGroupByKey")
    --- End diff --

    `spark.sql.legacy.dataset.aliasNonStructGroupingKey`?
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234475156

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide

       - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

    +  - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a grouped dataset with key attribute wrongly named as "value", if the key is atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute to "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default value of `false`.
    --- End diff --

    I realized that only struct-type keys get the `key` alias, so here we should say: `if the key is non-struct type, e.g. int, string, array, etc.`
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234408150

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,8 @@ displayTitle: Spark SQL Upgrading Guide

       - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

    +  - In Spark version 2.4 and earlier, the key attribute is wrongly named as "value" for primitive key type when doing typed aggregation on Dataset. This attribute is now named as "key" since Spark 3.0 like complex key type.
    --- End diff --

    Updated as suggested. Thanks.
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234401319

    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,8 @@ displayTitle: Spark SQL Upgrading Guide

       - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

    +  - In Spark version 2.4 and earlier, the key attribute is wrongly named as "value" for primitive key type when doing typed aggregation on Dataset. This attribute is now named as "key" since Spark 3.0 like complex key type.
    --- End diff --

    ```
    In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a grouped dataset with key attribute wrongly named as "value", if the `Dataset` element is of atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute to "key".
    ```
[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...
GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/23054

    [SPARK-26085][SQL] Key attribute of primitive type under typed aggregation should be named as "key" too

    ## What changes were proposed in this pull request?

    When doing typed aggregation on a Dataset, the key attribute is named "key" if the key is of a complex type, but "value" if the key is of a primitive type. The key attribute should be named "key" for primitive types as well.

    ## How was this patch tested?

    Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-26085

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23054

----
commit c7bbe91519aec116ae2c2f449f518f59cc49c7c0
Author: Liang-Chi Hsieh
Date:   2018-11-16T01:52:12Z

    Named key attribute for primitive type as "key".
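
For readers following the thread, the naming behaviour under discussion can be observed with a small Scala sketch like the one below. It is an illustration only, not code from the patch: the object name `GroupByKeySchemaSketch`, the helper `groupedCountColumns`, the sample data, and the local-mode session settings are all assumptions made for the example. It groups a `Dataset[Int]` by a non-struct key and returns the column names of the aggregated result.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, named here for illustration only.
object GroupByKeySchemaSketch {

  // Returns the column names of ds.groupByKey(...).count() for a Dataset[Int].
  def groupedCountColumns(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val ds = Seq(1, 2, 2, 3).toDS()
    // The grouping key is an Int, i.e. a non-struct key.
    ds.groupByKey(x => x % 2).count().columns
  }

  def main(args: Array[String]): Unit = {
    // Local-mode session, assumed here only to make the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupByKey-schema-sketch")
      .getOrCreate()
    try {
      // According to the thread, the first column is named "value" in
      // Spark 2.4 and earlier and "key" once this change is in effect.
      println(groupedCountColumns(spark).mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

Running the same sketch against different Spark versions should show which name the grouping column receives; the name of the second (count) column is not asserted here.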