[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

cloud-fan Sat, 17 Nov 2018 00:38:12 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234401319
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,8 @@ displayTitle: Spark SQL Upgrading Guide
     
       - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.
     
    +  - In Spark version 2.4 and earlier, the key attribute is wrongly named as "value" for primitive key type when doing typed aggregation on Dataset. This attribute is now named as "key" since Spark 3.0 like complex key type.
    --- End diff --
    
    ```
    In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a grouped dataset whose key attribute
    is wrongly named "value" if the key is of an atomic type, e.g. int, string, etc. This is
    counterintuitive and makes the schema of aggregation queries confusing. For example, the schema
    of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, the grouping attribute is
    named "key".
    ```
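
To make the suggested wording concrete, here is a minimal sketch of how one might inspect the grouping column's name. It is an illustration only, not code from the pull request: the object name `GroupByKeySchema`, the sample data, and the local-mode session settings are assumptions chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the name and sample data are illustrative only.
object GroupByKeySchema {
  def main(args: Array[String]): Unit = {
    // Assumes a local Spark installation on the classpath.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupByKey-schema")
      .getOrCreate()
    import spark.implicits._

    // A Dataset[String] grouped by an atomic (String) key.
    val ds = Seq("a", "b", "a").toDS()
    val counts = ds.groupByKey(identity).count()

    // Per the note above, the first column is expected to be named
    // "value" on Spark 2.4 and earlier, and "key" from Spark 3.0.
    counts.printSchema()

    spark.stop()
  }
}
```

Running the same program against a 2.4 build and a 3.0 build and comparing the printed schemas is one way to confirm the behavior the migration note describes.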



