Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23124#discussion_r240104815

--- Diff: docs/sql-migration-guide-upgrade.md ---

@@ -27,6 +27,8 @@ displayTitle: Spark SQL Upgrading Guide

 - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more.

+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be udefined.

--- End diff --

A few typos. How about?

```
In Spark version 2.4 and earlier, users can create a map with duplicate keys via built-in functions like `CreateMap` and `StringToMap`. The behavior of a map with duplicate keys is undefined. For example, a map lookup respects the duplicate key that appears first, `Dataset.collect` only keeps the duplicate key that appears last, and `MapKeys` returns duplicate keys. Since Spark 3.0, these built-in functions will remove duplicate map keys using the last-one-wins policy. Users may still read map values with duplicate keys from data sources that do not enforce it (e.g. Parquet), but the behavior will be undefined.
```
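As a side note, here is a minimal `spark-shell` sketch of the behavior change this migration note describes. The session setup and the expected outputs are illustrative assumptions based on the last-one-wins semantics discussed above, not output captured from a real build:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative session setup; master and app name are placeholders.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dup-map-keys")
  .getOrCreate()

// In released Spark 3.x this behavior ended up gated behind
// spark.sql.mapKeyDedupPolicy (default EXCEPTION); set LAST_WIN to get
// the semantics described in the note.
spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")

// CreateMap: in 2.4 and earlier both entries for key 1 survive internally,
// and which one you observe depends on the operation (lookup vs. collect
// vs. map_keys). With last-one-wins, only the entry 1 -> b remains.
spark.sql("SELECT map(1, 'a', 1, 'b') AS m").show(false)
// Expected: a single entry, 1 -> b (the first pair is dropped)

// StringToMap (str_to_map in SQL) follows the same policy.
spark.sql("SELECT str_to_map('k:1,k:2', ',', ':') AS m").show(false)
// Expected: a single entry, k -> 2

spark.stop()
```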