Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23124#discussion_r236274428
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -19,6 +19,8 @@ displayTitle: Spark SQL Upgrading Guide
     
       - In Spark version 2.4 and earlier, users can create map values with map type keys via built-in functions like `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it is not allowed to create map values with map type keys with these built-in functions. Users can still read map values with map type keys from data sources or Java/Scala collections, though they are not very useful.
     
    +  - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of a map with duplicated keys is undefined, e.g. a map lookup respects the duplicated key that appears first, `Dataset.collect` only keeps the duplicated key that appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with a last-wins policy.
    --- End diff --
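
    A minimal sketch of the behavior the new note describes, assuming a build that includes this change (the object name and the local-mode session below are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

object DuplicateMapKeysSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative local session; any existing SparkSession works the same way.
    val spark = SparkSession.builder()
      .appName("duplicate-map-keys-sketch")
      .master("local[*]")
      .getOrCreate()

    // CreateMap with a duplicated key: 1 is mapped to both 'a' and 'b'.
    val df = spark.sql("SELECT map(1, 'a', 1, 'b') AS m")

    // In Spark 2.4 the result is undefined: a lookup may return 'a' (the first
    // occurrence) while Dataset.collect keeps 'b' (the last occurrence).
    // With the change described above, the duplicated key is removed up front
    // and the last value wins, so both paths should agree on 1 -> 'b'.
    df.selectExpr("element_at(m, 1)").show()
    df.selectExpr("map_keys(m)").show() // expected: a single key, no duplicates

    spark.stop()
  }
}
```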
    
    They are related, but they are not the same. For example, we don't support map type as a key because we can't check equality of map types correctly. This is just a current implementation limitation, and we may relax it in the future.
    
    Duplicated map keys are a real problem, and we will never allow them.
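
    For contrast, a spark-shell style sketch of the two cases (here `spark` is assumed to be the shell's SparkSession, and the exact exception type is an assumption):

```scala
// A map used as a map key: with the restriction in the earlier note, this is
// expected to be rejected at analysis time in Spark 3.0, because equality of
// map values cannot be checked (the exact error message/type may differ).
spark.sql("SELECT map(map(1, 2), 'a')").show()

// Duplicated scalar keys: still allowed, but deduplicated with the last-wins
// policy from the new note, i.e. the result is map(1 -> 'b').
spark.sql("SELECT map(1, 'a', 1, 'b')").show()
```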

