GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/23045

    [SPARK-26071][SQL] disallow map as map key

    ## What changes were proposed in this pull request?
    
    Due to an implementation limitation, Spark currently can't compare or check 
equality between map values. As a result, map values can't appear in equality 
or comparison expressions, can't be used as grouping keys, etc.
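    
    For context, a minimal sketch of the existing restriction (assuming a local 
Spark session; the exact error messages are paraphrased):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Both of these are rejected at analysis time today, because map values
// are not comparable/orderable in Spark SQL.
spark.sql("SELECT map(1, 'a') = map(1, 'a')")
spark.sql("SELECT m, count(*) FROM VALUES (map(1, 'a')) AS t(m) GROUP BY m")
```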
    
    More importantly, a map lookup needs to check equality on the map key, so 
Spark can't support looking up values in a map that is keyed by another map. 
Having a map as a map key is therefore not useful.
    
    This PR proposes to stop users from creating maps that use a map type as the 
key. The updated expressions are: `CreateMap`, `MapFromArrays`, 
`MapFromEntries`, `MapConcat`, `TransformKeys`. I manually checked all the 
places that create `MapType` and came up with this list.
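    
    As an illustration (a sketch reusing the `spark` session from the snippet 
above; the exact error message after this change is an assumption), queries 
like the following would now fail analysis instead of producing a map keyed by 
another map:

```scala
// (assumes the `spark` session created in the sketch above)
// Each of these goes through one of the updated expressions and builds a
// map whose key type is itself a map, which this PR rejects at analysis time.
spark.sql("SELECT map(map(1, 2), 'v')")                             // CreateMap
spark.sql("SELECT map_from_arrays(array(map(1, 2)), array('v'))")   // MapFromArrays
spark.sql("SELECT transform_keys(map(1, 2), (k, v) -> map(k, v))")  // TransformKeys
```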
    
    Note that maps with a map type key can still exist, e.g. when reading from 
Parquet files or converting from Scala/Java maps. This PR does not completely 
forbid map as map key; it only stops Spark itself from creating such maps.
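    
    For example (a sketch, assuming the `spark` session above with implicits 
imported), converting from a Scala map does not go through the updated 
expressions, so a map-keyed map can still be produced this way:

```scala
import spark.implicits._

// The Scala object encoder path is not touched by this PR, so the resulting
// schema still has a map type as the map key.
val df = Seq(Map(Map(1 -> 2) -> "v")).toDF("m")
df.printSchema()  // m: map<map<int,int>, string>
```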
    
    Motivation: while trying to fix the duplicate map key problem, I found it 
impossible to handle maps keyed by a map type. I think it's reasonable to avoid 
map type map keys for builtin functions.
    
     
    
    ## How was this patch tested?
    
    Updated tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark map-key

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23045
    
----
commit 3ff0cd592c52839d0aac739b44cee0cf02e951bc
Author: Wenchen Fan <wenchen@...>
Date:   2018-11-15T10:23:58Z

    disallow map as map key

----


---
