[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726074#comment-15726074
 ] 

Shuai Lin commented on SPARK-18736:
-----------------------------------

If the keys are all literals, then we can detect and remove the duplicated 
keys during analysis.

But if there are non-literal keys, we can't detect duplicates until physical 
execution, e.g.:
{code}
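// The key `name` is a column, so it only collides with the literal 'aaa'
// at runtime, for the row where name = "aaa".
spark.createDataFrame(
  Seq(
    (1, "aaa"),
    (2, "bbb"),
    (3, "ccc")
  )
).toDF("id", "name").createOrReplaceTempView("df")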
sql("select map(name, id, 'aaa', -1) as m from df").show()
{code}

So I think we can do this in two places:

* When preparing the {{keys}} and {{values}} expressions, we can remove all 
duplicated literal keys.
* When doing codegen, we can add logic to discard duplicated keys if there 
are any (e.g. by tracking the keys in a set).
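
A minimal sketch of the two steps above, in plain Scala. It does not use 
Spark's real Catalyst classes: {{Expr}}, {{Lit}}, {{Col}} and {{MapKeyDedup}} 
are hypothetical stand-ins for illustration only. It also assumes a 
"first wins" policy, which matches what the issue says {{GetMapValue}} 
currently does; the thread has not settled on a policy.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Catalyst expressions; NOT Spark's real classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr   // a literal key or value
final case class Col(name: String) extends Expr // a column reference

object MapKeyDedup {
  // Step 1 (analysis time): children are laid out as key, value, key, value...
  // Drop every pair whose literal key was already seen (first wins).
  // Pairs with non-literal keys are kept, since their values are unknown
  // until execution.
  def dropDuplicateLiteralKeys(children: Seq[Expr]): Seq[Expr] = {
    val seen = mutable.HashSet.empty[Any]
    children.grouped(2).flatMap {
      case pair @ Seq(Lit(k), _) => if (seen.add(k)) pair else Nil
      case pair                  => pair
    }.toList
  }

  // Step 2 (execution time): given the evaluated keys and values of one row,
  // keep only the first occurrence of each key by tracking keys in a set.
  def dropDuplicateKeys(keys: Seq[Any], values: Seq[Any]): (Seq[Any], Seq[Any]) = {
    val seen = mutable.HashSet.empty[Any]
    keys.zip(values).filter { case (k, _) => seen.add(k) }.unzip
  }
}
```

With the children from the issue description, {{Lit(1), Lit(11), Lit(1), 
Lit(12)}}, the first step keeps only {{Lit(1), Lit(11)}}. For the query 
above, the first step changes nothing because {{name}} is not a literal; the 
second step would drop the {{'aaa' -> -1}} entry only for the row where 
{{name}} is {{"aaa"}}.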

[~hvanhovell] Does this sound good?

> CreateMap allows non-unique keys
> --------------------------------
>
>                 Key: SPARK-18736
>                 URL: https://issues.apache.org/jira/browse/SPARK-18736
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Eyal Farago
>              Labels: map, sql, types
>
> In Spark SQL, {{CreateMap}} does not enforce unique keys, i.e. it's possible 
> to create a map with two identical keys: 
> {noformat}
> CreateMap(Literal(1), Literal(11), Literal(1), Literal(12))
> {noformat}
> This does not behave like standard maps in common programming languages.
> A proper behavior should be chosen:
> # first 'wins'
> # last 'wins'
> # runtime error.
> {{GetMapValue}} currently implements option #1. Even if this is the desired 
> behavior, {{CreateMap}} should return a map with unique keys.


