[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726074#comment-15726074 ]
Shuai Lin commented on SPARK-18736:
-----------------------------------

If the keys are all literals, then we can detect and remove the duplicated keys during analysis. But if there are non-literal keys, we can't detect duplicates before physical execution, e.g.:

{code}
spark.createDataFrame(
  Seq(
    (1, "aaa"),
    (2, "bbb"),
    (3, "ccc")
  )).toDF("id", "name").registerTempTable("df")

// For the row (1, "aaa"), the column key `name` evaluates to 'aaa'
// and collides with the literal key 'aaa'.
sql("select map(name, id, 'aaa', -1) as m from df").show()
{code}

So I think we can do this in two places:
* When preparing the {{keys}} and {{values}} expressions, we can remove all duplicated literal keys.
* When doing codegen, we can add logic to discard duplicated keys if there are any (e.g. by tracking the keys in a set).

[~hvanhovell] Does that sound good?

> CreateMap allows non-unique keys
> --------------------------------
>
>                 Key: SPARK-18736
>                 URL: https://issues.apache.org/jira/browse/SPARK-18736
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Eyal Farago
>              Labels: map, sql, types
>
> Spark-Sql, {{CreateMap}} does not enforce unique keys, i.e. it's possible to
> create a map with two identical keys:
> {noformat}
> CreateMap(Literal(1), Literal(11), Literal(1), Literal(12))
> {noformat}
> This does not behave like standard maps in common programming languages.
> proper behavior should be chosen:
> # first 'wins'
> # last 'wins'
> # runtime error.
> {{GetMapValue}} currently implements option #1. Even if this is the desired
> behavior {{CreateMap}} should return a unique map.
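
To make the two-step proposal in the comment concrete, here is a minimal sketch in plain Scala. It does not use Spark's Catalyst classes: {{Lit}}, {{Col}}, {{DedupMapKeys}}, {{dropDuplicateLiteralKeys}} and {{buildMap}} are hypothetical names made up for illustration. It also assumes a "first wins" policy, chosen only because the issue says {{GetMapValue}} currently behaves that way; the ticket itself leaves the policy open.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for expressions; NOT Spark's real classes.
sealed trait Expr
final case class Lit(value: Any) extends Expr   // a literal such as 'aaa' or -1
final case class Col(name: String) extends Expr // a column reference such as `name`

object DedupMapKeys {
  // Step 1 (analysis time): drop key/value pairs whose literal key was already
  // seen. Non-literal keys are kept because their values are not known yet.
  // Assumption: "first wins".
  def dropDuplicateLiteralKeys(pairs: Seq[(Expr, Expr)]): Seq[(Expr, Expr)] = {
    val seen = mutable.HashSet.empty[Any]
    pairs.filter {
      case (Lit(k), _) => seen.add(k) // add returns false if k was already present
      case _           => true
    }
  }

  // Step 2 (per row, at execution time): the keys are now concrete values, so
  // track them in a set and discard any pair whose key has already appeared.
  def buildMap(keys: Seq[Any], values: Seq[Any]): Seq[(Any, Any)] = {
    val seen = mutable.HashSet.empty[Any]
    keys.zip(values).filter { case (k, _) => seen.add(k) }
  }
}
```

For the query in the comment, step 1 leaves {{map(name, id, 'aaa', -1)}} untouched because only one of the two keys is a literal. Step 2 is what handles the row {{(1, "aaa")}}: {{DedupMapKeys.buildMap(Seq("aaa", "aaa"), Seq(1, -1))}} keeps only the first pair, while the rows with {{"bbb"}} and {{"ccc"}} keep both entries.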