[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851078#comment-15851078 ]

Eyal Farago commented on SPARK-18736:
-------------------------------------

SPARK-8601 is making (slow) progress; it's actually in the final stages of review now.
One thing to note is the need to keep direct evaluation, code generation, and the optimized version in sync. SPARK-8601 follows the current "first wins" behavior by transforming map(...).Get(k) into a caseKeyWhen(k,...).
If we decide on a "last wins" approach, the optimized version would probably have to reverse the order of the keys.

> CreateMap allows non-unique keys
> --------------------------------
>
>                 Key: SPARK-18736
>                 URL: https://issues.apache.org/jira/browse/SPARK-18736
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Eyal Farago
>              Labels: map, sql, types
>
> In Spark SQL, {{CreateMap}} does not enforce unique keys, i.e. it's possible to
> create a map with two identical keys:
> {noformat}
> CreateMap(Literal(1), Literal(11), Literal(1), Literal(12))
> {noformat}
> This does not behave like standard maps in common programming languages.
> The proper behavior should be chosen:
> # first 'wins'
> # last 'wins'
> # runtime error.
> {{GetMapValue}} currently implements option #1. Even if this is the desired
> behavior, {{CreateMap}} should return a unique map.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
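The ordering concern in the comment above can be illustrated with a small, Spark-free sketch. It assumes the optimized lookup behaves like a chain of key comparisons that returns the first match; the names {{caseLookup}}, {{firstWins}} and {{lastWins}} are hypothetical and are not part of Spark or of SPARK-8601.

```scala
object LookupOrderSketch {
  /** Scan the branches in order and return the first matching value,
    * the way a CASE-style chain of key comparisons would. */
  def caseLookup[K, V](key: K, branches: Seq[(K, V)]): Option[V] =
    branches.collectFirst { case (k, v) if k == key => v }

  /** "First wins": build the chain from the entries in creation order. */
  def firstWins[K, V](key: K, entries: Seq[(K, V)]): Option[V] =
    caseLookup(key, entries)

  /** "Last wins": the same chain, built from the entries in reverse order. */
  def lastWins[K, V](key: K, entries: Seq[(K, V)]): Option[V] =
    caseLookup(key, entries.reverse)
}
```

For the entries from the issue description, {{LookupOrderSketch.firstWins(1, Seq(1 -> 11, 1 -> 12))}} yields {{Some(11)}}, while {{lastWins}} on the same input yields {{Some(12)}}. Only the branch order changes, which is why direct evaluation, code generation, and the optimized rewrite would all need to agree on it.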
[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851057#comment-15851057 ]

Shuai Lin commented on SPARK-18736:
-----------------------------------

[~eyalfa] How is it going? I can work on this one if that's OK with you.
[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727398#comment-15727398 ]

Shuai Lin commented on SPARK-18736:
-----------------------------------

Ok, sounds good to me.
[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726232#comment-15726232 ]

Eyal Farago commented on SPARK-18736:
-------------------------------------

@Shuai Lin, I already have a PR in progress (SPARK-8601) that addresses the possible literal-key optimizations during the optimization phase. Could you please implement the proper behavior at runtime and in code generation, and update the JIRA with the chosen approach?
This way we can align the two PRs.

Thanks,
Eyal.
[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726074#comment-15726074 ]

Shuai Lin commented on SPARK-18736:
-----------------------------------

If the keys are all literals, we can detect and remove the duplicate keys during analysis. But if there are non-literal keys, we can't detect duplicates before physical execution, e.g.:
{code}
spark.createDataFrame(
  Seq(
    (1, "aaa"),
    (2, "bbb"),
    (3, "ccc")
  )).toDF("id", "name").registerTempTable("df")
// The row with name = "aaa" collides with the literal key 'aaa'.
sql("select map(name, id, 'aaa', -1) as m from df").show()
{code}
So I think we can do this in two places:
* When preparing the {{keys}} and {{values}} expressions, we can remove all duplicate literal keys.
* When doing codegen, we can add logic to discard duplicate keys, if there are any (e.g. by tracking the keys in a set).

[~hvanhovell] Does that sound good?
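The two steps proposed above can be sketched in plain Scala, without any Spark dependency. This is only an illustration under stated assumptions: {{Expr}}, {{Lit}} and {{Col}} are hypothetical stand-ins for Catalyst expressions, the function names are made up, and "first wins" is assumed as the chosen behavior.

```scala
object CreateMapDedupSketch {
  // Hypothetical stand-ins for Catalyst expressions; these are NOT Spark's real classes.
  sealed trait Expr
  final case class Lit(value: Any) extends Expr
  final case class Col(name: String) extends Expr

  /** Step 1 (plan time): drop key/value pairs whose literal key has already appeared. */
  def dropDuplicateLiteralKeys(pairs: Seq[(Expr, Expr)]): Seq[(Expr, Expr)] = {
    val seen = scala.collection.mutable.HashSet.empty[Any]
    pairs.filter {
      case (Lit(k), _) => seen.add(k) // add returns false if k was already present
      case _           => true        // non-literal keys cannot be checked before execution
    }
  }

  /** Step 2 (run time): given the keys evaluated for one row, keep the first value per key. */
  def buildMapFirstWins(keys: Seq[Any], values: Seq[Any]): Seq[(Any, Any)] = {
    val seen = scala.collection.mutable.HashSet.empty[Any]
    keys.zip(values).filter { case (k, _) => seen.add(k) }
  }
}
```

In the query above, the key expressions are a column and a literal, so step 1 leaves them untouched. For the row where {{name}} is "aaa", step 2 receives the keys {{Seq("aaa", "aaa")}} and keeps only the first pair.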
[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys
[ https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725579#comment-15725579 ]

Shuai Lin commented on SPARK-18736:
-----------------------------------

I can work on this.