[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2017-02-02 Thread Eyal Farago (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851078#comment-15851078
 ] 

Eyal Farago commented on SPARK-18736:
-

SPARK-8601 is making (slow) progress; it's actually in the final stages of 
review now.
One thing to keep in mind is that direct evaluation, code generation, and the 
optimized version must stay in sync. SPARK-8601 follows the current "first wins" 
behavior by transforming map(...).Get(k) into a caseKeyWhen(k, ...).
If we decide on a "last wins" approach, the optimized version would probably 
have to reverse the order of the keys.
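
To illustrate why the key order matters, here is a minimal Scala sketch (plain Scala, not Spark code). The `Branch` type and the `caseKeyWhen` helper are hypothetical stand-ins that model map(...).Get(k) rewritten as a chain of CASE k WHEN k1 THEN v1 WHEN k2 THEN v2 ... END branches evaluated in order. Keeping the argument order gives "first wins"; reversing the branches gives "last wins".

{code:scala}
object CaseKeyWhenOrder {
  // Hypothetical model of one WHEN branch: `WHEN key THEN value`.
  final case class Branch[K, V](key: K, value: V)

  // Evaluates the branches in order and returns the value of the first
  // branch whose key equals `k` (None plays the role of SQL NULL).
  def caseKeyWhen[K, V](k: K, branches: Seq[Branch[K, V]]): Option[V] =
    branches.collectFirst { case Branch(key, value) if key == k => value }

  // "First wins": keep the argument order, so the earliest duplicate matches.
  def getFirstWins[K, V](k: K, pairs: Seq[(K, V)]): Option[V] =
    caseKeyWhen(k, pairs.map { case (key, value) => Branch(key, value) })

  // "Last wins": the same chain with the branch order reversed.
  def getLastWins[K, V](k: K, pairs: Seq[(K, V)]): Option[V] =
    caseKeyWhen(k, pairs.reverse.map { case (key, value) => Branch(key, value) })

  def main(args: Array[String]): Unit = {
    // Same keys and values as CreateMap(Literal(1), Literal(11), Literal(1), Literal(12)).
    val pairs = Seq(1 -> 11, 1 -> 12)
    println(getFirstWins(1, pairs)) // Some(11)
    println(getLastWins(1, pairs))  // Some(12)
  }
}
{code}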

> CreateMap allows non-unique keys
> 
>
> Key: SPARK-18736
> URL: https://issues.apache.org/jira/browse/SPARK-18736
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Eyal Farago
>  Labels: map, sql, types
>
> In Spark SQL, {{CreateMap}} does not enforce unique keys, i.e. it's possible to 
> create a map with two identical keys:
> {noformat}
> CreateMap(Literal(1), Literal(11), Literal(1), Literal(12))
> {noformat}
> This does not behave like standard maps in common programming languages.
> A proper behavior should be chosen:
> # first 'wins'
> # last 'wins'
> # runtime error.
> {{GetMapValue}} currently implements option #1. Even if this is the desired 
> behavior, {{CreateMap}} should return a map with unique keys.
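
A minimal Scala sketch of the three candidate policies, applied to the flat (key, value) argument list that {{CreateMap}} receives. The names `DuplicateKeyPolicy`, `FirstWins`, `LastWins`, `Fail`, and `buildMap` are hypothetical; this only illustrates the semantics and is not Spark's implementation.

{code:scala}
import scala.collection.mutable

object MapKeyPolicies {
  sealed trait DuplicateKeyPolicy
  case object FirstWins extends DuplicateKeyPolicy
  case object LastWins extends DuplicateKeyPolicy
  case object Fail extends DuplicateKeyPolicy

  // Builds a list of entries with unique keys from the (key, value) pairs in
  // argument order, resolving duplicates according to `policy`. Each key keeps
  // the position of its first occurrence.
  def buildMap[K, V](pairs: Seq[(K, V)], policy: DuplicateKeyPolicy): List[(K, V)] = {
    val result = mutable.LinkedHashMap.empty[K, V]
    for ((k, v) <- pairs) {
      if (!result.contains(k)) result(k) = v
      else policy match {
        case FirstWins => ()            // keep the value already stored
        case LastWins  => result(k) = v // overwrite with the later value
        case Fail      => throw new IllegalArgumentException(s"Duplicate map key: $k")
      }
    }
    result.toList
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(1 -> 11, 1 -> 12)
    println(buildMap(pairs, FirstWins)) // List((1,11))
    println(buildMap(pairs, LastWins))  // List((1,12))
    println(scala.util.Try(buildMap(pairs, Fail)))
  }
}
{code}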



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2017-02-02 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851057#comment-15851057
 ] 

Shuai Lin commented on SPARK-18736:
---

[~eyalfa] How is it going? I can work on this one if that's OK with you.




[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2016-12-06 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727398#comment-15727398
 ] 

Shuai Lin commented on SPARK-18736:
---

Ok, sounds good to me.




[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2016-12-06 Thread Eyal Farago (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726232#comment-15726232
 ] 

Eyal Farago commented on SPARK-18736:
-

@Shuai Lin,
I already have a PR in progress (SPARK-8601) that addresses the possible 
literal-key optimizations during the optimization phase. Could you please 
implement the proper behavior in the runtime and code generation paths and 
update the JIRA with the chosen approach? This way we can align the two PRs.

Thanks,
Eyal.







[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2016-12-06 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726074#comment-15726074
 ] 

Shuai Lin commented on SPARK-18736:
---

If the keys are all literals, then we can detect and remove the duplicated keys 
during analysis.
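
For the all-literal case, a minimal Scala sketch of the analysis-time check. It uses a hypothetical, simplified expression model (`Expr`, `Lit`, `Col`), not Spark's Expression API, and assumes "first wins" to match the current {{GetMapValue}} behavior.

{code:scala}
import scala.collection.mutable

object LiteralKeyDedup {
  // Hypothetical, simplified expression model; not Spark's Expression API.
  sealed trait Expr
  final case class Lit(value: Any) extends Expr
  final case class Col(name: String) extends Expr

  // If every key is a literal, drop later occurrences of a repeated key
  // ("first wins") and return the cleaned pairs. If any key is not a literal,
  // return None: its value, and hence any duplicate, is only known per row.
  def dedupLiteralKeys(pairs: Seq[(Expr, Expr)]): Option[Seq[(Expr, Expr)]] =
    if (!pairs.forall(_._1.isInstanceOf[Lit])) None
    else {
      val seen = mutable.HashSet.empty[Any]
      // HashSet.add returns false if the value was already present.
      Some(pairs.filter {
        case (Lit(v), _) => seen.add(v)
        case _           => true // not reached: all keys are literals here
      })
    }

  def main(args: Array[String]): Unit = {
    // CreateMap(Literal(1), Literal(11), Literal(1), Literal(12))
    println(dedupLiteralKeys(Seq(Lit(1) -> Lit(11), Lit(1) -> Lit(12))))
    // map(name, id, 'aaa', -1): the first key is a column reference
    println(dedupLiteralKeys(Seq(Col("name") -> Col("id"), Lit("aaa") -> Lit(-1))))
  }
}
{code}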

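But if there are non-literal keys, we can't detect duplicates before physical 
execution, e.g.:
{code}
// Run in spark-shell, where `spark` and `sql` are already in scope.
spark.createDataFrame(
  Seq(
    (1, "aaa"),
    (2, "bbb"),
    (3, "ccc")
  )).toDF("id", "name").createOrReplaceTempView("df")

// For the row with name = "aaa", both map keys evaluate to "aaa".
sql("select map(name, id, 'aaa', -1) as m from df").show()
{code}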

So I think we can handle this in two places:

* When preparing the {{keys}} and {{values}} expressions, we can remove all 
duplicated literal keys.
* When doing codegen, we can add logic to discard duplicated keys if there are 
any (e.g. by tracking the keys in a set).
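
A sketch of the per-row logic behind the second point, assuming the key and value expressions have already been evaluated for the current row into parallel arrays. `RuntimeKeyDedup` and `dedup` are hypothetical names, and this is plain Scala rather than generated Java; it keeps the first occurrence of each key ("first wins").

{code:scala}
import scala.collection.mutable

object RuntimeKeyDedup {
  // Walk the evaluated keys in argument order, remember the keys already
  // emitted in a set, and skip any later occurrence ("first wins").
  // Returns parallel key/value arrays with unique keys.
  def dedup(keys: Array[Any], values: Array[Any]): (Array[Any], Array[Any]) = {
    require(keys.length == values.length, "keys and values must have the same length")
    val seen = mutable.HashSet.empty[Any]
    val outKeys = mutable.ArrayBuffer.empty[Any]
    val outValues = mutable.ArrayBuffer.empty[Any]
    var i = 0
    while (i < keys.length) {
      if (seen.add(keys(i))) { // false when the key was already seen
        outKeys += keys(i)
        outValues += values(i)
      }
      i += 1
    }
    (outKeys.toArray, outValues.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Row (1, "aaa") evaluated against map(name, id, 'aaa', -1).
    val (ks, vs) = dedup(Array[Any]("aaa", "aaa"), Array[Any](1, -1))
    println(ks.mkString(",") + " -> " + vs.mkString(",")) // aaa -> 1
  }
}
{code}

Tracking keys in a hash set keeps the check linear in the number of map entries per row, at the cost of one allocation per evaluated map.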

[~hvanhovell] Does this sound good?




[jira] [Commented] (SPARK-18736) CreateMap allows non-unique keys

2016-12-06 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725579#comment-15725579
 ] 

Shuai Lin commented on SPARK-18736:
---

I can work on this.
