[ https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082639#comment-15082639 ]

Elazar Gershuni commented on SPARK-12623:
-----------------------------------------

That does not answer the question/feature request. Mapping values to values 
can already be achieved with code similar to what you suggested:

rdd.map { case (key, value) => (key, myFunctionOf(value)) }

Yet Spark still provides rdd.mapValues(), for performance reasons: it retains 
the partitioning, avoiding the need to reshuffle when the key does not change. 
I would like to get the same benefit in my case. The code you suggested does 
not give it, since Spark cannot know that the key does not change.
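
To make the difference concrete, here is a small PySpark sketch (toy data; the 
lambdas and the partition count are only placeholders):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)]).partitionBy(2)

# mapValues() promises that keys are unchanged, so the partitioner is kept
# and a later reduceByKey()/join() on the result needs no shuffle.
print(pairs.mapValues(lambda v: v + 1).partitioner is not None)          # True

# A plain map() drops the partitioner, even though the key is unchanged here.
print(pairs.map(lambda kv: (kv[0], kv[1] + 1)).partitioner is not None)  # False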

I'm sorry if that's not the place for the question/feature request, but it 
really isn't a user question.

> map key_values to values
> ------------------------
>
>                 Key: SPARK-12623
>                 URL: https://issues.apache.org/jira/browse/SPARK-12623
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Elazar Gershuni
>            Priority: Minor
>              Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the function passed to mapValues() receive the key as an 
> argument? Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simple analyzer that takes the function passed 
> to map() and analyzes it to see whether it (trivially) leaves the key 
> unchanged, e.g.
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> The problem is that, if I find this to be the case, I can't simply call 
> mapValues() with that function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, 
> since the function given to mapValues receives only `v` as an argument.
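
For reference, a rough sketch of the rewrite such an analyzer could fall back 
on today, once it has established that g leaves kv[0] untouched. It leans on 
the existing preservesPartitioning flag of mapPartitions() as a stand-in for 
the requested API; f, g and the sample data are only placeholders:

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

def f(key, value):                        # placeholder per-record logic
    return value * 10

g = lambda kv: (kv[0], f(kv[0], kv[1]))   # provably leaves the key alone

rdd = sc.parallelize([("a", 1), ("b", 2)]).partitionBy(2)

# Same result as rdd.map(g), but because we promise that the keys are
# unchanged, the existing partitioner is kept and no reshuffle is needed
# by later key-based operations.
rewritten = rdd.mapPartitions(lambda part: (g(kv) for kv in part),
                              preservesPartitioning=True)

print(rewritten.partitioner == rdd.partitioner)   # True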



