[ https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082739#comment-15082739 ]

Sean Owen commented on SPARK-12623:
-----------------------------------

There is a {{preservesPartitioning}} flag on some API methods that lets you 
specify that your function of {{(key, value)}} pairs won't change the keys, or at 
least won't change the partitioning. Unfortunately, for historical reasons it 
wasn't exposed on {{map()}}, but it was exposed on {{mapPartitions()}}. 
It's a little clunkier to invoke if you only need a map, but not much: you get an 
iterator over each partition's elements, which you then map as before.
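A minimal sketch of that workaround, assuming PySpark's {{RDD.mapPartitions(f, preservesPartitioning=False)}} signature. The helper name {{map_values_with_key}} is hypothetical (it is not part of Spark); it turns a function {{f(key, value)}} into a per-partition function that rewrites values and leaves keys alone. The block itself needs no Spark, so it can be checked in plain Python.

```python
def map_values_with_key(f):
    """Build a function for RDD.mapPartitions that replaces each value
    with f(key, value) and leaves every key unchanged.

    map_values_with_key is a hypothetical helper, not a PySpark API.
    """
    def per_partition(iterator):
        # mapPartitions passes an iterator over one partition's elements;
        # yielding lazily keeps the behaviour close to a plain map().
        for key, value in iterator:
            yield key, f(key, value)
    return per_partition


# Plain-Python check of the per-partition function (no Spark needed):
part_fn = map_values_with_key(lambda k, v: f"{k}:{v}")
print(list(part_fn(iter([("a", 1), ("b", 2)]))))  # [('a', 'a:1'), ('b', 'b:2')]
```

On a pair RDD this would be invoked as {{rdd.mapPartitions(map_values_with_key(f), preservesPartitioning=True)}}. The flag keeps the parent's partitioner on the result, which is what can let later key-based operations using the same partitioner avoid a reshuffle.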

That would at least let you do what you're trying to do. As for exposing a 
specialized method for this, yeah, it's not crazy or anything, but I doubt it 
would be viewed as worth it when there's a fairly direct way to do what you 
want. (Otherwise, I'd suggest arguing for a new parameter on {{map()}}, but that 
has its own obscure issues.)

> map key_values to values
> ------------------------
>
>                 Key: SPARK-12623
>                 URL: https://issues.apache.org/jira/browse/SPARK-12623
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Elazar Gershuni
>            Priority: Minor
>              Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the function passed to mapValues() take the key as an argument? 
> Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simple analyzer that takes the argument to 
> map() and analyzes it to see whether it (trivially) leaves the key unchanged, 
> e.g. 
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> Problem is, if I find that it is the case, I can't call mapValues() with that 
> function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, since mapValues 
> receives only `v` as an argument.
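For the use case in the issue, the pair function {{g}} does not need to be rewritten for {{mapValues()}} at all: it can run unchanged inside {{mapPartitions()}} with {{preservesPartitioning=True}}. Spark does not check that the keys really stay the same when that flag is set, so this sketch adds a fail-fast check. The helper {{key_preserving}} and the stand-in {{f}} are hypothetical.

```python
def key_preserving(g):
    """Wrap a (key, value) -> (key, value) function g for use in
    RDD.mapPartitions, raising if g ever changes a key.

    key_preserving is a hypothetical helper, not a PySpark API.
    """
    def per_partition(iterator):
        for kv in iterator:
            out = g(kv)
            # preservesPartitioning=True is only a promise; verify it here.
            if out[0] != kv[0]:
                raise ValueError(f"key changed from {kv[0]!r} to {out[0]!r}")
            yield out
    return per_partition


def f(k, v):
    # Stand-in for the issue's f(key, value).
    return v * 10


g = lambda kv: (kv[0], f(kv[0], kv[1]))  # the function from the issue
print(list(key_preserving(g)(iter([("x", 1), ("y", 2)]))))  # [('x', 10), ('y', 20)]
```

With Spark this would be {{rdd.mapPartitions(key_preserving(g), preservesPartitioning=True)}}. Once the analyzer has established that {{g}} keeps the key, the check could be dropped in favour of {{lambda it: map(g, it)}}, saving one comparison per element.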



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)