[ https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082739#comment-15082739 ]
Sean Owen commented on SPARK-12623:
-----------------------------------

There is a {{preservesPartitioning}} flag on some API methods that lets you declare that your function of {{(key, value)}} pairs won't change the keys, or at least won't change the partitioning. Unfortunately, for historical reasons this flag wasn't exposed on {{map()}}, but it was exposed on {{mapPartitions()}}. It's a little clunkier to invoke if you only need a map, but not by much: you get an iterator over each partition that you then map over as before. That would at least let you do what you're trying to do.

As to exposing a specialized method for this, yeah, it's not crazy, but I doubt it would be viewed as worth it when there's already a fairly direct way to do what you want. (Otherwise, I'd say argue for a new param to {{map()}}, but that has its own obscure issues.)

> map key_values to values
> ------------------------
>
>                 Key: SPARK-12623
>                 URL: https://issues.apache.org/jira/browse/SPARK-12623
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Elazar Gershuni
>            Priority: Minor
>              Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the function passed to mapValues() take the key as an argument?
> Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simple analyzer that takes the function passed to
> map() and analyzes it to see whether it (trivially) doesn't change the key,
> e.g.
>     g = lambda kv: (kv[0], f(kv[0], kv[1]))
>     rdd.map(g)
> The problem is, if I find that this is the case, I can't call mapValues() with
> that function, as in `rdd.mapValues(lambda kv: g(kv)[1])`, since mapValues
> receives only `v` as an argument.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
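A minimal sketch of the {{mapPartitions}} workaround described in the comment, assuming a PySpark pair RDD. The names `key_value_to_values` and `map_key_values_to_values` are hypothetical helpers written for illustration; they are not part of the Spark API. The only Spark call used is `RDD.mapPartitions(f, preservesPartitioning=...)`. The per-partition function never touches the key, which is what makes it reasonable to pass `preservesPartitioning=True`.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
W = TypeVar("W")


def key_value_to_values(
    f: Callable[[K, V], W],
) -> Callable[[Iterable[Tuple[K, V]]], Iterator[Tuple[K, W]]]:
    """Hypothetical helper: build a per-partition function that keeps each
    key unchanged and replaces the value with f(key, value)."""

    def per_partition(pairs: Iterable[Tuple[K, V]]) -> Iterator[Tuple[K, W]]:
        # Lazily walk the partition's iterator; keys pass through untouched.
        for k, v in pairs:
            yield k, f(k, v)

    return per_partition


def map_key_values_to_values(rdd, f):
    """Hypothetical stand-in for the requested "mapKeyValuesToValues".

    `rdd` is assumed to be a PySpark pair RDD. Since keys are never changed,
    it is safe to tell Spark the existing partitioning is preserved."""
    return rdd.mapPartitions(key_value_to_values(f), preservesPartitioning=True)
```

With the reporter's `g`, the key-preserving equivalent of `rdd.map(g)` would be `map_key_values_to_values(rdd, lambda k, v: g((k, v))[1])`. The design choice is simply to push the key-preservation guarantee into the per-partition function so the flag on {{mapPartitions()}} can be set honestly; if the analyzer is wrong and `g` does change keys, the declared partitioning would be incorrect.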