Patrick,
If I understand this correctly, I won't be able to do this in the closure
provided to mapPartitions() because that's going to be stateless, in the
sense that a hash map that I create within the closure would only be useful
for one call of MapPartitionsRDD.compute(). I guess I would need
If you'd like to re-use the resulting inverted map, you can persist the result:
x = myRdd.mapPartitions(create inverted map).persist()
Your function would create the reverse map and then return an iterator
over the keys in that map.
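
A minimal sketch of that suggestion in Python, assuming each record in the RDD is a (key, list-of-values) pair. The names `inverted_keys` and `build_persisted` are hypothetical and chosen only for illustration; `mapPartitions` and `persist` are the standard RDD methods. The last lines check the per-partition function locally on the contents of "Partition 0", so no Spark cluster is needed to try it.

```python
def inverted_keys(records):
    """Build the reverse map for one partition and return an iterator over its keys.

    Assumption: `records` is an iterator of (key, [values]) pairs.
    """
    inverted = {}
    for key, values in records:
        for value in values:
            inverted.setdefault(value, []).append(key)
    # Iterating over a dict yields its keys, which here are the original values.
    return iter(inverted)


def build_persisted(my_rdd):
    """Apply the per-partition inversion and mark the result for caching.

    `my_rdd` is assumed to be an RDD; nothing is computed until an action runs.
    """
    return my_rdd.mapPartitions(inverted_keys).persist()


# Local check on the contents of "Partition 0" (hypothetical sample data).
partition0 = [("K1", ["V1", "V2"]), ("K2", ["V2"])]
partition0_keys = sorted(inverted_keys(iter(partition0)))
```

If the key lists are needed later, returning `iter(inverted.items())` instead would keep them alongside each inverted key.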
On Wed, Sep 17, 2014 at 1:04 PM, Akshat Aranya wrote:
I have a use case where my RDD is set up as follows:
Partition 0:
K1 - [V1, V2]
K2 - [V2]
Partition 1:
K3 - [V1]
K4 - [V3]
I want to invert this RDD, but only within a partition, so that the
operation does not require a shuffle. It doesn't matter if the partitions
of the inverted RDD have non-unique
If each partition can fit in memory, you can do this using
mapPartitions and then building an inverse mapping within each
partition. You'd need to construct a hash map within each partition
yourself.
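
As a rough sketch of the per-partition hash map described above, assuming each record is a (key, list-of-values) pair and that one partition fits in memory: the function name `invert_partition` and the sample data are hypothetical, taken from the layout in the question. The function is plain Python, so it can be tried on in-memory lists before being passed to `mapPartitions`.

```python
from collections import defaultdict


def invert_partition(records):
    """Invert one partition: map each value to the list of keys that held it.

    Assumption: `records` is an iterator of (key, [values]) pairs.
    The whole inverse map is held in memory for the duration of the call.
    """
    inverted = defaultdict(list)
    for key, values in records:
        for value in values:
            inverted[value].append(key)
    # mapPartitions expects an iterator back, so return one over the pairs.
    return iter(inverted.items())


# Simulate the two partitions from the question with in-memory lists.
partition0 = [("K1", ["V1", "V2"]), ("K2", ["V2"])]
partition1 = [("K3", ["V1"]), ("K4", ["V3"])]
inverted0 = dict(invert_partition(iter(partition0)))
inverted1 = dict(invert_partition(iter(partition1)))

# With a real RDD (not executed here):
#   inverted = my_rdd.mapPartitions(invert_partition)
```

In this sample, V1 ends up as a key in both partitions' results, because nothing is merged across partitions when no shuffle takes place.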
On Tue, Sep 16, 2014 at 4:27 PM, Akshat Aranya aara...@gmail.com wrote:
I have a use case where