Daniel Imberman Mon, 11 Jan 2016 10:52:05 -0800

Hi all,

I'm looking for a way to efficiently partition an RDD while allowing the same
data to exist on multiple partitions.



Let's say I have a key-value RDD with keys {1,2,3,4}.

I want to be able to repartition the RDD so that the partitions look
like this:

p1 = {1,2}
p2 = {2,3}
p3 = {3,4}
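
To make the layout concrete, here is a minimal sketch in plain Python (not
Spark code) of the key-to-partition mapping I have in mind. The helper names
`target_partitions` and `overlapping_partitions` are hypothetical, and the
sketch assumes consecutive integer keys starting at 1 with a sliding window of
width 2, so each key may land on up to two partitions.

```python
from collections import defaultdict


def target_partitions(key, num_partitions):
    """Return the 0-based partition ids that should hold `key`.

    Assumption: partition i holds keys i+1 and i+2 (a sliding window of
    width 2), so key k belongs to partitions k-2 and k-1 when they exist.
    """
    candidates = (key - 2, key - 1)
    return [p for p in candidates if 0 <= p < num_partitions]


def overlapping_partitions(records, num_partitions):
    """Group (key, value) records by partition, duplicating shared keys."""
    parts = defaultdict(list)
    for key, value in records:
        for p in target_partitions(key, num_partitions):
            parts[p].append((key, value))
    return dict(parts)


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
layout = overlapping_partitions(records, num_partitions=3)
for p in sorted(layout):
    # Print 1-based partition names to match the layout above.
    print("p%d = %s" % (p + 1, sorted(k for k, _ in layout[p])))
```

In Spark, I imagine the same mapping could be applied with a `flatMap` that
emits one `(partition_id, (key, value))` pair per target partition, followed by
a `partitionBy` on the partition id, but I have not verified that this is
efficient.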

Locality is important in this situation, as I would be comparing values
against each other within each partition.

Does anyone have any thoughts as to how I could go about doing this?

Thank you
