Hi all,

I'm looking for a way to efficiently partition an RDD while allowing the same
data to exist on multiple partitions.


Let's say I have a key-value RDD with keys {1,2,3,4}.

I want to be able to repartition the RDD so that the partitions look
like:

p1 = {1,2}
p2 = {2,3}
p3 = {3,4}
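
To make the target layout concrete, here is a minimal sketch in plain Python
(no Spark involved). The function names and the `width` parameter are
hypothetical, and it assumes the keys are the consecutive integers
1..num_keys with each partition holding a sliding window of `width` keys.
Partition indices are 0-based, so p1 above is index 0.

```python
from collections import defaultdict


def overlapping_partitions(key, num_keys, width=2):
    """Return the 0-based partition indices that should hold `key`.

    Assumes keys are 1..num_keys and partition i holds keys i+1..i+width.
    """
    num_partitions = num_keys - width + 1
    first = max(0, key - width)               # earliest window containing key
    last = min(num_partitions - 1, key - 1)   # latest window containing key
    return list(range(first, last + 1))


def replicate(records, num_keys, width=2):
    """Copy each (key, value) record into every partition that should hold it."""
    parts = defaultdict(list)
    for key, value in records:
        for p in overlapping_partitions(key, num_keys, width):
            parts[p].append((key, value))
    return dict(parts)


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
layout = replicate(records, num_keys=4)
for p in sorted(layout):
    print(p, [k for k, _ in layout[p]])
```

On an actual RDD I imagine this would mean emitting (partition index, record)
pairs from a flatMap and then partitioning on that index, but that is only my
guess at an approach, and it duplicates the overlapping records.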

Locality is important in this situation, as I would be comparing values
against each other within each partition.

Does anyone have any thoughts as to how I could go about doing this?

Thank you
  • [no subject] Daniel Imberman
    • Re: Sabarish Sasidharan
