Hi all, I'm looking for a way to efficiently partition an RDD while allowing the same data to exist on multiple partitions.
Let's say I have a key-value RDD with keys {1,2,3,4}. I want to be able to repartition the RDD so that the partitions look like

p1 = {1,2}
p2 = {2,3}
p3 = {3,4}

Locality is important in this situation, as I would be doing internal comparisons of the values within each partition. Does anyone have any thoughts as to how I could go about doing this?

Thank you
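P.S. To make the layout I'm after concrete, here is a small plain-Python sketch (no Spark involved) of the key-to-partition assignment. The function names and the `window` parameter are just made up for illustration, and it assumes the keys can be sorted and that each partition should hold a fixed number of consecutive keys.

```python
def overlapping_partitions(keys, window=2):
    """Return sliding windows of `window` consecutive keys, one list per partition."""
    ordered = sorted(keys)
    # One window starts at every position that still has `window` keys left.
    return [ordered[i:i + window] for i in range(len(ordered) - window + 1)]


def partitions_for_key(key, keys, window=2):
    """Return the indices of every partition that should hold a copy of `key`."""
    layout = overlapping_partitions(keys, window)
    return [index for index, part in enumerate(layout) if key in part]


if __name__ == "__main__":
    # Hypothetical example keys taken from the question above.
    print(overlapping_partitions([1, 2, 3, 4]))
    print(partitions_for_key(2, [1, 2, 3, 4]))
```

My rough guess is that on the Spark side this would mean emitting one copy of each record per target partition index (something like a `flatMap`) and then partitioning on that index, but I don't know whether that is efficient, which is really what I'm asking.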