RDD pair to pair of RDDs

Alex Turner (TMS) Wed, 18 Mar 2015 11:50:40 -0700

What's the best way to go from:

RDD[(A, B)] to (RDD[A], RDD[B])


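The obvious solution (with org.apache.spark.rdd.RDD and scala.reflect.ClassTag
imported, since RDD.map needs a ClassTag for its result type) is:

def separate[A: ClassTag, B: ClassTag](k: RDD[(A, B)]): (RDD[A], RDD[B]) =
  (k.map(_._1), k.map(_._2))

but this runs two maps on the cluster.  Can I do some kind of fold instead, 
as I would for a List: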

// note: prepending builds both lists in reverse order
def separate[A, B](l: List[(A, B)]): (List[A], List[B]) =
  l.foldLeft((List[A](), List[B]()))((acc, p) => (p._1 :: acc._1, p._2 :: acc._2))

But obviously this has an aggregation step that collects everything in one 
place, and I don't want that running on the driver, right?
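
For comparison, here is a rough sketch of the List case only, assuming plain Scala collections and no Spark. The names SeparateDemo, separateFold and separateUnzip are made up for illustration. It shows an order-preserving fold next to the standard library's unzip, which does the same split in one pass. Like any local fold, both build the whole result in a single JVM, so neither carries over to an RDD as-is.

```scala
// Illustrative sketch on local Lists (not RDDs); all names are hypothetical.
object SeparateDemo {
  // foldRight walks the list from the right, so prepending to the
  // accumulators keeps the original element order.
  def separateFold[A, B](l: List[(A, B)]): (List[A], List[B]) =
    l.foldRight((List.empty[A], List.empty[B])) { case ((a, b), (xs, ys)) =>
      (a :: xs, b :: ys)
    }

  // The standard library already provides this split for collections of pairs.
  def separateUnzip[A, B](l: List[(A, B)]): (List[A], List[B]) = l.unzip
}
```

For example, SeparateDemo.separateUnzip(List((1, "a"), (2, "b"))) gives (List(1, 2), List("a", "b")), and separateFold gives the same result.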


Thanks,

Alex
