What's the best way to go from RDD[(A, B)] to (RDD[A], RDD[B])?
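If I do:

    import scala.reflect.ClassTag

    // RDD.map needs a ClassTag for its result type, hence the context bounds
    def separate[A: ClassTag, B: ClassTag](k: RDD[(A, B)]) =
      (k.map(_._1), k.map(_._2))

which is the obvious solution, this runs two separate maps over the data on the cluster.

Can I do some kind of fold instead, the way I would for a local List?

    // the seed is a single tuple of two empty lists
    def separate[A, B](l: List[(A, B)]) =
      l.foldLeft((List[A](), List[B]()))((a, b) => (b._1 :: a._1, b._2 :: a._2))

But this has an aggregate component that would end up running on the driver, which I don't want, right?

Thanks,
Alex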