You want to use RDD.union (or SparkContext.union when combining many RDDs at once). These don't join on a key. Union performs no shuffle and no computation of its own - it just concatenates the inputs' partition lists - so it is very low overhead. Note that the combined RDD will have all the partitions of the original RDDs, so you may want to coalesce after the union.
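For the many-RDD case, here is a rough sketch of SparkContext.union followed by coalesce (the variable names and the local-mode setup are illustrative, not from your code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative local setup; in a real job you'd already have a SparkContext.
val sc = new SparkContext(
  new SparkConf().setAppName("union-example").setMaster("local[*]"))

val rdds = Seq(
  sc.parallelize(Seq((1, 3), (2, 4))),
  sc.parallelize(Seq((3, 5), (4, 7))),
  sc.parallelize(Seq((5, 9), (6, 2)))
)

// sc.union takes the whole sequence at once, avoiding a deep chain
// of pairwise x.union(y).union(...) calls.
val combined = sc.union(rdds)

// The result carries every partition of every input; coalesce shrinks
// that back down without a shuffle.
val compact = combined.coalesce(4)
```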
val x = sc.parallelize(Seq((1, 3), (2, 4)))
val y = sc.parallelize(Seq((3, 5), (4, 7)))
val z = x.union(y)
z.collect
res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))

On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <person.of.b...@gmail.com> wrote:

> Say I have two RDDs with the following values
>
> x = [(1, 3), (2, 4)]
>
> and
>
> y = [(3, 5), (4, 7)]
>
> and I want to have
>
> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>
> How can I achieve this. I know you can use outerJoin followed by map to
> achieve this, but is there a more direct way for this.

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
54 W 40th St, New York, NY 10018
E: daniel.siegm...@velos.io W: www.velos.io