You want to use RDD.union (or SparkContext.union to combine many RDDs at
once). These don't join on a key. Union doesn't compute anything itself --
it just concatenates the RDDs without a shuffle -- so it is very low
overhead. Note that the combined RDD will have all the partitions of the
original RDDs, so you may want to coalesce after the union.

val x = sc.parallelize(Seq( (1, 3), (2, 4) ))
val y = sc.parallelize(Seq( (3, 5), (4, 7) ))
val z = x.union(y)

z.collect
res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
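For more than two RDDs, SparkContext.union avoids chaining binary unions. A
minimal sketch (assumes a live SparkContext `sc`; the data and the partition
count passed to coalesce are just illustrative):

```scala
// Combine several RDDs in one call, then shrink the partition count.
val rdds = Seq(
  sc.parallelize(Seq((1, 3), (2, 4))),
  sc.parallelize(Seq((3, 5), (4, 7))),
  sc.parallelize(Seq((5, 9), (6, 2)))
)

// The result has the sum of all the inputs' partitions.
val combined = sc.union(rdds)

// coalesce reduces the partition count without a shuffle
// (pass shuffle = true if you want to rebalance instead).
val compact = combined.coalesce(4)
```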


On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <person.of.b...@gmail.com>
wrote:

> Say I have two RDDs with the following values
>
> x = [(1, 3), (2, 4)]
>
> and
>
> y = [(3, 5), (4, 7)]
>
> and I want to have
>
> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>
> How can I achieve this? I know you can use outerJoin followed by a map to
> achieve this, but is there a more direct way?
>



-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

54 W 40th St, New York, NY 10018
E: daniel.siegm...@velos.io W: www.velos.io
