This is as much a Scala question as a Spark question.

I have an RDD:

val rdd1: RDD[(Long, Array[Long])]

This RDD has duplicate keys, which I can collapse like so:

val rdd2: RDD[(Long, Array[Long])] = rdd1.reduceByKey((a, b) => a ++ b)

If I start with Arrays of primitive longs in rdd1, will rdd2 also contain
Arrays of primitive longs? Based on the memory usage I'm seeing, I suspect
it does not.
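
For what it's worth, here is a quick local check I can run in plain Scala,
with no Spark involved (the variable names are just for illustration). It
only shows what the collections library does with ++ on an Array[Long]; it
says nothing about what Spark's shuffle and serialization then do with the
result, which is the part I'm actually unsure about:

val a = Array(1L, 2L)
val b = Array(3L, 4L)
val c = a ++ b                 // ArrayOps ++ builds a fresh Array[Long]
println(a.getClass.getName)    // "[J", i.e. a primitive long[]
println(c.getClass.getName)    // also "[J": concatenation stays unboxed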

Also, would it be more efficient to do this:

val rdd1: RDD[(Long, ArrayBuffer[Long])]

and then

val rdd2: RDD[(Long, Array[Long])] =
  rdd1.reduceByKey((a, b) => a ++ b).mapValues(_.toArray)

(Note the mapValues here; a bare .map(_.toArray) would not compile, since
the elements of the RDD are (Long, ArrayBuffer[Long]) tuples.)
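
A third option I've been wondering about, sketched below against the
original Array-valued rdd1 and assuming I have Spark's aggregateByKey
semantics right: accumulate into one mutable ArrayBuffer per key instead
of allocating a new collection on every merge, and convert to an array
once at the end. One caveat I'm aware of: ArrayBuffer[Long] is backed by
an Array[AnyRef], so its elements are boxed until the final toArray.

import scala.collection.mutable.ArrayBuffer

// rdd1: RDD[(Long, Array[Long])], as in the first snippet above.
val rdd2: RDD[(Long, Array[Long])] =
  rdd1
    .aggregateByKey(ArrayBuffer.empty[Long])(
      (buf, arr) => buf ++= arr, // fold each Array[Long] into the per-key buffer
      (b1, b2) => b1 ++= b2      // merge buffers from different partitions
    )
    .mapValues(_.toArray)        // back to a primitive long[] per key

Whether this actually wins on memory or time is exactly what I'm asking;
I haven't measured it.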
