Hi Thanks for your reply. Yes, It's very much like the union() method, but there is some difference.
I have a very large RDD A, and a lot of small RDDs b, c, d and so on. and A.update(a) will update some element in the A and return a new RDD when calling val B = A.update(b).update(c).update(d).update()..... B.count() The count action will call the compute method. and each update will iterating the large rdd A. To avoid this I can merge these small rdds first to rdds then call A.update(rdds) But I don't hope to do this merge manually outside but inside RDD A automatically I hope I made it clear. On Fri, Apr 8, 2016 at 4:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > It seems like the union function on RDDs might be what you are looking > for, or was there something else you were trying to achieve? > > > On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote: > >> Hi all, >> >> I know that nested RDDs are not possible like linke rdd1.map(x => x + >> rdd2.count()) >> I tried to create a custome RDD like following >> >> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] { >> >> var rdds = new ArrayBuffer.empty[RDD[(K, (V, Int))]] >> def update(rdd: RDD[_]) { >> udds += rdd >> } >> def comput ... >> def getPartitions ... >> } >> >> In the compute method I call the internal rdds' iterators and got >> NullPointerException >> Is this also a form of nested RDDs and how do I get rid of this? >> >> Thanks. >> >> >> Tenghuan >> > > > -- > Cell : 425-233-8271 > Twitter: https://twitter.com/holdenkarau > >