As mentioned earlier, you can create a broadcast variable containing all the elements of the small RDDs. I hope they are really small, since they have to be collected on the driver and must fit in memory on every executor. Then you can call A.update(broadcastVariable).
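Here is a rough sketch of what I mean, not a definitive implementation. I am assuming the RDDs hold key/value pairs and that "update" means "replace the value of every key in A that also appears in a small RDD"; the object name, the `update` helper and the sample data are made up for illustration and are not taken from your MyRDD class.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastUpdateSketch {

  // Hypothetical helper: replace the value of every key in `big`
  // that also appears in `updates`. The large RDD is traversed once,
  // no matter how many small RDDs contributed to `updates`.
  def update(big: RDD[(Int, String)], updates: Map[Int, String]): RDD[(Int, String)] = {
    val bc = big.sparkContext.broadcast(updates) // shipped once per executor
    big.map { case (k, v) => (k, bc.value.getOrElse(k, v)) }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-update-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c")) // stands in for the large RDD A
      val b = sc.parallelize(Seq(2 -> "B"))                      // small RDDs
      val c = sc.parallelize(Seq(3 -> "C"))

      // Merge the small RDDs and bring them to the driver.
      // This is only safe because they are assumed to be small.
      val merged: Map[Int, String] = sc.union(Seq(b, c)).collect().toMap

      val updated = update(a, merged)
      updated.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The chained A.update(b).update(c)... form walks A once per small RDD. Here the small RDDs are merged first, so A is walked a single time when an action such as count() runs.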
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra

On Fri, Apr 8, 2016 at 2:33 PM, Tenghuan He <tenghua...@gmail.com> wrote:

> Hi
>
> Thanks for your reply.
> Yes, it's very much like the union() method, but there is some difference.
>
> I have a very large RDD A, and a lot of small RDDs b, c, d and so on,
> and A.update(a) will update some elements in A and return a new RDD.
>
> When calling
> val B = A.update(b).update(c).update(d).update().....
> B.count()
>
> the count action will call the compute method,
> and each update will iterate over the large RDD A.
> To avoid this I can merge these small RDDs first into rdds and then call
> A.update(rdds)
> But I don't want to do this merge manually outside; I would like it to
> happen automatically inside RDD A.
>
> I hope I made it clear.
>
>
> On Fri, Apr 8, 2016 at 4:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> It seems like the union function on RDDs might be what you are looking
>> for, or was there something else you were trying to achieve?
>>
>>
>> On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I know that nested RDDs are not possible, like rdd1.map(x => x +
>>> rdd2.count())
>>> I tried to create a custom RDD like the following
>>>
>>> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] {
>>>
>>>   var rdds = ArrayBuffer.empty[RDD[(K, (V, Int))]]
>>>   def update(rdd: RDD[_]) {
>>>     rdds += rdd
>>>   }
>>>   def compute ...
>>>   def getPartitions ...
>>> }
>>>
>>> In the compute method I call the internal rdds' iterators and get a
>>> NullPointerException.
>>> Is this also a form of nested RDDs, and how do I get rid of this?
>>>
>>> Thanks.
>>>
>>>
>>> Tenghuan
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>
>