rdd.count() is a fairly straightforward operation that can be computed on the driver, and the resulting value can then be used inside the map function. Is your goal to write a generic function that operates on two RDDs, with one RDD being evaluated for each partition of the other? In that case you can also use a broadcast variable, if one of your RDDs is small enough. If both RDDs are fairly big, I would like to understand your use case better.
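A rough sketch of both suggestions, in case it helps. It assumes the Spark core library is on the classpath; the object and method names (NestedRddWorkarounds, addCount, lookupSmall) are made up for illustration and are not part of Spark or of your code.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object NestedRddWorkarounds {

  // Instead of rdd1.map(x => x + rdd2.count()): run the action once on the
  // driver and let the closure capture the resulting plain Long.
  def addCount(rdd1: RDD[Long], rdd2: RDD[_]): RDD[Long] = {
    val n = rdd2.count()   // action, triggered from the driver
    rdd1.map(x => x + n)   // n is shipped to the executors with the closure
  }

  // When one RDD is small enough to fit in memory: collect it on the driver,
  // broadcast the result, and read it from every partition of the large RDD.
  def lookupSmall(large: RDD[Int],
                  small: RDD[(Int, String)]): RDD[(Int, Option[String])] = {
    val sc: SparkContext = large.sparkContext
    val table = sc.broadcast(small.collectAsMap())
    large.mapPartitions { iter =>
      val m = table.value              // local, read-only copy on the executor
      iter.map(k => (k, m.get(k)))     // None when the key is missing
    }
  }
}
```

Neither function references an RDD from inside a task; only a Long or a broadcast map crosses into the closure, which is what avoids the nested-RDD problem.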
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra

On Fri, Apr 8, 2016 at 1:52 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> It seems like the union function on RDDs might be what you are looking
> for, or was there something else you were trying to achieve?
>
> On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote:
>
>> Hi all,
>>
>> I know that nested RDDs are not possible, like
>> rdd1.map(x => x + rdd2.count()).
>> I tried to create a custom RDD like the following:
>>
>> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] {
>>
>>   var rdds = new ArrayBuffer.empty[RDD[(K, (V, Int))]]
>>   def update(rdd: RDD[_]) {
>>     rdds += rdd
>>   }
>>   def compute ...
>>   def getPartitions ...
>> }
>>
>> In the compute method I call the internal rdds' iterators and get a
>> NullPointerException.
>> Is this also a form of nested RDDs, and how do I get rid of this?
>>
>> Thanks.
>>
>> Tenghuan
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau