Re: About nested RDD

Rishi Mishra Fri, 08 Apr 2016 01:39:55 -0700

rdd.count() is a fairly straightforward operation which can be calculated
on the driver, and the value can then be included in the map function.
Is your goal to write a generic function which operates on two RDDs, one
RDD being evaluated for each partition of the other?
Here also you can use a broadcast, if one of your RDDs is small enough. If
both RDDs are fairly big, I would like to understand your use case
better.
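
To make the two suggestions concrete, here is a minimal sketch. The object
name, the method names and the element types (Long keys, String values) are
made up for illustration and are not from your code, and it assumes the
small RDD fits in driver memory:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names below are illustrative only.
object NestedRddWorkarounds {

  // Run the action on the driver first. The map closure then captures
  // a plain Long instead of a reference to another RDD.
  def addCountOf(left: RDD[Long], right: RDD[_]): RDD[Long] = {
    val rightCount: Long = right.count()
    left.map(x => x + rightCount)
  }

  // Collect the small RDD on the driver and broadcast it, so each task
  // reads a local read-only map rather than touching an RDD.
  // Assumption: `small` fits comfortably in memory.
  def lookupInSmall(big: RDD[Long],
                    small: RDD[(Long, String)]): RDD[(Long, Option[String])] = {
    val table = big.sparkContext.broadcast(small.collectAsMap())
    big.map(x => (x, table.value.get(x)))
  }

  def main(args: Array[String]): Unit = {
    // Local master and sample data are assumptions for a quick trial run.
    val conf = new SparkConf().setAppName("nested-rdd-workarounds").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
      val rdd2 = sc.parallelize(Seq("a", "b"))
      val small = sc.parallelize(Seq(1L -> "one", 2L -> "two"))

      addCountOf(rdd1, rdd2).collect().foreach(println)
      lookupInSmall(rdd1, small).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In both methods the other RDD is only used on the driver, so no RDD is
referenced inside a task. That is the restriction the nested
rdd1.map(x => x + rdd2.count()) form runs into.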


Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra

On Fri, Apr 8, 2016 at 1:52 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> It seems like the union function on RDDs might be what you are looking
> for, or was there something else you were trying to achieve?
>
>
> On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote:
>
>> Hi all,
>>
>> I know that nested RDDs are not possible, like rdd1.map(x => x +
>> rdd2.count()).
>> I tried to create a custom RDD like the following:
>>
>> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] {
>>
>> var rdds = ArrayBuffer.empty[RDD[(K, (V, Int))]]
>> def update(rdd: RDD[(K, (V, Int))]) {
>>   rdds += rdd
>> }
>> def compute ...
>> def getPartitions ...
>> }
>>
>> In the compute method I call the internal RDDs' iterators and get a
>> NullPointerException.
>> Is this also a form of nested RDDs, and how do I get rid of it?
>>
>> Thanks.
>>
>>
>> Tenghuan
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
>
