Re: About nested RDD

Tenghuan He Fri, 08 Apr 2016 02:04:21 -0700

Hi

Thanks for your reply.
Yes, It's very much like the union() method, but there is some difference.

I have a very large RDD A, and a lot of small RDDs b, c, d and so on.
and A.update(a) will update some element in the A and return a new RDD

when calling
val B = A.update(b).update(c).update(d).update().....
B.count()

The count action will call the compute method.
and each update will iterating the large rdd A.
To avoid this I can merge these small rdds first to rdds then call
A.update(rdds)
But I don't hope to do this merge manually outside but inside RDD A
automatically

I hope I made it clear.

On Fri, Apr 8, 2016 at 4:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> It seems like the union function on RDDs might be what you are looking
> for, or was there something else you were trying to achieve?
>
>
> On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote:
>
>> Hi all,
>>
>> I know that nested RDDs are not possible like linke rdd1.map(x => x +
>> rdd2.count())
>> I tried to create a custome RDD like following
>>
>> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] {
>>
>> var rdds = new  ArrayBuffer.empty[RDD[(K, (V, Int))]]
>> def update(rdd: RDD[_]) {
>>   udds += rdd
>> }
>> def comput ...
>> def getPartitions ...
>> }
>>
>> In the compute method I call the internal rdds' iterators and got
>> NullPointerException
>> Is this also a form of nested RDDs and how do I get rid of this?
>>
>> Thanks.
>>
>>
>> Tenghuan
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
>

Re: About nested RDD

Reply via email to