As mentioned earlier you can create a broadcast variable containing all the
small RDD elements. I hope they are really small.  Then you can fire
A.updatae(broadcastVariable).

Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra

On Fri, Apr 8, 2016 at 2:33 PM, Tenghuan He <tenghua...@gmail.com> wrote:

> Hi
>
> Thanks for your reply.
> Yes, It's very much like the union() method, but there is some difference.
>
> I have a very large RDD A, and a lot of small RDDs b, c, d and so on.
> and A.update(a) will update some element in the A and return a new RDD
>
> when calling
> val B = A.update(b).update(c).update(d).update().....
> B.count()
>
> The count action will call the compute method.
> and each update will iterating the large rdd A.
> To avoid this I can merge these small rdds first to rdds then call
> A.update(rdds)
> But I don't hope to do this merge manually outside but inside RDD A
> automatically
>
> I hope I made it clear.
> ​
>
> On Fri, Apr 8, 2016 at 4:22 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> It seems like the union function on RDDs might be what you are looking
>> for, or was there something else you were trying to achieve?
>>
>>
>> On Thursday, April 7, 2016, Tenghuan He <tenghua...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I know that nested RDDs are not possible like linke rdd1.map(x => x +
>>> rdd2.count())
>>> I tried to create a custome RDD like following
>>>
>>> class MyRDD(base: RDD, part: Partitioner) extends RDD[(K, V)] {
>>>
>>> var rdds = new  ArrayBuffer.empty[RDD[(K, (V, Int))]]
>>> def update(rdd: RDD[_]) {
>>>   udds += rdd
>>> }
>>> def comput ...
>>> def getPartitions ...
>>> }
>>>
>>> In the compute method I call the internal rdds' iterators and got
>>> NullPointerException
>>> Is this also a form of nested RDDs and how do I get rid of this?
>>>
>>> Thanks.
>>>
>>>
>>> Tenghuan
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>
>

Reply via email to