There's no real way to write nested for-loops over two RDDs: an RDD is
partitioned across the workers, potentially holding far more data than
fits on any one of them, so you can't reference one RDD from inside an
operation on another.

There are, however, ways to handle what you're asking about.

I would personally use something like cogroup or join between the two RDDs.
If the index matters, you can use zipWithIndex on both before you join and
then see which indexes match up.
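
Here is a rough PySpark sketch of both ideas, not a definitive
implementation. It assumes each element is a (JSON string, number) tuple,
that "matches" means the two JSON documents share an "id" field (a
hypothetical rule, since the real one wasn't given), and that `sc` is an
existing SparkContext. The function names are made up for illustration;
match_locally is just the original nested loops in plain Python, handy for
checking results on small inputs.

```python
import json


def match_key(record):
    """Hypothetical matching rule: records match when their JSON has the same "id"."""
    payload, _number = record
    return json.loads(payload)["id"]


def match_with_join(sc, a_records, b_records):
    """Replace the nested loops with a keyed join; `sc` is an existing SparkContext."""
    a = sc.parallelize(a_records).keyBy(match_key)  # (key, a_record)
    b = sc.parallelize(b_records).keyBy(match_key)  # (key, b_record)
    # join emits (key, (a_record, b_record)) for every pair sharing a key
    return a.join(b).values().collect()


def match_by_position(sc, a_records, b_records):
    """Pair the i-th element of A with the i-th element of B via zipWithIndex."""
    # zipWithIndex yields (element, index); swap so the index becomes the join key
    a = sc.parallelize(a_records).zipWithIndex().map(lambda p: (p[1], p[0]))
    b = sc.parallelize(b_records).zipWithIndex().map(lambda p: (p[1], p[0]))
    return a.join(b).sortByKey().values().collect()


def match_locally(a_records, b_records):
    """Plain-Python reference: the original nested loops, for small inputs only."""
    return [(a, b) for a in a_records for b in b_records
            if match_key(a) == match_key(b)]
```

If the match can't be reduced to an equal key, a.cartesian(b).filter(...)
expresses the nested loops directly, but it compares every pair, so it gets
expensive quickly on large RDDs.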

On Mon, Aug 15, 2016 at 1:15 PM Eric Ho <e...@analyticsmd.com> wrote:

> I've nested foreach loops like this:
>
>   for i in A[i] do:
>     for j in B[j] do:
>       append B[j] to some list if B[j] 'matches' A[i] in some fashion.
>
> Each element in A or B is some complex structure like:
> (
>   some complex JSON,
>   some number
> )
>
> Question: if A and B were represented as RDDs (e.g. RDD(A) and RDD(B)),
> how would my code look?
> Are there any RDD operators that would allow me to loop through both RDDs
> like the above procedural code?
> I can't find any RDD operators or code fragments that would allow me
> to do this.
>
> Thing is: by the time I composed RDD(A), this RDD would contain
> elements in array B as well as array A.
> Same argument for RDD(B).
>
> Any pointers much appreciated.
>
> Thanks.
>
>
> --
>
> -eric ho
>
>
