Hi Reynold,

Can you please elaborate on this? I thought an RDD also only opens an iterator. Does it get materialized for joins?
Rishi

On Saturday, September 19, 2015, Reynold Xin <r...@databricks.com> wrote:

> Yes for RDD -- both are materialized. No for DataFrame/SQL - one side
> streams.
>
> On Thu, Sep 17, 2015 at 11:21 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> in scalding we join with the smaller side on the left, since the smaller
>> side will get buffered while the bigger side streams through the join.
>>
>> looking at CoGroupedRDD i do not get the impression such a distinction is
>> made. it seems both sides are put into a map that can spill to disk. is
>> this correct?
>>
>> thanks
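
To make the distinction in the quoted messages concrete, here is a minimal sketch in plain Python (not Spark code). The function names `streaming_hash_join` and `cogroup_join` are hypothetical, and neither is meant to reflect how CoGroupedRDD or Spark SQL is actually implemented. The first buffers only one side and streams the other, as described for scalding; the second collects both sides per key before emitting anything, which is roughly the "both are materialized" behaviour, minus any spilling to disk.

```python
from collections import defaultdict


def streaming_hash_join(small, large):
    """Buffer `small` in memory, then stream `large` past it record by record."""
    buffered = defaultdict(list)
    for key, value in small:
        buffered[key].append(value)
    # `large` is only iterated once and never stored.
    for key, right in large:
        for left in buffered.get(key, ()):
            yield key, (left, right)


def cogroup_join(left, right):
    """Collect both sides per key before producing any joined output."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    # Only now, with both sides held in memory, are pairs emitted.
    for key, (lefts, rights) in groups.items():
        for l in lefts:
            for r in rights:
                yield key, (l, r)
```

In the first version memory use is bounded by the smaller input, so which side is buffered matters. In the second, both inputs are held in full, so the ordering of the sides makes no difference.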