Re: FullOuterJoin on Spark

2016-06-22 Thread Gourav Sengupta
…) increase spark.shuffle.memoryFraction. You can use DataFrames with Spark 1.6 or greater to further reduce the memory footprint. I haven't tested that, though.

On Tue, Jun 21, 2016 at 6:16 AM, Rychnovsky, Dusan <dusan.rychnov...@firma.seznam.cz> wrote:
> Hi, …
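To make the shuffle-memory advice concrete: a shuffle hash-partitions rows by key, so each reducer only needs to buffer one partition's rows at a time, and that buffer is what the Spark 1.x setting `spark.shuffle.memoryFraction` sizes. A minimal plain-Python model of that partitioning step (no Spark; `hash_partition` and the sample rows are invented for illustration):

```python
# Plain-Python sketch (no Spark) of how a shuffle hash-partitions rows by
# key. Rows with the same key always land in the same partition, so a
# reducer only has to hold one partition in memory, not the whole dataset.
def hash_partition(rows, num_partitions):
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]  # illustrative data
parts = hash_partition(rows, 2)
```

Both occurrences of key "a" end up in the same partition, which is what lets a join co-locate matching rows without loading everything at once.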

Re: FullOuterJoin on Spark

2016-06-22 Thread Nirav Patel
…AM, Rychnovsky, Dusan <dusan.rychnov...@firma.seznam.cz> wrote:
> Hi,
> can somebody please explain the way FullOuterJoin works on Spark? Does each intersection get fully loaded into memory?
> My problem is as follows: I have two large da…

FullOuterJoin on Spark

2016-06-21 Thread Rychnovsky, Dusan
Hi,

can somebody please explain the way FullOuterJoin works on Spark? Does each intersection get fully loaded into memory?

My problem is as follows: I have two large data-sets:

* a list of web pages,
* a list of domain-names with specific rules for processing pages from that domain.

I am…
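To pin down what the question is asking: in a shuffle-based full outer join, only the rows sharing a single key (one "intersection") need to be co-resident, where they are combined as a cross product and padded with nulls on whichever side has no match. A minimal plain-Python model of those semantics (no Spark; `full_outer_join` and the sample data are invented for illustration):

```python
from collections import defaultdict

# Plain-Python model of full-outer-join semantics, mirroring what a
# shuffle join does: group both sides by key, then combine per key.
def full_outer_join(left, right):
    by_key_l, by_key_r = defaultdict(list), defaultdict(list)
    for k, v in left:
        by_key_l[k].append(v)
    for k, v in right:
        by_key_r[k].append(v)
    out = []
    for k in set(by_key_l) | set(by_key_r):
        # Only one key's rows are combined at a time: the cross product
        # of matches, padded with None where a side has no rows.
        for lv in by_key_l.get(k) or [None]:
            for rv in by_key_r.get(k) or [None]:
                out.append((k, lv, rv))
    return out

pages = [("seznam.cz", "page1"), ("seznam.cz", "page2")]  # illustrative
rules = [("seznam.cz", "ruleA"), ("other.com", "ruleB")]
joined = full_outer_join(pages, rules)
```

So the memory concern is per key, not per dataset: a key with many matching rows on both sides produces a large cross product, which is where a skewed join can blow up.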