Re: Mapper side join with DataFrames API

2016-03-05 Thread Deepak Gopalakrishnan
rying to find out the root cause yet. >>> >>> Yong >>> >>> -- >>> Date: Wed, 2 Mar 2016 15:38:29 +0530 >>> Subject: Re: Mapper side join with DataFrames API >>> From: dgk...@gmail.com >>> To:

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
mall. >> >> I am still trying to find out the root cause yet. >> >> Yong >> >> ---------- >> Date: Wed, 2 Mar 2016 15:38:29 +0530 >> Subject: Re: Mapper side join with DataFrames API >> From: dgk...@gmail.com >> To: mich...@databricks.com >>

Re: Mapper side join with DataFrames API

2016-03-02 Thread Deepak Gopalakrishnan
aframe, and it caused > OOM for me simple test case, even one side of join is very small. > > I am still trying to find out the root cause yet. > > Yong > > -- > Date: Wed, 2 Mar 2016 15:38:29 +0530 > Subject: Re: Mapper side join with Da

Re: Mapper side join with DataFrames API

2016-03-01 Thread Michael Armbrust
It's helpful to always include the output of df.explain(true) when you are asking about performance. On Mon, Feb 29, 2016 at 6:14 PM, Deepak Gopalakrishnan wrote: > Hello All, > > I'm trying to join 2 dataframes A and B with a > > sqlContext.sql("SELECT * FROM A INNER JOIN B ON

Fwd: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
Hello All, I'm trying to join 2 dataframes A and B with a sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a"); What I have done so far is register temp tables for A and B after loading these DataFrames from different sources. I need the join to be really fast and I was wondering