Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> the joined fields. > > You shouldn't see any more shuffle if it works. > > Yong > > -- > Date: Wed, 6 Apr 2016 22:11:38 +0100 > Subject: Re: Plan issue with spark 1.5.2 > From: darshan.m...@gmail.com > To: java8...@hotmail.com > CC: user@

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
on the joined fields. You shouldn't see any more shuffle if it works. Yong Date: Wed, 6 Apr 2016 22:11:38 +0100 Subject: Re: Plan issue with spark 1.5.2 From: darshan.m...@gmail.com To: java8...@hotmail.com CC: user@spark.apache.org Thanks for the information. When I mentioned map-side join, I meant
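The no-shuffle join described above (both DataFrames already partitioned on the join fields, so each partition can be joined locally) can be illustrated outside Spark with a plain-Python sketch; the partitioning scheme and sample data here are made up for illustration, not taken from the thread:

```python
# Sketch: if both datasets are hash-partitioned on the join key, partition i
# on the left only needs partition i on the right, so no shuffle is required.

def partition_by_key(rows, num_partitions):
    """Hash-partition (key, value) rows, as a pre-partitioned DataFrame would be."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def local_hash_join(left_part, right_part):
    """Join one pair of co-located partitions with an in-memory hash table."""
    table = {}
    for key, value in left_part:
        table.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, rv in right_part for lv in table.get(k, [])]

def copartitioned_join(left, right, num_partitions=4):
    left_parts = partition_by_key(left, num_partitions)
    right_parts = partition_by_key(right, num_partitions)
    out = []
    for lp, rp in zip(left_parts, right_parts):  # partition i joins only partition i
        out.extend(local_hash_join(lp, rp))
    return out
```

For example, `copartitioned_join([("US", 1), ("IN", 2)], [("US", "a")])` yields `[("US", 1, "a")]` without any cross-partition traffic.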

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
----- > Date: Wed, 6 Apr 2016 21:03:16 +0100 > Subject: Re: Plan issue with spark 1.5.2 > From: darshan.m...@gmail.com > To: java8...@hotmail.com > CC: user@spark.apache.org > > Thanks a lot for this. I was thinking of using cogrouped RDDs. We will try > to move to

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
is much bigger, then you want to try a map join. But you already partitioned both DFs, so why do you want a map-side join then? Yong Date: Wed, 6 Apr 2016 21:03:16 +0100 Subject: Re: Plan issue with spark 1.5.2 From: darshan.m...@gmail.com To: java8...@hotmail.com CC: user@spark.apache.org Thanks a lot
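The map-side join Yong mentions (worthwhile when one side is small) amounts to broadcasting the small table to every task and probing it while streaming the large side, which avoids shuffling the large side at all. In Spark SQL this is governed by `spark.sql.autoBroadcastJoinThreshold`; a minimal plain-Python sketch of the mechanism, with made-up data:

```python
# Sketch of a map-side (broadcast) join: the small side becomes a lookup
# table shipped to every task, so the large side is scanned in place.

def broadcast_join(big_rows, small_rows):
    """big_rows, small_rows: lists of (key, value) pairs; small_rows is the small side."""
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)
    # Each task would scan its own slice of big_rows against the same lookup.
    return [(k, bv, sv) for k, bv in big_rows for sv in lookup.get(k, [])]

facts = [("US", 100), ("IN", 200), ("US", 300)]      # large, stays put
dims = [("US", "United States"), ("IN", "India")]    # small, broadcast
joined = broadcast_join(facts, dims)
```

This also explains Yong's point: if both sides are already partitioned on the join key, a broadcast buys little, since the shuffle it avoids has in effect already been paid for.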

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
correct in this > case), but I think spark will sort both DFs again, even though you already > partitioned them. > > Yong > > ------ > Date: Wed, 6 Apr 2016 20:10:14 +0100 > Subject: Re: Plan issue with spark 1.5.2 > From: darshan.m...@gmail.com >

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
If this is wrong, please let me know. The execution plan is in fact doing SortMerge (which is correct in this case), but I think spark will sort both DFs again, even though you already partitioned them. Yong Date: Wed, 6 Apr 2016 20:10:14 +0100 Subject: Re: Plan issue with spark 1.5.2 From: darshan.m
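For reference, the SortMerge operator in the plan works roughly as follows: sort both inputs on the join key, then advance a cursor through each side in lockstep, emitting matches. The redundant sort Yong warns about happens when the planner cannot prove the pre-partitioned inputs are already ordered. A simplified equi-join sketch (not Spark's actual implementation):

```python
def sort_merge_join(left, right):
    """Equi-join two lists of (key, value) pairs by sorting, then merging."""
    left = sorted(left, key=lambda r: r[0])    # this is the sort step the plan
    right = sorted(right, key=lambda r: r[0])  # may repeat on partitioned input
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Gather the run of equal keys on each side, emit the cross product.
            i2 = i
            while i2 < len(left) and left[i2][0] == lk:
                i2 += 1
            j2 = j
            while j2 < len(right) and right[j2][0] == rk:
                j2 += 1
            out.extend((lk, lv, rv)
                       for _, lv in left[i:i2] for _, rv in right[j:j2])
            i, j = i2, j2
    return out
```

Because the merge phase is a single linear pass, the sort dominates the cost, which is why re-sorting already-sorted partitions is the waste being discussed here.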

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
output here, so > we can better help you. > > Yong > > > Date: Tue, 5 Apr 2016 09:46:59 -0700 > > From: darshan.m...@gmail.com > > To: user@spark.apache.org > > Subject: Plan issue with spark 1.5.2 > > > > > > I am using spark 1.5.2. I have a question

RE: Plan issue with spark 1.5.2

2016-04-05 Thread Yong Zhang
Date: Tue, 5 Apr 2016 09:46:59 -0700 > From: darshan.m...@gmail.com > To: user@spark.apache.org > Subject: Plan issue with spark 1.5.2 > > > I am using spark 1.5.2. I have a question regarding the plan generated by spark. > I have 3 data-frames which have the data for different countries. I have > around 150 countries

Plan issue with spark 1.5.2

2016-04-05 Thread dsing001
I am using spark 1.5.2. I have a question regarding the plan generated by spark. I have 3 data-frames which have the data for different countries. I have around 150 countries and the data is skewed. 95% of my queries will have country as a criterion. However, I have seen issues with the plans generated for