Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> the joined fields. You shouldn't see any more shuffle if it works. Yong

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
on the joined fields. You shouldn't see any more shuffle if it works. Yong

> Thanks for the information. When I mentioned a map-side join, I meant

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> Thanks a lot for this. I was thinking of using cogrouped RDDs. We will try to move to
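The cogrouped-RDD idea mentioned here refers to the RDD API, where two key-value RDDs partitioned with the same `Partitioner` object can be joined or cogrouped without another shuffle. A minimal spark-shell sketch (the data and the partition count are hypothetical; `sc` is the SparkContext provided by spark-shell):

```scala
import org.apache.spark.HashPartitioner

// Partition both RDDs up front with the same partitioner object.
val part  = new HashPartitioner(16)   // 16 partitions is an arbitrary choice
val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).partitionBy(part).cache()
val right = sc.parallelize(Seq((1, "x"), (2, "y"))).partitionBy(part).cache()

// Because both sides report the same partitioner, join (and cogroup)
// can reuse the existing layout instead of shuffling either side.
val joined = left.join(right)
```

This shuffle avoidance is a property of the RDD API's `Partitioner`; the 1.5.x DataFrame planner does not propagate equivalent partitioning information, which is why the DataFrame join in this thread still shuffles.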

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
is much bigger, then you want to try a map join. But you already partitioned both DFs, so why do you want a map-side join? Yong
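The "map-side join" discussed here corresponds to a broadcast join in Spark SQL. A sketch for spark-shell, assuming a small `movies` DataFrame and a larger `ratings` DataFrame joined on a hypothetical `movie_id` column (paths and names are illustrative; `broadcast` is available in `org.apache.spark.sql.functions` as of Spark 1.5, and `sqlContext` is provided by spark-shell):

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical inputs; any two DataFrames with a common key column work.
val movies  = sqlContext.read.parquet("/data/movies")
val ratings = sqlContext.read.parquet("/data/ratings")

// Ship the small side to every executor instead of shuffling both sides.
val joined = ratings.join(broadcast(movies), "movie_id")
joined.explain(true)   // plan should show a broadcast join and no Exchange on `ratings`
```

Spark can also choose a broadcast join automatically when the small side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default).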

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> correct in this case), but I think Spark will sort both DFs again, even though you already partitioned them. Yong

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
If this is wrong, please let me know. The execution plan is in fact doing a SortMerge (which is correct in this case), but I think Spark will sort both DFs again, even though you already partitioned them. Yong

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
I used 1.5.2. I used the movies data to reproduce the issue. Below is the physical plan. I am not sure why it is hash-partitioning the data, then sorting, and then joining. I expected the data to be joined first and then sent for further processing. I sort of expect a common partitioner which will work
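The plan described here is the expected sort-merge-join shape in 1.5.2: the planner does not know that the DataFrames were already partitioned on the join keys, so it inserts its own Exchange (hash partitioning) and Sort on both sides. A sketch of what the plan typically looks like, assuming hypothetical `movies` and `ratings` DataFrames with a `movie_id` column (exact operator names vary by build, and the plan below is an illustration, not the thread's actual output):

```scala
// Hypothetical inputs reproducing the symptom with movies data.
val movies  = sqlContext.read.parquet("/data/movies")
val ratings = sqlContext.read.parquet("/data/ratings")

val joined = movies.join(ratings, movies("movie_id") === ratings("movie_id"))
joined.explain(true)
// Abridged physical plan, typical of 1.5.x:
//   SortMergeJoin [movie_id#0], [movie_id#10]
//     TungstenSort [movie_id#0 ASC], false
//       TungstenExchange hashpartitioning(movie_id#0)
//         Scan ParquetRelation[...]
//     TungstenSort [movie_id#10 ASC], false
//       TungstenExchange hashpartitioning(movie_id#10)
//         Scan ParquetRelation[...]
```

Note that in 1.5.x, `DataFrame.repartition` only takes a partition count (column-based repartitioning arrives in 1.6), and there is no way to tell the optimizer that inputs are pre-partitioned; bucketed tables that survive a join without re-shuffling only arrive in Spark 2.x.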

RE: Plan issue with spark 1.5.2

2016-04-05 Thread Yong Zhang
You need to show us the execution plan, so we can understand what your issue is. Use the spark-shell code to show how your DF is built and how you partition them, then use explain(true) on your join DF and show the output here, so we can better help you. Yong