gt; the joined fileds.
>
> You shouldn't see any more shuffle if it works.
>
> Yong
>
> --
> Date: Wed, 6 Apr 2016 22:11:38 +0100
> Subject: Re: Plan issue with spark 1.5.2
> From: darshan.m...@gmail.com
> To: java8...@hotmail.com
> CC: user@
on the
joined fileds.
You shouldn't see any more shuffle if it works.
Yong
Date: Wed, 6 Apr 2016 22:11:38 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
Thanks for the information. When I mention map side join. I meant
---------
> Date: Wed, 6 Apr 2016 21:03:16 +0100
> Subject: Re: Plan issue with spark 1.5.2
> From: darshan.m...@gmail.com
> To: java8...@hotmail.com
> CC: user@spark.apache.org
>
> Thanks a lot for this. I was thinking of using cogrouped RDDs. We will try
> to move to
is much big, then you want to try map join.
But you already partitioned both DFs, why you want to map-side join then?
Yong
Date: Wed, 6 Apr 2016 21:03:16 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
Thanks a lot
correct in this
> case), but I think spark will sort both DFs again, even you already
> partitioned them.
>
> Yong
>
> ------
> Date: Wed, 6 Apr 2016 20:10:14 +0100
> Subject: Re: Plan issue with spark 1.5.2
> From: darshan.m...@gmail.com
>
. If this is
wrong, please let me know.
The execution plan is in fact doing SortMerge (which is correct in this case),
but I think spark will sort both DFs again, even you already partitioned them.
Yong
Date: Wed, 6 Apr 2016 20:10:14 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m
I used 1.5.2.I have used movies data to reproduce the issue. Below is the
physical plan. I am not sure why it is hash partitioning the data and then
sort and then join. I expect the data to be joined first and then send for
further processing.
I sort of expect a common partitioner which will work
You need to show us the execution plan, so we can understand what is your issue.
Use the spark shell code to show how your DF is built, how you partition them,
then use explain(true) on your join DF, and show the output here, so we can
better help you.
Yong
> Date: Tue, 5 Apr 2016 09:46:59