on the joined fields.

You shouldn't see any more shuffle if it works.

Yong

Date: Wed, 6 Apr 2016 22:11:38 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org

Thanks for the information. When I mentioned a map-side join, I meant
-----
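The advice above — pre-partition both sides on the join keys so the join needs no further shuffle — rests on one guarantee: equal keys land in the same partition on both sides. A toy plain-Python sketch (illustrative only, not the Spark API; all names here are made up) of why co-partitioned data can be joined partition-locally:

```python
# Toy illustration of co-partitioned join: if both datasets are hash-partitioned
# the same way on the join key, matching keys are guaranteed to be in the same
# partition, so each partition can be joined locally with no data movement.

NUM_PARTITIONS = 4

def hash_partition(records, num_partitions):
    """Split (key, value) records into partitions by hash of the key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def local_join(left_part, right_part):
    """Inner-join one pair of co-located partitions via a hash table."""
    table = {}
    for key, value in left_part:
        table.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, rv in right_part for lv in table.get(k, [])]

left = [("US", 1), ("FR", 2), ("IN", 3)]
right = [("US", "dollar"), ("IN", "rupee")]

left_parts = hash_partition(left, NUM_PARTITIONS)
right_parts = hash_partition(right, NUM_PARTITIONS)

# Each partition joins independently -- nothing moves between partitions.
joined = [row for i in range(NUM_PARTITIONS)
          for row in local_join(left_parts[i], right_parts[i])]
print(sorted(joined))  # -> [('IN', 3, 'rupee'), ('US', 1, 'dollar')]
```

The same reasoning is why Spark can skip the exchange step when it knows both children of a join share the required partitioning.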
> Date: Wed, 6 Apr 2016 21:03:16 +0100
> Subject: Re: Plan issue with spark 1.5.2
> From: darshan.m...@gmail.com
> To: java8...@hotmail.com
> CC: user@spark.apache.org
>
> Thanks a lot for this. I was thinking of using cogrouped RDDs. We will try
> to move to
is much bigger, then you want to try a map join.
But you already partitioned both DFs, so why do you want a map-side join then?

Yong
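A map-side (broadcast) join avoids shuffling the large side at all: the small table is copied to every task as an in-memory lookup table, and the big side is streamed through it. A minimal plain-Python sketch of the idea (not Spark code; names are illustrative):

```python
# Sketch of a map-side (broadcast) join: the small table becomes an in-memory
# dict (the "broadcast" copy shipped to every task), and the large side is
# streamed against it -- the large data is never shuffled.

def broadcast_join(big_rows, small_rows):
    """big_rows: iterable of (key, value); small_rows: a small (key, value) list."""
    lookup = dict(small_rows)          # the "broadcast" copy
    for key, value in big_rows:
        if key in lookup:              # inner-join semantics
            yield (key, value, lookup[key])

big = [("US", 100), ("IN", 42), ("US", 7), ("ZZ", 0)]
small = [("US", "dollar"), ("IN", "rupee")]

result = list(broadcast_join(big, small))
# -> [('US', 100, 'dollar'), ('IN', 42, 'rupee'), ('US', 7, 'dollar')]
```

In Spark SQL this corresponds to a broadcast hash join, which the planner picks on its own when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` — hence the question above: if both sides are large enough to be worth partitioning, broadcasting one of them may not apply.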
If this is wrong, please let me know.

The execution plan is in fact doing SortMerge (which is correct in this case), but I think Spark will sort both DFs again, even though you already partitioned them.

Yong
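The SortMergeJoin mentioned above sorts both sides on the join key and then merges them with two cursors; the sort is the step Spark may redo when it cannot prove the inputs are already sorted, even if they were partitioned earlier. A minimal plain-Python sketch of the mechanism (illustrative only, not Spark internals):

```python
# Minimal sketch of a sort-merge join: sort both inputs on the join key,
# then advance two cursors in lockstep, emitting the cross product for
# each matching key group (inner-join semantics).

def sort_merge_join(left, right):
    """left, right: lists of (key, value) tuples, joined on key."""
    left = sorted(left)                # the sort phase Spark may repeat
    right = sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit all right-side rows sharing this key, then advance left.
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append((lk, left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out

rows = sort_merge_join([("FR", 2), ("US", 1)], [("US", "dollar"), ("FR", "euro")])
# -> [('FR', 2, 'euro'), ('US', 1, 'dollar')]
```

Once both sides are sorted, the merge itself is a single linear pass, which is why SortMergeJoin scales well for large-versus-large joins.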
Date: Wed, 6 Apr 2016 20:10:14 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
output here, so we can better help you.

Yong

Date: Tue, 5 Apr 2016 09:46:59 -0700
From: darshan.m...@gmail.com
To: user@spark.apache.org
Subject: Plan issue with spark 1.5.2
I am using spark 1.5.2. I have a question regarding the plan generated by Spark. I have 3 data-frames which have the data for different countries. I have around 150 countries and the data is skewed. 95% of my queries will have country as a criterion. However, I have seen issues with the plans generated for