in
JIRA SPARK-13383
Thanks
Yong
From: java8...@hotmail.com
To: dav...@databricks.com
CC: user@spark.apache.org
Subject: RE: Spark 1.5.2, why the broadcast join shuffle so much data in the
last step
Date: Wed, 23 Mar 2016 20:30:42 -0400
Sounds good.
I will manually merge this patch onto 1.6.1, test my case again tomorrow in my
environment, and post an update later.
Thanks
Yong
> Date: Wed, 23 Mar 2016 16:20:23 -0700
> Subject: Re: Spark 1.5.2, why the broadcast join shuffle so much data in the
> last step
> > [date_time#25L,visid_low#461L,visid_high#460L,account_id#976]
> > +- Project [soid_e1#30 AS
> > account_id#976,visid_high#460L,visid_low#461L,date_time#25L,ip#127]
> > +- Filter (instr(event_list#105,202) > 0)
> > +-
>> > "broadcast" join supposed to do,
>> > correct?
>> > In the last stage, it will be very slow until it reaches and processes all
>> > the history data, shown below as "shuffle read" reaching 720G, to finish.
>> >
>> >
>> >
>>
do
anything with "broadcast" problem.
Thanks
Yong
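P.S. For anyone following the thread, here is a minimal sketch of what I expect a broadcast (map-side) join to do, written in plain Python rather than against the Spark API. The function name broadcast_hash_join and the sample rows are hypothetical, made up only for illustration; account_id is borrowed from the plan quoted in this thread. It assumes the small side fits in memory.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Join every partition of the large side against a full copy of the small side."""
    # Build the hash table once from the small side. On a real cluster this
    # copy is what would be shipped ("broadcast") to every task.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        # Each partition is probed locally, so rows of the large side are
        # never repartitioned by key (no shuffle of the large side).
        out = []
        for row in partition:
            for match in lookup.get(row[key], []):
                out.append({**match, **row})
        joined.append(out)
    return joined


# Hypothetical data: two partitions of "history" rows and a small lookup table.
history = [
    [{"account_id": 1, "visid": 10}, {"account_id": 2, "visid": 11}],
    [{"account_id": 1, "visid": 12}, {"account_id": 3, "visid": 13}],
]
accounts = [{"account_id": 1, "name": "a"}, {"account_id": 2, "name": "b"}]

result = broadcast_hash_join(history, accounts, "account_id")
```

The output keeps the same partitioning as the large input, which is why I would not expect a large "shuffle read" in the last stage if the join were really executed this way.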
> Date: Wed, 23 Mar 2016 10:14:19 -0700
> Subject: Re: Spark 1.5.2, why the broadcast join shuffle so much data in the
> last step
> From: dav...@databricks.com
> To: java8...@hotmail.com
> CC: user@spark.apache.org
>
tly same input. It is an interesting point, does
> anyone have some idea about this?
>
>
> Overall, for my test case, a "broadcast" join is exactly the optimized approach
> I should use; but somehow, I cannot make it behave the same way as the "mapjoin" of
> Hive