Re: Dataframe broadcast join hint not working

2016-11-28 Thread Yong Zhang
ychyi <anton.okolnyc...@gmail.com> Sent: Saturday, November 26, 2016 4:05 PM To: Swapnil Shinde Cc: Benyi Wang; user@spark.apache.org Subject: Re: Dataframe broadcast join hint not working Hi guys, I also experienced a situation when Spark 1.6.2 ignored my hint to do a broadca

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Anton Okolnychyi
Hi guys, I also experienced a situation where Spark 1.6.2 ignored my hint to do a broadcast join (i.e. broadcast(df)) with a small dataset. However, this happened in only 1 of 3 cases. Setting the "spark.sql.autoBroadcastJoinThreshold" property did not have any impact either. All 3 cases work

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Benyi Wang
I think your dataframes are converted from RDDs. Are those RDDs computed or read from files directly? I guess it might affect how Spark computes the execution plan. Try this: save the data frame that will be broadcast to HDFS, and read it back into a dataframe. Then do the join and check the
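
For anyone who wants to try the round trip suggested above, here is a minimal sketch of how it could look. It is an assumption, not code from the thread: the object and method names are made up, the Parquet format and the output path are illustrative choices, and the join column "id" is taken from the original question.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper, not code from the thread.
object RoundTripBroadcast {
  // Write the small DataFrame out and read it back, so that its plan is a
  // plain file scan instead of the lineage that originally produced it.
  def roundTrip(small: DataFrame, path: String): DataFrame = {
    small.write.mode("overwrite").parquet(path)
    small.sqlContext.read.parquet(path)
  }

  // Join the large side against the reloaded small side, keeping the hint.
  def joinAfterRoundTrip(a: DataFrame, b: DataFrame, path: String): DataFrame = {
    val bReloaded = roundTrip(b, path)
    a.join(broadcast(bReloaded), "id")
  }
}
```

Calling `RoundTripBroadcast.joinAfterRoundTrip(a, b, "hdfs:///tmp/b_small").explain()` (the path is a placeholder) would then print the plan for the reloaded data. Whether this changes the chosen join depends on the Spark version and on the join type.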

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Swapnil Shinde
I am using Spark 1.6.3, and below is the real plan (a, b, c above were just for illustration purposes) == Physical Plan == Project [ltt#3800 AS ltt#3814,CASE WHEN isnull(etv_demo_id#3813) THEN mr_demo_id#3801 ELSE etv_demo_id#3813 AS etv_demo_id#3815] +- SortMergeOuterJoin

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Benyi Wang
Could you post the output of `c.explain`? If it is a broadcast join, you will see it in the plan. On Sat, Nov 26, 2016 at 10:51 AM, Swapnil Shinde wrote: > Hello > I am trying a broadcast join on dataframes but it is still doing > SortMergeJoin. I even try
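
A small sketch of that check done programmatically rather than by eye, assuming the join column is "id" as in the original question. The object and method names are hypothetical, and the substring test is only a rough heuristic: it matches any physical operator whose name contains "Broadcast", and operator names can differ between Spark versions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Hypothetical helper, not code from the thread.
object BroadcastPlanCheck {
  // Text of the physical plan, roughly what explain() prints.
  def planText(df: DataFrame): String =
    df.queryExecution.executedPlan.toString

  // Rough heuristic: true if any operator name in the plan contains "Broadcast".
  def mentionsBroadcast(df: DataFrame): Boolean =
    planText(df).contains("Broadcast")

  // The join from the original question, with the broadcast hint on b.
  def hintedJoin(a: DataFrame, b: DataFrame): DataFrame =
    a.join(broadcast(b), "id")
}
```

With two existing DataFrames `a` and `b`, `BroadcastPlanCheck.mentionsBroadcast(BroadcastPlanCheck.hintedJoin(a, b))` returning false would match the SortMergeJoin symptom described in this thread.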

Re: Dataframe broadcast join hint not working

2016-11-26 Thread Selvam Raman
Hi, which version of Spark are you using? Data smaller than 10 MB is automatically converted to a broadcast join in Spark. Thanks, selvam R On Sat, Nov 26, 2016 at 6:51 PM, Swapnil Shinde wrote: > Hello > I am trying a broadcast join on dataframes but it is still doing >

Dataframe broadcast join hint not working

2016-11-26 Thread Swapnil Shinde
Hello, I am trying a broadcast join on dataframes but it is still doing SortMergeJoin. I even tried setting spark.sql.autoBroadcastJoinThreshold higher but still no luck. Related piece of code: val c = a.join(broadcast(b), "id") On a side note, if I do SizeEstimator.estimate(b) and it