Re: Why spark.sql.autoBroadcastJoinThreshold not available

Yong Zhang Mon, 15 May 2017 06:27:08 -0700

You should post the execution plan here, so we can provide more accurate 
support.



Since in your feature table, you are building it with projection ("where 
...."), so my guess is that the following JIRA 
(SPARK-13383<https://issues.apache.org/jira/browse/SPARK-13383>) stops the 
broadcast join. This is fixed in the Spark 2.x. Can you try it on Spark 2.0?

Yong

________________________________
From: Jone Zhang <joyoungzh...@gmail.com>
Sent: Wednesday, May 10, 2017 7:10 AM
To: user @spark/'user @spark'/spark users/user@spark
Subject: Why spark.sql.autoBroadcastJoinThreshold not available

Now i use spark1.6.0 in java
I wish the following sql to be executed in BroadcastJoin way
select * from sample join feature

This is my step
1.set spark.sql.autoBroadcastJoinThreshold=100M
2.HiveContext.sql("cache lazy table feature as "select * from src where ...) 
which result size is only 100K
3.HiveContext.sql("select * from sample join feature")
Why the join is SortMergeJoin?

Grateful for any idea!
Thanks.

Re: Why spark.sql.autoBroadcastJoinThreshold not available

Reply via email to