Solved it by removing the "lazy" keyword, so step 2 becomes:
2. HiveContext.sql("cache table feature as select * from src where ...") whose result size is only 100K
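For anyone hitting the same thing, here is a minimal sketch of the working sequence on Spark 1.6 in Java. The src/sample/feature table names come from this thread; the "key < 10" predicate is a hypothetical stand-in for the elided WHERE clause:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class BroadcastJoinExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BroadcastJoinExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // The threshold is in bytes in Spark 1.6; this sets roughly 100 MB.
        hiveContext.setConf("spark.sql.autoBroadcastJoinThreshold",
                String.valueOf(100 * 1024 * 1024));

        // Eager cache: no LAZY keyword, so the table is materialized now
        // and its size statistics are visible to the planner. The predicate
        // is a placeholder for the one elided in the original post.
        hiveContext.sql("CACHE TABLE feature AS SELECT * FROM src WHERE key < 10");

        // With the small side cached and under the threshold, the planner
        // should pick a broadcast join instead of SortMergeJoin.
        DataFrame joined = hiveContext.sql("SELECT * FROM sample JOIN feature");
        joined.explain();  // look for BroadcastHashJoin in the physical plan
    }
}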
Thanks!

2017-05-15 21:26 GMT+08:00 Yong Zhang <java8...@hotmail.com>:

> You should post the execution plan here, so we can provide more accurate
> support.
>
> Since you are building your feature table with a projection ("where
> ..."), my guess is that the following JIRA (SPARK-13383
> <https://issues.apache.org/jira/browse/SPARK-13383>) stops the broadcast
> join. This is fixed in Spark 2.x. Can you try it on Spark 2.0?
>
> Yong
>
> ------------------------------
> *From:* Jone Zhang <joyoungzh...@gmail.com>
> *Sent:* Wednesday, May 10, 2017 7:10 AM
> *To:* user@spark
> *Subject:* Why spark.sql.autoBroadcastJoinThreshold not available
>
> Now I use Spark 1.6.0 in Java.
> I wish the following SQL to be executed as a broadcast join:
> *select * from sample join feature*
>
> These are my steps:
> 1. set spark.sql.autoBroadcastJoinThreshold=100M
> 2. HiveContext.sql("cache lazy table feature as select * from src where
> ...") whose result size is only 100K
> 3. HiveContext.sql("select * from sample join feature")
> Why is the join a SortMergeJoin?
>
> Grateful for any ideas!
> Thanks.