Solved it by removing the "lazy" keyword, so step 2 becomes:
2. HiveContext.sql("cache table feature as select * from src where ...") whose result size is only 100K
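For anyone hitting the same thing, here is a minimal sketch of the working sequence on Spark 1.6 in Java. The src/sample/feature table names come from this thread; the "key < 10" predicate is a hypothetical stand-in for the elided WHERE clause:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class BroadcastJoinExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BroadcastJoinExample");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc.sc());

        // The threshold is in bytes in Spark 1.6; this sets roughly 100 MB.
        hiveContext.setConf("spark.sql.autoBroadcastJoinThreshold",
                String.valueOf(100 * 1024 * 1024));

        // Eager cache: no LAZY keyword, so the table is materialized now
        // and its size statistics are visible to the planner. The predicate
        // is a placeholder for the one elided in the original post.
        hiveContext.sql("CACHE TABLE feature AS SELECT * FROM src WHERE key < 10");

        // With the small side cached and under the threshold, the planner
        // should pick a broadcast join instead of SortMergeJoin.
        DataFrame joined = hiveContext.sql("SELECT * FROM sample JOIN feature");
        joined.explain();  // look for BroadcastHashJoin in the physical plan
    }
}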
Thanks!

2017-05-15 21:26 GMT+08:00 Yong Zhang <java8...@hotmail.com>:

> You should post the execution plan here, so we can provide more accurate
> support.
>
> Since you are building your feature table with a projection ("where
> ..."), my guess is that the following JIRA (SPARK-13383
> <https://issues.apache.org/jira/browse/SPARK-13383>) stops the broadcast
> join. This is fixed in Spark 2.x. Can you try it on Spark 2.0?
>
> Yong
>
> ------------------------------
> *From:* Jone Zhang <joyoungzh...@gmail.com>
> *Sent:* Wednesday, May 10, 2017 7:10 AM
> *To:* user@spark
> *Subject:* Why spark.sql.autoBroadcastJoinThreshold not available
>
> Now I use Spark 1.6.0 in Java.
> I wish the following SQL to be executed as a broadcast join:
> *select * from sample join feature*
>
> These are my steps:
> 1. set spark.sql.autoBroadcastJoinThreshold=100M
> 2. HiveContext.sql("cache lazy table feature as select * from src where
> ...") whose result size is only 100K
> 3. HiveContext.sql("select * from sample join feature")
> Why is the join a SortMergeJoin?
>
> Grateful for any ideas!
> Thanks.