RE: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

2016-06-20 Thread Yong Zhang
If you are using Spark > 1.5, the best way is to use DataFrame API directly, instead of SQL. In dataframe, you can specify the boardcast join hint in the dataframe API, which will force the boardcast join. Yong From: mich.talebza...@gmail.com Date: Mon, 20 Jun 2016 13:09:17 +0100 Subject: Re:

Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

2016-06-20 Thread Mich Talebzadeh
what sort of the tables are these? Can you register the result set as temp table and do a join on that assuming the RS is going to be small s.filter(($"c2" < 1000)).registerTempTable("tmp") and then do a join between tmp and Table2 HTH Dr Mich Talebzadeh LinkedIn *

Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

2016-06-20 Thread Takeshi Yamamuro
Seems it is hard to predict the output size of filters because the current spark has limited statistics of input data. A few hours ago, Reynold created a ticket for cost-based optimizer framework in https://issues.apache.org/jira/browse/SPARK-16026. If you have ideas, questions, and suggestions,

Re: Is it possible to turn a SortMergeJoin into BroadcastHashJoin?

2016-06-20 Thread Takeshi Yamamuro
Hi, How about caching the result of `select * from a where a.c2 < 1000`, then joining them? You probably need to tune `spark.sql.autoBroadcastJoinThreshold` to enable broadcast joins for the result table. // maropu On Mon, Jun 20, 2016 at 8:06 PM, 梅西0247 wrote: > Hi