If you are using Spark 1.5 or later, the best way is to use the DataFrame API
directly instead of SQL. In the DataFrame API you can specify a broadcast join
hint, which will force a broadcast join.
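For example, a minimal sketch using the `broadcast` hint added in Spark 1.5 (`large` and `small` are placeholder DataFrames, and the join column `c1` is an assumption):

```scala
import org.apache.spark.sql.functions.broadcast

// Marking `small` with broadcast() tells the planner to ship it to every
// executor and do a map-side join, regardless of
// spark.sql.autoBroadcastJoinThreshold.
val joined = large.join(broadcast(small), large("c1") === small("c1"))
```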
Yong
From: mich.talebza...@gmail.com
Date: Mon, 20 Jun 2016 13:09:17 +0100
Subject: Re:
What sort of tables are these?
Can you register the result set as a temp table and do a join on that,
assuming the result set is going to be small?
s.filter($"c2" < 1000).registerTempTable("tmp")
and then do a join between tmp and Table2.
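The suggested join could look like the following sketch (the column name `c1` and the `sqlContext` handle are assumptions, not from the original thread):

```scala
// Filter first, register the (hopefully small) result as a temp table,
// then join it against Table2 in SQL.
s.filter($"c2" < 1000).registerTempTable("tmp")
val result = sqlContext.sql(
  "SELECT t2.* FROM tmp JOIN Table2 t2 ON tmp.c1 = t2.c1")
```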
HTH
Dr Mich Talebzadeh
It seems hard to predict the output size of filters because current Spark
keeps only limited statistics about input data. A few hours ago, Reynold
created a ticket for a cost-based optimizer framework:
https://issues.apache.org/jira/browse/SPARK-16026.
If you have ideas, questions, or suggestions, please add them there.
Hi,
How about caching the result of `select * from a where a.c2 < 1000` and then
joining it?
You probably need to tune `spark.sql.autoBroadcastJoinThreshold` to enable
broadcast joins for the result table.
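Putting the two suggestions together, a sketch might look like this (the table name `b`, the join column `c1`, and the 100 MB threshold are illustrative assumptions):

```scala
// Raise the broadcast threshold (bytes) so the small filtered table
// qualifies for an automatic broadcast join.
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600")

// Cache the filtered rows and register them for use in SQL.
val small = sqlContext.sql("select * from a where a.c2 < 1000").cache()
small.registerTempTable("small_a")

// The planner can now broadcast small_a when joining against the big table.
val joined = sqlContext.sql(
  "SELECT * FROM small_a JOIN b ON small_a.c1 = b.c1")
```

Note that `cache()` is lazy; the table is materialized the first time the join (or any action) runs.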
// maropu
On Mon, Jun 20, 2016 at 8:06 PM, 梅西0247 wrote:
> Hi