Hi Soumyadeep,

Both configs serve more or less the same purpose. However, the spark.sql.adaptive.autoBroadcastJoinThreshold config is available only from version 3.2.0 onwards, and it takes effect only within the adaptive query execution (AQE) framework.
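For illustration, here is a minimal sketch of setting both thresholds (assuming Spark 3.2.0+ with AQE enabled; the session setup is just an example, not taken from your script):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Threshold used by the static planner; also the fallback default
    # for the adaptive config below:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

    # Spark 3.2.0+ only; consulted when AQE re-plans joins at runtime
    # (spark.sql.adaptive.enabled is true by default from 3.2.0):
    spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10 * 1024 * 1024)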
As per the docs, the default value of "spark.sql.adaptive.autoBroadcastJoinThreshold" is the same as "spark.sql.autoBroadcastJoinThreshold". Here are the answers to your questions:

1. If you set an upper limit via autoBroadcastJoinThreshold, an automatic broadcast will not exceed it. However, if a BROADCAST hint is explicitly specified on a table / dataframe in the join, the limit is not considered and the broadcast happens anyway (see the snippet at the end of this mail).

2. Broadcast is disabled when you set the value to "-1".

Hope it helps.

Thanks,
Bala

On Sun, Jan 22, 2023, 6:36 PM Soumyadeep Mukhopadhyay <soumyamy...@gmail.com> wrote:

> Hello!
>
> In my use case we are using PySpark 3.1, and there are a few PySpark
> scripts that run better with higher driver memory.
>
> As far as I know, the default value of
> "spark.sql.autoBroadcastJoinThreshold" is 10 MB. In a few cases the
> default driver configuration was throwing OOM errors, so it was
> necessary to add more driver memory. (Basic configuration: driver - 1
> core and 2 GB memory; executors - 2 cores and 6 GB memory each, i.e. 2
> executors.)
>
> I have 2 questions:
> - I want to set an upper limit on the broadcast dataframe size, i.e. I
> do not want the broadcast size to exceed 10 MB. Does the setting
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10485760)
> guarantee that?
> - Instead of the above setting, does the following
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
> spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10485760)
> provide any benefit?
>
> Any insight will be helpful!
>
> With regards,
> Soumyadeep.
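For completeness, a rough snippet of the behaviour described in the answers above (df_large / df_small are placeholder names, not from your script):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    # Setting -1 turns off size-based automatic broadcast entirely:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

    # Placeholder dataframes, purely for illustration:
    df_large = spark.range(1_000_000).withColumnRenamed("id", "key")
    df_small = spark.range(10).withColumnRenamed("id", "key")

    # An explicit hint still forces the broadcast, regardless of thresholds:
    joined = df_large.join(broadcast(df_small), "key")
    joined.explain()  # plan shows a BroadcastHashJoin despite the -1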