Hi Soumyadeep,

Both configs serve more or less the same purpose. However, the spark.sql.adaptive.autoBroadcastJoinThreshold config is available only from version 3.2.0 onwards, and it takes effect only within the adaptive query execution (AQE) framework.
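For illustration, here is a minimal sketch of setting both thresholds (assuming Spark 3.2.0+ with AQE enabled; the session setup is just an example, not taken from your script):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Threshold used by the static planner; also the fallback default
    # for the adaptive config below:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)

    # Spark 3.2.0+ only; consulted when AQE re-plans joins at runtime
    # (spark.sql.adaptive.enabled is true by default from 3.2.0):
    spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10 * 1024 * 1024)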
As per the docs, the default value of "spark.sql.adaptive.autoBroadcastJoinThreshold" is the same as "spark.sql.autoBroadcastJoinThreshold". Here are the answers to your questions:

1. If you set an upper limit via autoBroadcastJoinThreshold, an automatic broadcast will not exceed it. However, if a BROADCAST hint is explicitly specified on a table / dataframe in the join, the limit is not considered and the broadcast happens anyway (see the snippet at the end of this mail).

2. Broadcast is disabled when you set the value to "-1".

Hope it helps.

Thanks,
Bala

On Sun, Jan 22, 2023, 6:36 PM Soumyadeep Mukhopadhyay <soumyamy...@gmail.com> wrote:

> Hello!
>
> In my use case we are using PySpark 3.1, and there are a few PySpark
> scripts that run better with higher driver memory.
>
> As far as I know, the default value of
> "spark.sql.autoBroadcastJoinThreshold" is 10 MB. In a few cases the
> default driver configuration was throwing OOM errors, so it was
> necessary to add more driver memory. (Basic configuration: driver - 1
> core and 2 GB memory; executors - 2 cores and 6 GB memory each, i.e. 2
> executors.)
>
> I have 2 questions:
> - I want to set an upper limit on the broadcast dataframe size, i.e. I
> do not want the broadcast size to exceed 10 MB. Does the setting
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10485760)
> guarantee that?
> - Instead of the above setting, does the following
> spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
> spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10485760)
> provide any benefit?
>
> Any insight will be helpful!
>
> With regards,
> Soumyadeep.
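For completeness, a rough snippet of the behaviour described in the answers above (df_large / df_small are placeholder names, not from your script):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    # Setting -1 turns off size-based automatic broadcast entirely:
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

    # Placeholder dataframes, purely for illustration:
    df_large = spark.range(1_000_000).withColumnRenamed("id", "key")
    df_small = spark.range(10).withColumnRenamed("id", "key")

    # An explicit hint still forces the broadcast, regardless of thresholds:
    joined = df_large.join(broadcast(df_small), "key")
    joined.explain()  # plan shows a BroadcastHashJoin despite the -1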