Hello! In my use case we are using PySpark 3.1, and there are a few PySpark scripts that run better with higher driver memory.
As far as I know, the default value of "spark.sql.autoBroadcastJoinThreshold" is 10MB. In a few cases the default driver configuration was throwing OOM errors, so it was necessary to add more driver memory. (Basic configuration: driver with 1 core and 2 GB memory; 2 executors, each with 2 cores and 6 GB memory.)

I have 2 questions:

- I want to set an upper limit on the broadcast dataframe size, i.e., I do not want the broadcast size to exceed 10MB. Does the setting spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10485760) guarantee that?
- Instead of the above setting, does the following combination provide any benefit? (See the sketch below for how I am testing both.)

      spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
      spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10485760)
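For context, here is a minimal sketch of what I am experimenting with (the dataframe names and sizes are made up for illustration); I check the physical plan with explain() to see whether the join comes out as a BroadcastHashJoin or a SortMergeJoin:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("broadcast-threshold-test").getOrCreate()

    # Question 1: cap automatic broadcasts at 10 MB (10485760 bytes).
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10485760)

    # Question 2 variant: disable the static threshold and set the AQE one instead.
    # (AQE has to be enabled for the adaptive threshold to have any effect.)
    # spark.conf.set("spark.sql.adaptive.enabled", "true")
    # spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
    # spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", 10485760)

    large_df = spark.range(10000000).withColumnRenamed("id", "key")
    small_df = spark.range(1000).withColumnRenamed("id", "key")

    # Inspect the physical plan to see which join strategy was chosen
    # under each configuration.
    large_df.join(small_df, on="key").explain()

Any insight will be helpful!

With regards,
Soumyadeep.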