Hello. 



If you want to allow broadcast join with larger broadcasts you can set 
spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the 
plan to allow join despite 'A' being larger than the default threshold. 




Get Outlook for Android







From: paleyl


Sent: Wednesday, June 28, 10:42 PM


Subject: about broadcast join of base table in spark sql


To: d...@spark.org, user@spark.apache.org






Hi All,






Recently I meet a problem in broadcast join: I want to left join table A and B, 
A is the smaller one and the left table, so I wrote 




A = A.join(B,A("key1") === B("key2"),"left")




but I found that A is not broadcast out, as the shuffle size is still very 
large.




I guess this is a designed mechanism in spark, so could anyone please tell me 
why it is designed like this? I am just very curious.






Best,






Paley 








Reply via email to