Re: Spark not doing a broadcast join in spite of the table being well below spark.sql.autoBroadcastJoinThreshold
So what I discovered was that if I write the table being joined to disk and then read it back, Spark correctly broadcasts it. I think this is because when Spark estimates the size of the smaller table, it over-estimates it as much bigger than it actually is and therefore decides to do a SortMergeJoin. Writing the table to disk and reading it back gives Spark the correct size, so it then goes ahead and does a broadcast join. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Spark not doing a broadcast join in spite of the table being well below spark.sql.autoBroadcastJoinThreshold
I have a small table, well below 50 MB, that I want to broadcast join with a larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to 100 MB, Spark still decides to do a SortMergeJoin instead of a broadcast join. I can force it with an explicit broadcast hint on the table, but I don't want to do that, because the smaller table might grow beyond 100 MB, in which case I want Spark to fall back to a SortMergeJoin. Are there any other properties I need to set?