Re: Spark not doing a broadcast join despite the table being well below spark.sql.autoBroadcastJoinThreshold

2019-05-10 Thread V0lleyBallJunki3
What I discovered is that if I write the table being joined to disk and then
read it back, Spark correctly broadcasts it. I think this is because when
Spark estimates the size of the smaller table, it overestimates it to be much
bigger than it actually is and hence decides to do a SortMergeJoin. Writing
the table to disk and reading it back gives Spark the correct size, so it
then goes ahead and does a broadcast join.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark not doing a broadcast join despite the table being well below spark.sql.autoBroadcastJoinThreshold

2019-05-09 Thread V0lleyBallJunki3
I have a small table, well below 50 MB, that I want to broadcast join with a
larger table. However, even if I set spark.sql.autoBroadcastJoinThreshold to
100 MB, Spark still decides to do a SortMergeJoin instead of a broadcast
join. I have to set an explicit broadcast hint on the table for it to do the
broadcast join, but I don't want to do that because the smaller table might
grow beyond 100 MB, in which case I want Spark to fall back to SortMergeJoin.
Are there any other properties that I need to set?


