Hi everyone, I had a quick question regarding our implementation of UnsafeHashedRelation and HashedRelation<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala>. It appears that in UnsafeHashedRelation#apply() we copy each row we've collected into memory as we insert it into the hash table. Why do we copy the rows every time? I can't see how these rows would be mutated at that point.
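For reference, here is a toy sketch (hypothetical names, not Spark's actual code) of the aliasing hazard that I assume copy() is meant to guard against: if the input iterator reuses one mutable row buffer, storing the reference directly would leave every hash-table entry pointing at the last row read.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a row whose backing buffer gets reused.
final class ReusableRow(var key: Int, var value: String) {
  def copy(): ReusableRow = new ReusableRow(key, value)
}

val buffer = new ReusableRow(0, "")              // single reused buffer
val input  = Seq((1, "a"), (2, "b"), (3, "c"))

// Without copy(): every map entry aliases the same buffer,
// so all entries end up holding the last row's contents.
val aliased = mutable.Map.empty[Int, ReusableRow]
for ((k, v) <- input) { buffer.key = k; buffer.value = v; aliased(k) = buffer }

// With copy(): each entry owns its own data, at the cost of an
// extra allocation per row — which is the memory overhead in question.
val copied = mutable.Map.empty[Int, ReusableRow]
for ((k, v) <- input) { buffer.key = k; buffer.value = v; copied(k) = buffer.copy() }
```

So the copy is clearly necessary when the upstream iterator reuses buffers; my question is whether that holds for rows already collected to the driver.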
For context, I'm looking into a case where a small DataFrame should fit in the driver's memory, but my driver ran out of memory after I increased spark.sql.autoBroadcastJoinThreshold. YourKit indicates that this copying logic is consuming more memory than my driver can handle. Thanks, -Matt Cheah