Ryan Blue created SPARK-22170:
---------------------------------

             Summary: Broadcast join holds an extra copy of rows in driver 
memory
                 Key: SPARK-22170
                 URL: https://issues.apache.org/jira/browse/SPARK-22170
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.1.1, 2.0.2
            Reporter: Ryan Blue


Using a memory profiler, I investigated a driver OOM that occurred while building a large broadcast table, and found that a huge amount of memory is used during the build. This is because [BroadcastExchangeExec uses 
{{executeCollect}}|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L76].
In {{executeCollect}}, all of the partitions are fetched as compressed blocks, each block is decompressed with a stream, and each row is copied into a new byte buffer and appended to an ArrayBuffer, which is finally copied to an Array. This results in a huge amount of allocation: one buffer per row in the broadcast. Those rows are only ever copied into the {{BytesToBytesMap}} that will be broadcast, so there is no need to hold them all in memory.
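As a rough illustration (not the actual Spark code), the collect path described above behaves like the buffer-everything pattern below; {{collectRows}} and the byte-array rows are hypothetical simplifications:

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for the executeCollect path: every decompressed row is
// copied into its own buffer, accumulated in an ArrayBuffer, and then the
// whole ArrayBuffer is copied again into an Array.
def collectRows(blocks: Seq[Iterator[Array[Byte]]]): Array[Array[Byte]] = {
  val buffer = new ArrayBuffer[Array[Byte]]()
  for (block <- blocks; row <- block) {
    buffer += row.clone() // one allocation per row, all held at once
  }
  buffer.toArray          // a second full copy of every row
}
```

All of these copies are alive simultaneously until the map is built, which is what the profiler showed.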

Replacing the ArrayBuffer step with an iterator reduces the memory held while creating the map, because not all rows need to be in memory at once, and it also avoids allocating a large Array for the rows. In practice, a 16MB broadcast table used about 100MB less memory with this approach; the exact reduction depends on row size and compression (the 16MB table was in Parquet format).
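The proposed change amounts to exposing the decompressed rows as a single iterator that the map-building step consumes directly. A minimal sketch, with {{rowIterator}} and {{buildMap}} as hypothetical stand-ins for the real code:

```scala
// Sketch of the iterator-based approach: rows stream from the decompressed
// blocks straight to the consumer, so at most one row copy is live at a time
// and no intermediate ArrayBuffer/Array of all rows is ever allocated.
def rowIterator(blocks: Seq[Iterator[Array[Byte]]]): Iterator[Array[Byte]] =
  blocks.iterator.flatMap(identity)

// The BytesToBytesMap build step would then consume the iterator directly,
// inserting each row as it arrives rather than after a full collect.
def buildMap(rows: Iterator[Array[Byte]])(insert: Array[Byte] => Unit): Unit =
  rows.foreach(insert)
```

Because the iterator is lazy, each row becomes unreachable as soon as it has been inserted into the map, which is where the observed memory reduction comes from.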



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
