[ 
https://issues.apache.org/jira/browse/SPARK-28046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865236#comment-16865236
 ] 

Dongjoon Hyun commented on SPARK-28046:
---------------------------------------

In general, this `LongToUnsafeRowMap` issue is the same one with SPARK-24912 

> OOM caused by building hash table when the compressed ratio of small table is 
> normal
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-28046
>                 URL: https://issues.apache.org/jira/browse/SPARK-28046
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Ke Jia
>            Priority: Major
>         Attachments: image-2019-06-14-10-34-53-379.png
>
>
> Currently, spark will convert the sort merge join to broadcast hash join when 
> the small table compressed  size <= the broadcast threshold.  Same with 
> Spark, AE also convert the smj to bhj based on the compressed size in 
> runtime.  In our test, when enable ae with 32M broadcast threshold, one smj 
> with 16M compressed size is converted to bhj. However, when building the hash 
> table, the 16M small table is decompressed with 2GB size and has 134485048 
> row count, which has a mount of continuous and repeated values. Therefore, 
> the following OOM exception occurs when building hash table:
> !image-2019-06-14-10-34-53-379.png!
> And based on this founding , it may be not reasonable to decide whether smj 
> be converted to bhj only by the compressed size. In AE, we add the condition 
> with the estimation  decompressed size based on the row counts. And in spark, 
> we may also need the decompressed size or row counts condition judgment not 
> only the compressed size when converting the smj to bhj.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to