Ke Jia created SPARK-28046:
------------------------------

             Summary: OOM caused by building hash table when the compressed size of the small table is normal
                 Key: SPARK-28046
                 URL: https://issues.apache.org/jira/browse/SPARK-28046
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.1
            Reporter: Ke Jia


Currently, Spark converts a sort merge join (SMJ) to a broadcast hash join (BHJ) when the compressed size of the small table is <= the broadcast threshold. Likewise, Adaptive Execution (AE) converts an SMJ to a BHJ at runtime based on the compressed size. In our test, with AE enabled and a 32 MB broadcast threshold, one SMJ whose small table had a 16 MB compressed size was converted to a BHJ. However, when building the hash table, that 16 MB table decompressed to about 2 GB and contained 134,485,048 rows, with a large amount of contiguous and repeated values. As a result, the following OOM exception occurred while building the hash table:

!image-2019-06-14-10-29-00-499.png!
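For reference, below is a minimal sketch of the session configuration assumed for this test; spark.sql.autoBroadcastJoinThreshold is a standard Spark SQL setting, while the adaptive execution property name is an assumption and may differ in the AE build used here:

{code:scala}
import org.apache.spark.sql.SparkSession

// Hedged sketch of the settings assumed in the test above, not the exact repro.
val spark = SparkSession.builder()
  .appName("smj-to-bhj-oom-repro")
  // Assumed property name for enabling adaptive execution.
  .config("spark.sql.adaptive.enabled", "true")
  // 32 MB broadcast threshold, matching the figure mentioned above.
  .config("spark.sql.autoBroadcastJoinThreshold", 32L * 1024 * 1024)
  .getOrCreate()
{code}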

Based on this finding, it may not be reasonable to decide whether an SMJ should be converted to a BHJ based only on the compressed size. In AE, we added a condition on the estimated decompressed size derived from the row count. Spark may likewise need to take the decompressed size or row count into account, not only the compressed size, when converting an SMJ to a BHJ. A sketch of such a check is given below.
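
The following is a minimal sketch, in plain Scala rather than Spark's actual planner code, of how the proposed check could combine the compressed size with an estimated row count and average row size; all parameter names here are hypothetical:

{code:scala}
object BroadcastDecisionSketch {
  // Returns true only if the small table passes the existing compressed-size
  // check AND the proposed estimated-decompressed-size / row-count checks.
  def canConvertToBroadcastHashJoin(
      compressedSizeInBytes: Long,
      estimatedRowCount: Long,
      avgRowSizeInBytes: Long,
      broadcastThresholdInBytes: Long,
      maxDecompressedSizeInBytes: Long,
      maxRowCount: Long): Boolean = {
    // Existing behavior: compare only the compressed size to the threshold.
    val compressedSizeOk = compressedSizeInBytes <= broadcastThresholdInBytes
    // Proposed guard: estimate the in-memory (decompressed) size from row
    // count and average row size, and also cap the row count directly.
    val estimatedDecompressedSize = estimatedRowCount * avgRowSizeInBytes
    val decompressedSizeOk = estimatedDecompressedSize <= maxDecompressedSizeInBytes
    val rowCountOk = estimatedRowCount <= maxRowCount
    compressedSizeOk && decompressedSizeOk && rowCountOk
  }
}
{code}

For example, with a 32 MB threshold the 16 MB table above passes the compressed-size check, but with ~134 million rows even a small average row size pushes the estimated decompressed size far past a 2 GB cap, so the conversion would be rejected.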


