liupengcheng created SPARK-31202:
------------------------------------

             Summary: Improve SizeEstimator for AppendOnlyMap
                 Key: SPARK-31202
                 URL: https://issues.apache.org/jira/browse/SPARK-31202
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.2, 3.0.0
            Reporter: liupengcheng


Currently, spark's memory management depends on the size estimation for 
execution and storage. 
In our real cluster, users always meet the issue OOM due to the inaccurate size 
estimation for ` AppendOnlyMap`, that's because spark stores KV in an 
Array[AnyRef] in `AppendOnlyMap` for memory locality,  and this value can be 
CompactBuffer[_] or Array[CompactBuffer[_]] for transformation like 
cogroup/join/groupBy, but current `SizeEstimator` will still treat this special 
array as an normal array, so in many cases, we noticed a great bias between the 
estimated size and the acutal memory consuption. 
So we improved this in xiaomi:
1. Improve the estimation for AppendOnlyMap when the value type is CompactBuffer
2. Respect jvm gc stats to decide whether to spilling when doing sort/agg

In this jira, I propose to solve the first part which is improving the 
estimation for `AppendOnlyMap` when the value type is CompactBuffer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to