liupengcheng created SPARK-30713:
------------------------------------

             Summary: Respect mapOutputSize in memory in adaptive execution
                 Key: SPARK-30713
                 URL: https://issues.apache.org/jira/browse/SPARK-30713
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: liupengcheng


Currently, Spark adaptive execution use the MapOutputStatistics information to 
adjust the plan dynamically, but this MapOutputSize does not respect the 
compression factor. So there are cases that the original SparkPlan is 
`SortMergeJoin`, but the Plan after adaptive adjustment was changed to 
`BroadcastHashJoin`, but this `BroadcastHashJoin` might causing OOMs due to 
inaccurate estimation.

 

Also, if the shuffle implementation is local shuffle(intel Spark-Adaptive 
execution impl), then in some cases, it will cause `Too large Frame` exception.

 

So I propose to respect the compression factor in adaptive execution, or use 
`dataSize` metrics in `ShuffleExchangeExec` in adaptive execution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to