liupengcheng created SPARK-30713: ------------------------------------ Summary: Respect mapOutputSize in memory in adaptive execution Key: SPARK-30713 URL: https://issues.apache.org/jira/browse/SPARK-30713 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: liupengcheng
Currently, Spark adaptive execution use the MapOutputStatistics information to adjust the plan dynamically, but this MapOutputSize does not respect the compression factor. So there are cases that the original SparkPlan is `SortMergeJoin`, but the Plan after adaptive adjustment was changed to `BroadcastHashJoin`, but this `BroadcastHashJoin` might causing OOMs due to inaccurate estimation. Also, if the shuffle implementation is local shuffle(intel Spark-Adaptive execution impl), then in some cases, it will cause `Too large Frame` exception. So I propose to respect the compression factor in adaptive execution, or use `dataSize` metrics in `ShuffleExchangeExec` in adaptive execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org