[ https://issues.apache.org/jira/browse/SPARK-32470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-32470:
----------------------------------
    Affects Version/s: 2.4.6
                       3.0.0

> Remove task result size check for shuffle map stage
> ---------------------------------------------------
>
>                 Key: SPARK-32470
>                 URL: https://issues.apache.org/jira/browse/SPARK-32470
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.6, 3.0.0, 3.1.0
>            Reporter: Wei Xue
>            Priority: Minor
>
> The task result of a shuffle map stage is not the query result; it consists 
> only of the map status and metrics accumulator updates. Aside from the 
> metrics, whose size can vary, the total task result size depends solely on 
> the number of tasks, and the number of tasks can grow large regardless of 
> the stage's output size. For example, `CartesianProduct` generates a number 
> of tasks equal to the square of "spark.sql.shuffle.partitions": if it is set 
> to 200 you get 40,000 tasks, and if set to 500 you get 250,000 tasks, which 
> can easily exceed the limit set by `spark.driver.maxResultSize`:
>  
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Total size 
> of serialized results of 66496 tasks (4.0 GiB) is bigger than 
> spark.driver.maxResultSize (4.0 GiB)
> {code}
>  
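> A minimal sketch that reproduces the task-count blowup (assuming Spark 3.x 
> in local mode; the data sizes and partition counts are illustrative only):
>  
> {code:scala}
> import org.apache.spark.sql.SparkSession
> 
> val spark = SparkSession.builder()
>   .appName("cartesian-task-count")
>   .master("local[*]")
>   .config("spark.sql.shuffle.partitions", "200")
>   // Force CartesianProduct instead of a broadcast nested loop join.
>   .config("spark.sql.autoBroadcastJoinThreshold", "-1")
>   .getOrCreate()
> 
> val left  = spark.range(0, 1000L).toDF("a").repartition(200)
> val right = spark.range(0, 1000L).toDF("b").repartition(200)
> 
> // CartesianProduct pairs every left partition with every right one,
> // so the join stage schedules 200 * 200 = 40,000 (mostly tiny) tasks.
> // Because count() shuffles afterwards, that stage is a shuffle map
> // stage, and its accumulated task results are what the
> // spark.driver.maxResultSize check counts against.
> left.crossJoin(right).count()
> {code}
>  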
> However, the map statuses and accumulator updates are consumed by the driver 
> to update the overall map output statistics and the metrics of the query; 
> they are not cached on the driver, so they cannot cause catastrophic driver 
> memory issues. We should therefore remove this check for shuffle map stage 
> tasks.
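>  
> The shape of the fix could be as simple as skipping the size accounting for 
> shuffle map tasks. A hypothetical, simplified sketch (not the actual patch; 
> `ResultSizeGuard` and its members are invented for illustration):
>  
> {code:scala}
> // Simplified driver-side guard: direct task results count against the
> // limit, while shuffle map "results" (map status + accumulator
> // updates) are consumed immediately and never cached, so they skip it.
> class ResultSizeGuard(maxResultSize: Long, isShuffleMapStage: Boolean) {
>   private var totalResultSize = 0L
> 
>   /** Returns true if the driver may safely fetch this task result. */
>   def canFetchMoreResults(taskResultSize: Long): Boolean = {
>     if (isShuffleMapStage) {
>       true
>     } else {
>       totalResultSize += taskResultSize
>       totalResultSize <= maxResultSize
>     }
>   }
> }
> {code}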



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
