[ 
https://issues.apache.org/jira/browse/SPARK-46383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46383.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44321
[https://github.com/apache/spark/pull/44321]

> Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-46383
>                 URL: https://issues.apache.org/jira/browse/SPARK-46383
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Utkarsh Agarwal
>            Assignee: Utkarsh Agarwal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screenshot 2023-11-06 at 3.56.26 PM.png, screenshot-1.png
>
>
> `AccumulableInfo` is one of the top heap consumers in the driver's heap 
> dumps for stages with many tasks. For a stage with a large number of tasks 
> (O(100k)), we saw 30% of the heap usage stemming from 
> `TaskInfo.accumulables()`.
> (See attachment screenshot-1.png.)
> The `TaskSetManager` today keeps the `TaskInfo` objects 
> ([ref1|https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L134],
> [ref2|https://github.com/apache/spark/blob/c1ba963e64a22dea28e17b1ed954e6d03d38da1e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L192])
> and, in turn, the task metrics (`AccumulableInfo`) for every task attempt 
> until the stage completes. This means that for stages with a large number 
> of tasks, we keep the metrics (`AccumulableInfo`) of all tasks around even 
> after a task has completed and its metrics have been aggregated. Because 
> each task has a large number of metrics, stages with many tasks end up with 
> large heap usage in the form of task metrics.
> Ideally, we should clear the accumulables held by a task's `TaskInfo` once 
> the task completes, thereby reducing the driver's heap usage.
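
The idea in the description can be illustrated with a minimal sketch. It is a simplified model, not Spark's scheduler code and not necessarily the approach taken in the pull request: `MetricValue`, `SketchTaskInfo`, `SketchTaskSetManager`, and all of their methods are hypothetical names invented for this illustration. It assumes that a task's metrics are no longer needed on the per-task object once they have been folded into stage-level totals.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-task metric; NOT Spark's AccumulableInfo.
final case class MetricValue(name: String, value: Long)

// Hypothetical stand-in for TaskInfo: holds the metrics reported by one task.
final class SketchTaskInfo(val taskId: Long) {
  private var metrics: Seq[MetricValue] = Nil

  def accumulables: Seq[MetricValue] = metrics

  def setAccumulables(reported: Seq[MetricValue]): Unit = {
    metrics = reported
  }

  // Drop the reference so the per-task metrics become garbage collectable.
  def clearAccumulables(): Unit = {
    metrics = Nil
  }
}

// Hypothetical stand-in for TaskSetManager: keeps one info object per task
// until the stage finishes, plus the aggregated stage-level totals.
final class SketchTaskSetManager {
  val taskInfos: mutable.Map[Long, SketchTaskInfo] = mutable.HashMap.empty
  val stageTotals: mutable.Map[String, Long] = mutable.HashMap.empty

  def launch(taskId: Long): Unit = {
    taskInfos.update(taskId, new SketchTaskInfo(taskId))
  }

  def handleSuccessfulTask(taskId: Long, reported: Seq[MetricValue]): Unit = {
    val info = taskInfos(taskId)
    info.setAccumulables(reported)

    // Aggregate the task's metrics into the stage-level totals first.
    info.accumulables.foreach { m =>
      stageTotals.update(m.name, stageTotals.getOrElse(m.name, 0L) + m.value)
    }

    // The info object stays in the map, but its metrics are released now
    // instead of living until the whole stage completes.
    info.clearAccumulables()
  }
}

object AccumulablesSketch {
  def main(args: Array[String]): Unit = {
    val manager = new SketchTaskSetManager
    (1L to 3L).foreach { id =>
      manager.launch(id)
      manager.handleSuccessfulTask(id, Seq(MetricValue("recordsRead", 10L)))
    }
    println(s"stage totals: ${manager.stageTotals}")
    println(s"retained per-task metrics: ${manager.taskInfos.values.map(_.accumulables.size).sum}")
  }
}
```

The sketch keeps the per-task bookkeeping object itself and releases only the metrics it holds, so the retained state per finished task no longer grows with the number of metrics. In Spark itself, other consumers of a task's `TaskInfo` may still need the accumulables after completion, which this sketch ignores.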



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

