[ https://issues.apache.org/jira/browse/SPARK-40979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648757#comment-17648757 ]
Dongjoon Hyun commented on SPARK-40979:
---------------------------------------

I collected this as a subtask of SPARK-41550.

> Keep removed executor info in decommission state
> ------------------------------------------------
>
>                 Key: SPARK-40979
>                 URL: https://issues.apache.org/jira/browse/SPARK-40979
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Zhongwei Zhu
>            Assignee: Zhongwei Zhu
>            Priority: Major
>             Fix For: 3.4.0
>
>
> Executors removed due to decommission should be kept in a separate set. To
> avoid OOM, the size of this set will be limited to 1K or 10K entries.
> A FetchFailed caused by a decommissioned executor falls into two categories:
> # When the FetchFailed reaches the DAGScheduler, the executor is still
> alive, or it is lost but the loss info has not yet reached
> TaskSchedulerImpl. This case is already handled in SPARK-40979.
> # The FetchFailed is caused by the loss of the decommissioned executor, so
> the decommission info has already been removed from TaskSchedulerImpl.
> Keeping such info for a short period is good enough. Even if we limit the
> set of removed executors to 10K entries, that is at most about 10MB of
> memory. In practice, it is rare to run a cluster of over 10K executors, and
> the chance that all of those executors are decommissioned and lost at the
> same time is small.
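
As a rough illustration of the bounded set the description proposes, here is a minimal Scala sketch. The class name RemovedExecutorTracker, its methods, and the default cap are hypothetical, not taken from Spark's actual implementation; it only shows the idea of an insertion-ordered set that evicts its oldest entry once the size limit is reached, so memory stays bounded regardless of cluster size.

{code:scala}
import scala.collection.mutable

// Hypothetical sketch: tracks executors that were decommissioned and then
// removed, capped at maxSize entries to avoid OOM (per the description,
// 10K string ids is on the order of 10MB at most).
class RemovedExecutorTracker(maxSize: Int = 10000) {
  // LinkedHashSet preserves insertion order, so `head` is the oldest entry.
  private val decommissionedAndRemoved = mutable.LinkedHashSet.empty[String]

  def add(executorId: String): Unit = synchronized {
    decommissionedAndRemoved += executorId
    // Evict the oldest executor id once the cap is exceeded.
    if (decommissionedAndRemoved.size > maxSize) {
      decommissionedAndRemoved -= decommissionedAndRemoved.head
    }
  }

  // Used when a FetchFailed arrives after the executor is already gone from
  // TaskSchedulerImpl: was this executor decommissioned before removal?
  def wasDecommissioned(executorId: String): Boolean = synchronized {
    decommissionedAndRemoved.contains(executorId)
  }
}
{code}

Evicting oldest-first matches the "keep such info in a short period" reasoning above: a stale entry only matters while a FetchFailed from that executor can still arrive, so the newest removals are the ones worth retaining.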