[ https://issues.apache.org/jira/browse/SPARK-40379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648756#comment-17648756 ]
Dongjoon Hyun commented on SPARK-40379:
---------------------------------------

Hi, [~holden]. We want to go GA with `Dynamic Allocation on K8s`. I collected this individual task there as a subtask because it fits that effort well. Please let me know if you would prefer to collect it somewhere else.

> Propagate decommission executor loss reason during onDisconnect in K8s
> ----------------------------------------------------------------------
>
>                 Key: SPARK-40379
>                 URL: https://issues.apache.org/jira/browse/SPARK-40379
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> Currently, if an executor has been sent a decommission message and then
> disconnects from the scheduler, we only disable the executor and depend on the
> K8s status events to drive the rest of the state transitions. However, the
> K8s status events can become overwhelmed on large clusters, so we should check
> whether an executor is in a decommissioning state when it disconnects and use
> that reason instead of waiting on the K8s status events. This gives us more
> accurate logging information.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
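The disconnect handling proposed in the issue description can be sketched as a small model of driver-side bookkeeping. This is a minimal, hypothetical illustration only. `SchedulerBackendSketch`, `LossReason`, `LossKind`, and all method names are invented for this example and are not Spark's actual classes or APIs. The sketch assumes the driver records a message when it sends a decommission request. On disconnect, a known decommissioning executor is removed immediately with that reason. Any other executor is only disabled until a K8s pod status event arrives.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Set


class LossKind(Enum):
    # Hypothetical labels for where the final loss reason came from.
    DECOMMISSIONED = "decommissioned"  # taken from the recorded decommission
    K8S_STATUS = "k8s_status"          # taken from a later K8s pod status event


@dataclass(frozen=True)
class LossReason:
    kind: LossKind
    message: str


class SchedulerBackendSketch:
    """Hypothetical model of driver bookkeeping; not Spark's real implementation."""

    def __init__(self) -> None:
        # executor id -> message recorded when the decommission was sent
        self.decommissioning: Dict[str, str] = {}
        # executors disabled while waiting on K8s pod status events
        self.disabled: Set[str] = set()
        # executors whose loss is finalized, with the reason that was used
        self.removed: Dict[str, LossReason] = {}

    def decommission_executor(self, executor_id: str, message: str) -> None:
        # Remember that this executor was asked to decommission.
        self.decommissioning[executor_id] = message

    def on_disconnected(self, executor_id: str) -> Optional[LossReason]:
        message = self.decommissioning.pop(executor_id, None)
        if message is not None:
            # Known decommission: finalize now with the accurate reason
            # instead of waiting on (possibly overwhelmed) K8s status events.
            reason = LossReason(LossKind.DECOMMISSIONED, message)
            self.removed[executor_id] = reason
            return reason
        # Otherwise keep the old behavior: disable and let K8s events decide.
        self.disabled.add(executor_id)
        return None

    def on_k8s_pod_status(self, executor_id: str, message: str) -> None:
        # Only executors still waiting on K8s status are finalized here;
        # late events for already-removed executors are ignored.
        if executor_id in self.disabled:
            self.disabled.discard(executor_id)
            self.removed[executor_id] = LossReason(LossKind.K8S_STATUS, message)
```

For example, after `decommission_executor("exec-1", "scale down")`, calling `on_disconnected("exec-1")` returns a `DECOMMISSIONED` reason right away. By contrast, `on_disconnected("exec-2")` returns `None` and leaves `exec-2` disabled until `on_k8s_pod_status` is called for it. This separation mirrors the issue's point: the decommission path no longer depends on timely K8s status events to produce an accurate loss reason.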