[ https://issues.apache.org/jira/browse/SPARK-40379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601386#comment-17601386 ]
Apache Spark commented on SPARK-40379:
--------------------------------------

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37821

> Propagate decommission executor loss reason during onDisconnect in K8s
> ----------------------------------------------------------------------
>
>                 Key: SPARK-40379
>                 URL: https://issues.apache.org/jira/browse/SPARK-40379
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Minor
>
> Currently, if an executor has been sent a decommission message and then
> disconnects from the scheduler, we only disable the executor and depend on
> the K8s status events to drive the rest of the state transitions. However,
> the K8s status events can become overwhelmed on large clusters, so we should
> check whether an executor is in a decommissioning state when it disconnects
> and use that as the loss reason instead of waiting on the K8s status events.
> This gives us more accurate logging information.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
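
The behavior proposed in the issue description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the pull request and not Spark's actual scheduler API: the class `ToySchedulerBackend`, its fields, the method names, and the reason string "Executor decommissioned" are all hypothetical and chosen for illustration. It shows the one decision the issue asks for: on disconnect, an executor already known to be decommissioning is removed immediately with a decommission reason, while any other executor is only disabled and left for later status events to resolve.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToySchedulerBackend:
    """Hypothetical stand-in for a scheduler backend; not Spark's real classes."""

    # Executors that have been sent a decommission message.
    decommissioning: Set[str] = field(default_factory=set)
    # Executors disabled while waiting for external (e.g. K8s) status events.
    disabled: Set[str] = field(default_factory=set)
    # Loss reason recorded for each executor that has been removed.
    removed: Dict[str, str] = field(default_factory=dict)

    def decommission(self, executor_id: str) -> None:
        """Record that a decommission message was sent to this executor."""
        self.decommissioning.add(executor_id)

    def on_disconnected(self, executor_id: str) -> Optional[str]:
        """Handle an executor disconnect.

        Returns the loss reason if the executor was removed right away,
        or None if it was only disabled pending later status events.
        """
        if executor_id in self.decommissioning:
            # The reason is already known, so do not wait for status events.
            self.decommissioning.discard(executor_id)
            reason = "Executor decommissioned"  # illustrative reason text
            self.removed[executor_id] = reason
            return reason
        # Reason unknown: disable only and let status events drive the rest.
        self.disabled.add(executor_id)
        return None
```

In this sketch, calling `decommission("exec-1")` followed by `on_disconnected("exec-1")` records the decommission reason immediately, whereas `on_disconnected("exec-2")` for an executor that was never decommissioned only adds it to `disabled`.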