Holden Karau created SPARK-40379:
------------------------------------

             Summary: Propagate decommission executor loss reason during onDisconnect in K8s
                 Key: SPARK-40379
                 URL: https://issues.apache.org/jira/browse/SPARK-40379
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes, Spark Core
    Affects Versions: 3.4.0
            Reporter: Holden Karau
            Assignee: Holden Karau


Currently, if an executor has been sent a decommission message and then 
disconnects from the scheduler, we only disable the executor and depend on the 
K8s status events to drive the rest of the state transitions. However, the K8s 
status events can become overwhelmed on large clusters. We should instead check 
whether an executor is in a decommissioning state when it disconnects and, if 
so, use that as the loss reason rather than waiting on the K8s status events. 
This gives us more accurate logging information.
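
The proposed behavior can be illustrated with a small, self-contained sketch. 
This is a toy model only, not Spark's scheduler backend code: the names 
SchedulerState, LossReason, on_decommission_sent and on_disconnect are all 
hypothetical and chosen for illustration. It assumes the scheduler tracks the 
set of executors that have been sent a decommission message.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class LossReason:
    """Hypothetical stand-in for an executor loss reason."""
    kind: str
    message: str


@dataclass
class SchedulerState:
    """Toy model of scheduler-side bookkeeping (not Spark's real classes)."""
    decommissioning: Set[str] = field(default_factory=set)
    disabled: Set[str] = field(default_factory=set)
    removed: Dict[str, LossReason] = field(default_factory=dict)

    def on_decommission_sent(self, executor_id: str) -> None:
        # Remember that this executor was asked to decommission.
        self.decommissioning.add(executor_id)

    def on_disconnect(self, executor_id: str) -> Optional[LossReason]:
        # Proposed path: the executor was already decommissioning, so record
        # that as the loss reason right away instead of waiting for a
        # cluster-manager status event.
        if executor_id in self.decommissioning:
            self.decommissioning.discard(executor_id)
            reason = LossReason(
                kind="decommissioned",
                message=f"Executor {executor_id} disconnected after a "
                        "decommission request",
            )
            self.removed[executor_id] = reason
            return reason
        # Existing path: only disable the executor and let a later status
        # event supply the real reason.
        self.disabled.add(executor_id)
        return None
```

In this sketch, an executor that disconnects after a decommission message is 
removed immediately with a "decommissioned" reason, while any other 
disconnecting executor is only disabled, mirroring the current behavior 
described above.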

 



