wuyi created SPARK-34245:
----------------------------

Summary: Master may not remove the finished executor when Worker fails to send ExecutorStateChanged
Key: SPARK-34245
URL: https://issues.apache.org/jira/browse/SPARK-34245
Project: Spark
Issue Type: Improvement
Components: Deploy, Spark Core
Affects Versions: 3.0.1, 2.4.7, 3.2.0, 3.1.1
Reporter: wuyi
If the Worker fails to send ExecutorStateChanged to the Master because of an error (e.g., a temporary network failure), the Master cannot remove the finished executor normally and continues to treat it as alive. In the worst case, if that executor is the only one belonging to the application, the application can hang.
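To make the failure mode concrete, here is a toy model in Python. It is not Spark code: `ToyMaster`, `send`, and `on_worker_report` are hypothetical names. The only detail taken from Spark is that KILLED, FAILED, LOST, and EXITED are its finished executor states. The model shows that if the Master forgets an executor only when a one-shot ExecutorStateChanged message arrives, losing that message leaves a stale executor behind. It also sketches one possible mitigation, a periodic full-state report from the Worker that the Master reconciles against. That is an assumption for illustration, not necessarily the fix adopted for this issue.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Finished states, mirroring the names Spark's ExecutorState treats as finished.
FINISHED_STATES = {"KILLED", "FAILED", "LOST", "EXITED"}


@dataclass
class ToyMaster:
    # Hypothetical model: executor id -> id of the worker hosting it.
    executors: Dict[str, str] = field(default_factory=dict)

    def launch(self, exec_id: str, worker_id: str) -> None:
        self.executors[exec_id] = worker_id

    def on_executor_state_changed(self, exec_id: str, state: str) -> None:
        # In this model the Master forgets an executor only when this message arrives.
        if state in FINISHED_STATES:
            self.executors.pop(exec_id, None)

    def on_worker_report(self, worker_id: str, live: Set[str]) -> None:
        # Hypothetical reconciliation: drop executors the Master places on this
        # worker that the worker no longer reports as running.
        stale = [e for e, w in self.executors.items() if w == worker_id and e not in live]
        for e in stale:
            del self.executors[e]


def send(master: ToyMaster, exec_id: str, state: str, network_up: bool) -> None:
    # Simulates a fire-and-forget send that is silently lost on a network error.
    if network_up:
        master.on_executor_state_changed(exec_id, state)


master = ToyMaster()
master.launch("exec-0", "worker-1")

# The executor exits, but the notification is lost, so the Master keeps it.
send(master, "exec-0", "EXITED", network_up=False)
print(sorted(master.executors))  # ['exec-0']

# A later full-state report from the worker lets the Master clean up.
master.on_worker_report("worker-1", live=set())
print(sorted(master.executors))  # []
```

The design point is that a single edge-triggered notification has no recovery path once it is lost. A level-triggered report of the Worker's current state lets the Master converge even after messages are dropped.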