dzcxzl created SPARK-33158:
------------------------------

         Summary: Check whether the executor and external service connection is available
             Key: SPARK-33158
             URL: https://issues.apache.org/jira/browse/SPARK-33158
         Project: Spark
      Issue Type: Improvement
      Components: Spark Core
Affects Versions: 3.0.1
        Reporter: dzcxzl
At present, the executor connects to the external shuffle service and registers with it only once, at initialization. On YARN, the NodeManager can stop working and take the shuffle service down with it, while the container/executor process keeps running. In that state, ShuffleMapTasks still execute, and the MapStatus they return still carries the address of the external shuffle service. When the next stage tries to read that shuffle data, it cannot connect to the shuffle service, and the job ultimately fails.

Approaches I have considered:
- Before a ShuffleMapTask starts writing data, check whether the connection is available; or
- Periodically test whether the connection is healthy, for example from a thread similar to the driver/executor heartbeat thread.
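The following Java sketch illustrates the two approaches proposed above. It assumes that a plain TCP connect to the shuffle service port is an acceptable liveness signal. That signal is weak: it shows only that something accepts connections on the port, not that the service will actually serve blocks. `ShuffleServiceHealthCheck` and its methods are hypothetical and are not Spark APIs. Port 7337 appears only because it is the default value of `spark.shuffle.service.port`.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical executor-side health check for the external shuffle service (not a Spark API).
class ShuffleServiceHealthCheck implements AutoCloseable {
    private final String host;
    private final int port;
    private final int timeoutMs;
    // Last observed state; starts optimistic because registration succeeded at startup.
    private final AtomicBoolean available = new AtomicBoolean(true);
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "shuffle-service-health-check");
            t.setDaemon(true); // do not keep the JVM alive just for probing
            return t;
        });

    ShuffleServiceHealthCheck(String host, int port, int timeoutMs) {
        this.host = host;
        this.port = port;
        this.timeoutMs = timeoutMs;
    }

    // Returns true if a TCP connection to host:port opens within timeoutMs.
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable
        }
    }

    // Approach 2: re-probe at a fixed interval, similar in spirit to a heartbeat thread.
    void startPeriodicCheck(long intervalMs) {
        scheduler.scheduleAtFixedRate(
            () -> available.set(probe(host, port, timeoutMs)),
            0, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Approach 1: synchronous check before a map task starts writing shuffle output.
    void checkBeforeWrite() {
        boolean ok = probe(host, port, timeoutMs);
        available.set(ok);
        if (!ok) {
            throw new IllegalStateException(
                "External shuffle service " + host + ":" + port + " is unreachable");
        }
    }

    boolean isAvailable() {
        return available.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        // 7337 is the default spark.shuffle.service.port; adjust for the actual cluster.
        try (ShuffleServiceHealthCheck check =
                 new ShuffleServiceHealthCheck("localhost", 7337, 1000)) {
            check.checkBeforeWrite();
            System.out.println("shuffle service reachable");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The two approaches trade cost against freshness. The synchronous check adds one connect round trip to every task. The periodic check avoids that per-task cost, but a task may act on a result that was stale by up to one interval.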