Jialin LIu created SPARK-26197:
----------------------------------

             Summary: Spark master fails to detect driver process pause
                 Key: SPARK-26197
                 URL: https://issues.apache.org/jira/browse/SPARK-26197
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin LIu

I was using Spark 2.3.2 with a standalone cluster and submitted a job in cluster mode. After submitting the job, I deliberately paused the driver process (with the shell command "kill -stop (driver process id)") to see whether the master could detect the problem. The result shows that the driver is never stopped. All the executors try to talk back to the driver and give up after 10 minutes. The master detects the executor failures and assigns new executor processes to redo the job. Each new executor tries to create an RPC connection with the driver and fails after 2 minutes. The master keeps spawning new executors endlessly without ever detecting the driver failure.
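
To illustrate the kind of check that would notice a paused driver, here is a minimal sketch of a heartbeat-timeout monitor. It is not Spark's code and is not taken from this report: the class `HeartbeatMonitor`, the helper `FakeClock`, the process names, and the 120-second timeout are all hypothetical, chosen only for illustration. The idea is that a stopped process has not exited, so a check that only watches for process exit would not see it, while a check on the time since the last heartbeat would.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor: records the last heartbeat per process and
    reports the processes that have been silent longer than a timeout."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, process_id: str) -> None:
        # Called whenever a process reports in.
        self._last_seen[process_id] = self._clock()

    def silent(self) -> List[str]:
        # A paused process is still alive for the OS, but it stops
        # sending heartbeats, so it shows up here after the timeout.
        now = self._clock()
        return sorted(pid for pid, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


class FakeClock:
    """Manually advanced clock so the example runs without waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def demo() -> List[str]:
    clock = FakeClock()
    monitor = HeartbeatMonitor(timeout_s=120.0, clock=clock)  # assumed timeout
    monitor.heartbeat("driver")
    monitor.heartbeat("executor-1")
    clock.advance(60)
    monitor.heartbeat("executor-1")  # the paused driver sends nothing
    clock.advance(90)
    return monitor.silent()


if __name__ == "__main__":
    print(demo())
```

In this sketch the driver is flagged because its last heartbeat is older than the timeout, while the executor that kept reporting is not. Whether the standalone master could apply a comparable check to drivers is a design question for the project and is not answered here.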