Jialin LIu created SPARK-26197:
----------------------------------

             Summary: Spark master fails to detect driver process pause
                 Key: SPARK-26197
                 URL: https://issues.apache.org/jira/browse/SPARK-26197
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.2
            Reporter: Jialin LIu


I was using Spark 2.3.2 on a standalone cluster and submitted a job in cluster
mode. After submitting the job, I deliberately paused the driver process
(using the shell command "kill -stop (driver process id)") to see whether the
master can detect this problem. The result is that the driver never stops.
All the executors try to talk back to the driver and give up after 10
minutes. The master detects the executor failures and assigns new executor
processes to redo the job. Each new executor tries to create an RPC connection
with the driver and fails after 2 minutes. The master endlessly spawns new
executors without ever detecting the driver failure.
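
The behavior above hinges on the fact that a process paused with SIGSTOP
(what "kill -stop" sends) is not dead: the OS still reports it as running, it
simply stops responding. The minimal Python sketch below is not Spark code and
is POSIX only. It illustrates the distinction with a hypothetical stand-in for
the driver that prints a heartbeat every 0.1 s. After SIGSTOP, a
process-exists check still reports the child as alive, while a heartbeat
timeout (an illustrative 0.5 s, unrelated to any Spark timeout) notices that
it has gone silent.

```python
import signal
import subprocess
import sys
import threading
import time

# Hypothetical stand-in for the driver: prints a heartbeat line every 0.1 s.
CHILD_CODE = (
    "import time\n"
    "while True:\n"
    "    print('beat', flush=True)\n"
    "    time.sleep(0.1)\n"
)


def start_heartbeat_process():
    """Launch the child and record the arrival time of each heartbeat line."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE, text=True)
    beats = []

    def reader():
        for _line in proc.stdout:  # ends at EOF when the child exits
            beats.append(time.monotonic())

    threading.Thread(target=reader, daemon=True).start()
    return proc, beats


def seconds_since_last_beat(beats):
    return time.monotonic() - beats[-1] if beats else float("inf")


def demo_paused_process(pause=1.0, timeout=0.5):
    """Return (alive according to the OS, silent longer than timeout)."""
    proc, beats = start_heartbeat_process()
    time.sleep(0.5)                      # let a few heartbeats arrive
    proc.send_signal(signal.SIGSTOP)     # same effect as `kill -STOP <pid>`
    time.sleep(pause)
    # poll() uses waitpid(WNOHANG), which does not report stopped children,
    # so a paused process still looks alive here.
    alive_by_os = proc.poll() is None
    silent = seconds_since_last_beat(beats) > timeout
    proc.send_signal(signal.SIGCONT)     # resume, then clean up
    proc.kill()
    proc.wait()
    return alive_by_os, silent


if __name__ == "__main__":
    print(demo_paused_process())
```

On a typical Linux machine this prints (True, True): the paused child still
exists, yet it has stopped producing heartbeats, which only a timeout-based
liveness check can catch.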



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
