[ https://issues.apache.org/jira/browse/SPARK-24751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647757#comment-16647757 ]
Stavros Kontopoulos edited comment on SPARK-24751 at 10/12/18 10:37 AM:
------------------------------------------------------------------------

If an executor dies (e.g. it is killed), the backend will launch another one, trying to reach the requested number of executors:

    $ kubectl get pods -n spark
    NAME                             READY   STATUS    RESTARTS   AGE
    test-cpus-1539340474549-driver   1/1     Running   0          13s
    test-cpus-1539340474549-exec-1   1/1     Running   0          5s
    test-cpus-1539340474549-exec-2   1/1     Running   0          4s
    test-cpus-1539340474549-exec-3   1/1     Running   0          4s
    test-cpus-1539340474549-exec-4   1/1     Running   0          4s

    $ kubectl delete pods test-cpus-1539340474549-exec-4 -n spark
    pod "test-cpus-1539340474549-exec-4" deleted

    $ kubectl get pods -n spark
    NAME                             READY   STATUS    RESTARTS   AGE
    test-cpus-1539340474549-driver   1/1     Running   0          32s
    test-cpus-1539340474549-exec-1   1/1     Running   0          24s
    test-cpus-1539340474549-exec-2   1/1     Running   0          23s
    test-cpus-1539340474549-exec-3   1/1     Running   0          23s
    test-cpus-1539340474549-exec-5   1/1     Running   0          8s

Do you mean something else?
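The behaviour in the transcript (exec-4 deleted, exec-5 started, total back at four) can be modelled as a simple reconciliation loop. The sketch below is a hypothetical model, not Spark's actual allocator code: the class `ExecutorAllocator`, its methods, and the `batch_size` cap (loosely analogous to Spark's `spark.kubernetes.allocation.batch.size` setting) are all assumptions made for illustration. It assumes executor IDs only ever increase, which matches the replacement pod being named exec-5 rather than reusing exec-4.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ExecutorAllocator:
    """Hypothetical model of a driver-side allocator that keeps the number of
    live executor pods at the requested total (not Spark's actual class)."""
    requested_total: int
    batch_size: int = 5  # assumed cap on new pods requested per round
    live: Set[int] = field(default_factory=set)
    next_id: int = 1  # IDs only increase; a deleted executor's ID is not reused

    def on_pod_deleted(self, executor_id: int) -> None:
        # A killed or failed executor simply drops out of the live set.
        self.live.discard(executor_id)

    def reconcile(self) -> List[int]:
        """Launch new executors (at most batch_size) to close the gap
        between the requested total and the live count."""
        missing = self.requested_total - len(self.live)
        launched: List[int] = []
        for _ in range(max(0, min(missing, self.batch_size))):
            launched.append(self.next_id)
            self.live.add(self.next_id)
            self.next_id += 1
        return launched


alloc = ExecutorAllocator(requested_total=4)
print(alloc.reconcile())   # [1, 2, 3, 4]
alloc.on_pod_deleted(4)    # like `kubectl delete pods ...-exec-4`
print(alloc.reconcile())   # [5]
print(sorted(alloc.live))  # [1, 2, 3, 5]
```

Because each round only launches the difference between the requested and live counts, a deleted pod is replaced on the next reconciliation without the driver tracking why it disappeared.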
> [k8s] Relaunch failed executor pods
> ------------------------------------
>
>                 Key: SPARK-24751
>                 URL: https://issues.apache.org/jira/browse/SPARK-24751
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.1
>            Reporter: Dharmesh Kakadia
>            Priority: Major
>              Labels: kubernetes
>
> Currently, we don't create new executor pods to replace the failed ones.
> This would be very useful for resiliency.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)