[ 
https://issues.apache.org/jira/browse/SPARK-30055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17087857#comment-17087857
 ] 

Ed Mitchell commented on SPARK-30055:
-------------------------------------

I agree with this. Hard-coding the restart policy to Never takes away the 
flexibility of letting Kubernetes restart a pod's containers when they run out 
of memory or terminate in some undefined way.

You can also access the logs from the previous instance of a restarted 
container with: 
{noformat}
kubectl -n <namespace> logs <podname> --previous
{noformat}
I understand not wanting to set "Always" on the executor pods, so that Spark 
can control graceful termination of executors, but shouldn't we at least set it 
to "OnFailure", so that OOMKilled executors can come back up?

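For reference, here is a rough sketch of how the three restartPolicy values 
behave when a container exits, as I understand the documented kubelet 
semantics. The helper name should_restart and the constant are made up for 
illustration; they are not part of any Kubernetes or Spark API.

```python
# Illustrative model only: should_restart is a hypothetical helper, not a
# Kubernetes or Spark API. It mirrors the documented restartPolicy semantics.

# 128 + SIGKILL (9): the exit code typically reported for an OOMKilled container.
OOM_KILLED_EXIT_CODE = 137


def should_restart(policy: str, exit_code: int) -> bool:
    """Return True if a container that exited with exit_code would be restarted."""
    if policy == "Always":
        # Restarted regardless of how the container exited.
        return True
    if policy == "OnFailure":
        # Restarted only on a non-zero exit code.
        return exit_code != 0
    if policy == "Never":
        # Never restarted; this is what Spark hard-codes today.
        return False
    raise ValueError(f"unknown restartPolicy: {policy!r}")
```

Under this model, "OnFailure" brings back an OOMKilled executor (exit code 137) 
but leaves an executor that exited cleanly (exit code 0) alone, which is the 
behaviour I am asking about above. Restarts happen in place, in the same pod, 
with a back-off delay.
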
As far as the driver is concerned, our client mode setup runs the driver pod 
as a Deployment, which means its restart policy is Always. There's no reason we 
can't allow Always or OnFailure for the driver restart policy, imo.

> Allow configurable restart policy of driver and executor pods
> -------------------------------------------------------------
>
>                 Key: SPARK-30055
>                 URL: https://issues.apache.org/jira/browse/SPARK-30055
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.1.0
>            Reporter: Kevin Hogeland
>            Priority: Major
>
> The current Kubernetes scheduler hard-codes the restart policy for all pods 
> to be "Never". To restart a failed application, all pods have to be deleted 
> and rescheduled, which is very slow and clears any caches the processes may 
> have built. Spark should allow a configurable restart policy for both drivers 
> and executors for immediate restart of crashed/killed drivers/executors as 
> long as the pods are not evicted. (This is not about eviction resilience, 
> that's described in this issue: SPARK-23980)
> Also, as far as I can tell, there's no reason the executors should be set to 
> never restart. Should that be configurable or should it just be changed to 
> Always?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
