Github user stoader commented on the issue:

https://github.com/apache/spark/pull/21067

@mccheah

> But whether or not the driver should be relaunchable should be determined by the application submitter, and not necessarily done all the time. Can we make this behavior configurable?

This should be straightforward to make configurable via the Job's [Pod Backoff failure policy](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy), so that the Job executes the pod only once (see the sketch below).

> We don't have a solid story for checkpointing streaming computation right now

We've done work on this to store checkpoints on a persistent volume, but thought it should be a separate PR since it's not strictly tied to this change (a rough illustration follows below).

> you'll certainly lose all progress from batch jobs

Agreed that the batch job would be rerun from scratch. Still, I think there is value in being able to run a batch job unattended, without manual intervention on machine failure, since the job will be rescheduled to another node.
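For clarity, here is a minimal sketch of the "execute the pod only once" setting, assuming the driver pod is managed by a Kubernetes Job (the Job name and image below are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-driver-job            # hypothetical name
spec:
  backoffLimit: 0                   # no retries: the driver pod is attempted at most once
  template:
    spec:
      restartPolicy: Never          # Jobs allow only Never or OnFailure; Never avoids in-place restarts
      containers:
      - name: spark-kubernetes-driver
        image: spark-driver:latest  # hypothetical image
```

A submitter who does want relaunch behavior would instead set `backoffLimit` to a positive value, so the Job controller recreates the driver pod after a failure.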
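On the checkpointing point, that work is not part of this PR; purely as a rough illustration, a driver pod could mount a PersistentVolumeClaim and write checkpoints under the mount path (the PVC name and path here are hypothetical):

```yaml
# fragment of a driver pod spec
spec:
  volumes:
  - name: checkpoints
    persistentVolumeClaim:
      claimName: spark-checkpoints   # hypothetical pre-provisioned PVC
  containers:
  - name: spark-kubernetes-driver
    volumeMounts:
    - name: checkpoints
      mountPath: /checkpoints        # checkpoints written here survive pod loss
```

The application would then point Spark at that path, e.g. `--conf spark.sql.streaming.checkpointLocation=/checkpoints/my-app`, so a relaunched driver can resume from the last committed batch.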