Github user lins05 commented on the issue:

    https://github.com/apache/spark/pull/17750

> Do you then think it would be a viable option to enable it by default on Coarse grained and have it not used in Fine-grained.

SGTM, especially considering that fine-grained mode is already deprecated.

> Could you expand on this a bit more, I assume we could maintain the state of the tasks similar to how driver state is maintained in MesosClusterScheduler, and accordingly update state at crucial points, like start.

I don't think it's an easy task at all, because the Spark driver is not designed to recover from a crash.

The state in the MesosClusterScheduler is pretty simple. It's just a REST server that accepts requests from clients and launches Spark drivers on their behalf. The only thing it needs to persist is its Mesos framework id, because it has to re-register with the Mesos master under the same framework id if it's restarted. In the current implementation, MesosClusterScheduler uses ZooKeeper as the persistent storage. Aside from that, the MesosClusterScheduler holds no other stateful information.

The Spark driver is totally different, because it contains lots of stateful information: the job/stage/task info, executor info, the catalog that holds temporary views, to name a few. All of that is kept in the driver's memory and would be lost whenever the driver crashes. So it doesn't make sense to set `failover_timeout` at all, because the Spark driver doesn't support failover.
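To illustrate why the MesosClusterScheduler's recovery story is so simple: the only state it must carry across a restart is its framework id. Below is a minimal, language-neutral sketch of that pattern (the real code is Scala and uses a ZooKeeper-backed persistence engine; the class and function names here are hypothetical stand-ins, not Spark's actual API):

```python
class InMemoryEngine:
    """Stand-in for the ZooKeeper-backed persistence engine Spark actually uses."""

    def __init__(self):
        self._store = {}

    def persist(self, name, value):
        self._store[name] = value

    def fetch(self, name):
        return self._store.get(name)


def register_framework(engine, assign_id):
    """On (re)start: reuse the persisted framework id if one exists,
    otherwise register fresh and persist the id the master assigns."""
    framework_id = engine.fetch("frameworkId")
    if framework_id is not None:
        return framework_id           # failover: same identity as before the crash
    framework_id = assign_id()        # Mesos master assigns a brand-new id
    engine.persist("frameworkId", framework_id)
    return framework_id
```

Because re-registering under the same framework id is all the recovery that is needed, a `failover_timeout` makes sense for the dispatcher. The driver has no comparable "persist one id and resume" path, since its essential state lives only in memory.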