Hi, I have a Spark standalone cluster that runs hundreds of applications per day, and it changes size (more or fewer workers) at various hours. The driver runs on a separate machine outside the Spark cluster.
When a job is running and its worker is killed (because the number of workers is being reduced at that hour), the job sometimes fails instead of redistributing the work to the remaining workers. How can I decommission a worker so that it stops receiving new work but finishes all of its existing work before shutting down? Thanks!
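For context, the applications look roughly like this (a minimal sketch; the master URL, app name, and workload are placeholders, not my real code):

```python
from pyspark.sql import SparkSession

# The driver runs on a machine outside the cluster and connects to the
# standalone master; workers are added and removed throughout the day.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")  # placeholder master URL
    .appName("example-job")              # placeholder app name
    .getOrCreate()
)

# Representative workload: when a worker holding some of these tasks is
# killed mid-run, the job sometimes fails instead of re-running the lost
# tasks on the remaining workers.
result = spark.range(0, 10_000_000).selectExpr("sum(id) AS total").collect()
print(result)

spark.stop()
```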