thinkharderdev commented on issue #655: URL: https://github.com/apache/arrow-ballista/issues/655#issuecomment-1424191101
@yahoNanJing @mingmwang I prototyped something on our fork here: https://github.com/coralogix/arrow-ballista/commit/9887f7757f33225769de52874ef10aa7fa6e4b57

The basic gist is:

1. Add a new executor status `Fenced`, indicating that the executor is shutting down.
2. When an executor begins shutdown, it immediately sends a heartbeat with status `Fenced`.
3. Schedulers should only consider executors with `Active` status as alive.
4. The executor still sends the `executor_stopped` RPC immediately when it begins shutdown.
5. But when the scheduler receives that RPC, it waits a configurable amount of time (default 30s) before removing the executor.

If this seems sensible I can work on upstreaming it.
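To make the scheduler side of the proposal concrete, here is a rough sketch of how the `Fenced` status and the delayed removal could fit together. It is not the code from the linked commit: `ExecutorRegistry`, `ExecutorEntry`, and all method names are hypothetical, the registry is a plain in-memory map, and time is passed in explicitly so the grace period is easy to reason about. It also assumes that receiving `executor_stopped` marks the executor as `Fenced` if it is not already.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Status reported in heartbeats. `Fenced` means the executor is shutting down.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutorStatus {
    Active,
    Fenced,
}

/// Hypothetical per-executor bookkeeping kept by the scheduler.
#[derive(Debug)]
struct ExecutorEntry {
    status: ExecutorStatus,
    /// Set when `executor_stopped` arrives; the entry is dropped once this time is reached.
    remove_at: Option<Instant>,
}

/// Hypothetical in-memory registry standing in for the scheduler's executor state.
pub struct ExecutorRegistry {
    grace_period: Duration,
    executors: HashMap<String, ExecutorEntry>,
}

impl ExecutorRegistry {
    pub fn new(grace_period: Duration) -> Self {
        Self { grace_period, executors: HashMap::new() }
    }

    /// Record a heartbeat; the most recently reported status wins.
    pub fn heartbeat(&mut self, executor_id: &str, status: ExecutorStatus) {
        let entry = self
            .executors
            .entry(executor_id.to_string())
            .or_insert(ExecutorEntry { status, remove_at: None });
        entry.status = status;
    }

    /// Handle the `executor_stopped` RPC: fence now, remove only after the grace period.
    pub fn executor_stopped(&mut self, executor_id: &str, now: Instant) {
        if let Some(entry) = self.executors.get_mut(executor_id) {
            entry.status = ExecutorStatus::Fenced;
            if entry.remove_at.is_none() {
                entry.remove_at = Some(now + self.grace_period);
            }
        }
    }

    /// Only `Active` executors count as alive (sorted for stable output).
    pub fn alive_executors(&self) -> Vec<&str> {
        let mut ids: Vec<&str> = self
            .executors
            .iter()
            .filter(|(_, e)| e.status == ExecutorStatus::Active)
            .map(|(id, _)| id.as_str())
            .collect();
        ids.sort();
        ids
    }

    /// Whether the scheduler still holds any state for this executor.
    pub fn is_registered(&self, executor_id: &str) -> bool {
        self.executors.contains_key(executor_id)
    }

    /// Drop executors whose grace period has elapsed.
    pub fn purge_expired(&mut self, now: Instant) {
        self.executors
            .retain(|_, e| e.remove_at.map_or(true, |deadline| now < deadline));
    }
}
```

The point of separating the two steps is that a fenced executor stops receiving new work right away, while its entry stays visible to the scheduler until the grace period runs out.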
