Use case: In any environment, an administrator needs to perform maintenance from time to time. The current stop sequence of the CloudStack management server ignores the fact that there may be long-running async jobs and terminates the process. This in turn can create a poor user experience and occasional inconsistency in the CloudStack DB.
This is especially painful in large environments where the user has thousands of nodes and continuous patching happens around the clock, which requires migrating workloads from one node to another.

With that said, I've created a script that monitors the async job queue for a given MS and waits for it to complete all jobs. More details are posted below.

I'd like to introduce "graceful-shutdown" into the systemctl/service of the cloudstack-management service. The details of how it will work are below.

Workflow for graceful shutdown:

- Using iptables/firewalld, block any connection attempts on 8080/8443 (we can identify the ports dynamically).
- Identify the MSID for the node.
- Using the proper MSID, query the async_job table for jobs that match all of:
  1) still running (job_status="0")
  2) job_dispatcher not like "pseudoJobDispatcher"
  3) job_init_msid=$my_ms_id
- Monitor the async_job table for up to 60 minutes, until all async jobs for the MSID are done, then proceed with shutdown.
- If the script fails for any reason or is terminated, catch the exit via the trap command and unblock 8080/8443.

Comments are welcome.

Regards,
ilya
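
P.S. To make the workflow above easier to discuss, here is a minimal Python sketch of the control flow. It is not the actual script. The function names (wait_for_jobs, graceful_shutdown), the 30-second poll interval, and the choice to treat a 60-minute timeout as a failure that unblocks the ports are all assumptions made for illustration. The firewall, database, and service-stop steps are passed in as callables so the sketch stays independent of any particular driver or firewall tool; the try/finally plays the role that trap would play in a shell script.

```python
import time

# Query mirroring the three conditions in the workflow. The %s placeholder
# style is an assumption; adjust it to whatever DB driver is actually used.
PENDING_JOBS_SQL = (
    "SELECT COUNT(*) FROM async_job "
    "WHERE job_status = 0 "
    "AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' "
    "AND job_init_msid = %s"
)


def wait_for_jobs(count_pending, timeout_s=3600, poll_s=30,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll count_pending() until it returns 0 or timeout_s elapses.

    Returns True if the queue drained, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if count_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def graceful_shutdown(block_ports, unblock_ports, count_pending,
                      stop_service, **wait_kwargs):
    """Block new connections, wait for async jobs, then stop the service.

    If the wait times out or anything raises (including Ctrl-C), the ports
    are unblocked again. Handling SIGTERM would additionally need a signal
    handler, which is not shown here.
    """
    block_ports()
    stopped = False
    try:
        if wait_for_jobs(count_pending, **wait_kwargs):
            stop_service()
            stopped = True
    finally:
        if not stopped:
            unblock_ports()
    return stopped
```

In a real deployment, count_pending would run PENDING_JOBS_SQL with the node's MSID, block_ports and unblock_ports would add and remove the iptables/firewalld rules for 8080/8443, and stop_service would continue with the normal stop sequence.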