Use case:
In any environment, an administrator needs to perform maintenance from
time to time. The current stop sequence of the CloudStack management server
ignores the fact that there may be long-running async jobs and simply
terminates the process. This in turn can create a poor user experience and
occasional inconsistency in the CloudStack DB.

This is especially painful in large environments where the user has
thousands of nodes and continuous patching happens around the clock,
requiring migration of workloads from one node to another.

With that said, I've created a script that monitors the async job queue
for a given MS and waits for it to complete all jobs. More details are
posted below.

I'd like to introduce "graceful-shutdown" into the systemctl/service
handling of the cloudstack-management service.

The details of how it will work are below:

Workflow for graceful shutdown:
  Using iptables/firewalld, block any connection attempts on 8080/8443 (we
  can identify the ports dynamically).
  Identify the MSID for the node. Using the proper MSID, query the
  async_job table for jobs that match all of the following:
    1) still running (i.e. job_status="0")
    2) job_dispatcher not like "pseudoJobDispatcher"
    3) job_init_msid=$my_ms_id
  Monitor the async_job table for up to 60 minutes, until all async jobs
  for the MSID are done, then proceed with the shutdown.
  If the script fails for any reason or is terminated, catch the exit via
  the trap command and unblock 8080/8443.
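
To make the workflow above easier to discuss, here is a minimal sketch of
the wait-and-cleanup logic in Python. It is not the actual script. The
function names (wait_for_jobs, graceful_shutdown), the 30-second poll
interval and the callables passed in are hypothetical; only the table and
column names come from the steps above. The %s placeholder assumes a DB-API
driver that uses that parameter style. The sketch also assumes that on
timeout the service is left running, and that the ports are unblocked on
every exit path.

```python
import time

# Filter from the workflow above; %s is bound to this node's MSID.
PENDING_JOBS_SQL = (
    "SELECT COUNT(*) FROM async_job "
    "WHERE job_status = 0 "
    "AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' "
    "AND job_init_msid = %s"
)


def wait_for_jobs(count_pending, timeout_s=60 * 60, poll_s=30,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll count_pending() until it returns 0 or timeout_s elapses.

    Returns True if the queue drained, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if count_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def graceful_shutdown(block_ports, unblock_ports, count_pending,
                      stop_service, **wait_kwargs):
    """Block the ports, wait for async jobs, then stop the service.

    The callables are hypothetical hooks: block_ports/unblock_ports would
    wrap iptables or firewalld, count_pending would run PENDING_JOBS_SQL,
    and stop_service would stop cloudstack-management.
    """
    block_ports()
    try:
        drained = wait_for_jobs(count_pending, **wait_kwargs)
        if drained:
            stop_service()
        # Assumption: on timeout the service is left running.
        return drained
    finally:
        # Plays the role of the shell trap: runs on success, timeout or
        # exception. A SIGTERM would additionally need a signal handler.
        unblock_ports()
```

Passing the port, database and service actions in as callables keeps the
waiting logic independent of iptables vs. firewalld and of the DB driver.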

Comments are welcome

Regards,
ilya
