I'm thinking of using a configuration from "job.cancel.threshold.minutes" - it will be the longest
"category": "Advanced",
"description": "Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long",
"name": "job.cancel.threshold.minutes",
"value": "60"

On Wed, Apr 4, 2018 at 1:36 PM, Rafael Weingärtner <rafaelweingart...@gmail.com> wrote:

> Big +1 for this feature; I only have a few doubts.
>
> * Regarding the tasks/jobs that management servers (MSs) execute: do these
> tasks originate from requests that come to the MS, or is it possible for
> requests received by one management server to be executed by another? I mean,
> if I execute a request against MS1, will this request always be
> executed/treated by MS1, or is it possible that this request is executed
> by another MS (e.g. MS2)?
>
> * I would suggest that after we block traffic coming from 8080/8443/8250 (we
> will need to block this as well, right?), we can log the execution of tasks.
> I mean, something saying: there are XXX tasks (enumerate tasks) still being
> executed, we will wait for them to finish before shutting down.
>
> * The timeout (60 minutes suggested) could be a global setting that we can
> load before executing the graceful-shutdown.
>
> On Wed, Apr 4, 2018 at 5:15 PM, ilya musayev <ilya.mailing.li...@gmail.com>
> wrote:
>
> > Use case:
> > In any environment - from time to time - an administrator needs to perform
> > maintenance. The current stop sequence of the cloudstack management server
> > will ignore the fact that there may be long running async jobs - and
> > terminate the process. This in turn can create a poor user experience and
> > occasional inconsistency in the cloudstack db.
> >
> > This is especially painful in large environments where the user has
> > thousands of nodes and continuous patching happens around the clock,
> > which requires migration of workload from one node to another.
> >
> > With that said - I've created a script that monitors the async job queue
> > for a given MS and waits for it to complete all jobs. More details are
> > posted below.
> >
> > I'd like to introduce "graceful-shutdown" into the systemctl/service of
> > the cloudstack-management service.
> >
> > The details of how it will work are below:
> >
> > Workflow for graceful shutdown:
> > Using iptables/firewalld - block any connection attempts on 8080/8443 (we
> > can identify the ports dynamically)
> > Identify the MSID for the node, using the proper msid - query the
> > async_job table for
> > 1) any jobs that are still running (or job_status="0")
> > 2) job_dispatcher not like "pseudoJobDispatcher"
> > 3) job_init_msid=$my_ms_id
> >
> > Monitor this async_job table for 60 minutes - until all async jobs for
> > the MSID are done, then proceed with shutdown
> > If failed for any reason or terminated, catch the exit via the trap
> > command and unblock 8080/8443
> >
> > Comments are welcome
> >
> > Regards,
> > ilya
> >
>
> --
> Rafael Weingärtner
>
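
To make the quoted workflow concrete, here is a rough Python sketch of the block / wait / shutdown-or-unblock loop. It is not the script mentioned in the thread, just an illustration under some assumptions: the function and parameter names are made up, the SQL string only restates the three conditions from the proposal (the `%s` placeholder style depends on the DB driver), and the behaviour on timeout (return without stopping and reopen the ports) is my guess, since the thread does not spell it out. The firewall, DB, and service calls are passed in as callables so nothing here touches iptables or the database directly.

```python
import time
from typing import Callable

# Assumed query, restating the three conditions from the proposal.
# Column names come from the email; the schema is not verified here.
PENDING_JOBS_SQL = (
    "SELECT COUNT(*) FROM async_job "
    "WHERE job_status = 0 "
    "AND job_dispatcher NOT LIKE 'pseudoJobDispatcher' "
    "AND job_init_msid = %s"
)


def graceful_shutdown(
    count_pending_jobs: Callable[[], int],
    block_ports: Callable[[], None],
    unblock_ports: Callable[[], None],
    stop_service: Callable[[], None],
    timeout_minutes: float = 60,
    poll_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Block new traffic, wait for async jobs to drain, then stop the service.

    Returns True if the service was stopped, False if the timeout expired
    with jobs still pending (assumed behaviour). Ports are unblocked on
    timeout or on any error, mirroring the "trap" step in the proposal.
    """
    block_ports()
    stopped = False
    try:
        deadline = clock() + timeout_minutes * 60
        while True:
            pending = count_pending_jobs()
            if pending == 0:
                stop_service()
                stopped = True
                return True
            if clock() >= deadline:
                return False
            # Log progress so the operator can see what is being waited on.
            print(f"{pending} async job(s) still running, waiting...")
            sleep(poll_seconds)
    finally:
        # Equivalent of the shell trap: reopen the ports unless we shut down.
        if not stopped:
            unblock_ports()
```

The `try`/`finally` covers exceptions and Ctrl-C; a plain SIGTERM would need an explicit signal handler to get the same cleanup.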