Folks, we’re occasionally getting bitten by stability issues on some of our 
customer VMs that are indirectly related to ACS:

* The billing package[1] we use is touchy and will occasionally reboot a VM 
when we open that VM’s details page in it
* ACS recently lost connectivity with a node and asked the VR to ping the VMs, 
but the pings were blocked by the host firewall, so ACS decided a VM was down 
and killed it after reconnecting to the node
* Something was either fat-fingered or misinterpreted in the billing package, 
and deleting a licensing product from a customer resulted in it telling ACS to 
delete a domain, a user, the 10 VMs in it, and their storage. (Luckily I saw the 
grey Shutdown/Expunge icon and shut down the mgmt server, but not before losing 
one VM. Somehow I haven’t had a heart attack yet.)

My thought is that each VM would have a LOCK field. When it’s set, the VM 
effectively becomes “read-only” to ACS: stats are still gathered and ACS still 
monitors whether it’s up or down, but any change to its running state, the node 
it’s on, storage, networking, firewalls, etc. would be denied without some type 
of authorization (I’m not sure exactly what I mean here yet: a separate login, 
or maybe authenticating to get a token and then presenting it with the change, 
or...).
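
To make the idea a bit more concrete, here’s a rough sketch of the check I have 
in mind, written in Python purely for illustration. Everything in it is 
hypothetical: the Vm/LockGuard names, the operation strings, and the token 
handling are my assumptions, not existing ACS APIs. Making tokens single-use is 
just one possible answer to the authorization question, not a decided design.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical operation names; real ACS API commands are different.
READ_ONLY_OPS = {"get_stats", "check_state"}


class LockedError(Exception):
    """Raised when a change is attempted on a locked VM without authorization."""


@dataclass
class Vm:
    name: str
    locked: bool = False  # the proposed LOCK field


@dataclass
class LockGuard:
    # Tokens issued out-of-band (e.g. after a separate login); hypothetical.
    valid_tokens: Set[str] = field(default_factory=set)

    def authorize(self, vm: Vm, op: str, token: Optional[str] = None) -> None:
        """Let read-only ops through; require a valid token for changes to a locked VM."""
        if op in READ_ONLY_OPS or not vm.locked:
            return
        if token is None or token not in self.valid_tokens:
            raise LockedError(f"{op} denied: {vm.name} is production-locked")
        # Assumption: tokens are single-use, so consume it to prevent replay.
        self.valid_tokens.discard(token)
```

So on a locked VM, guard.authorize(vm, "get_stats") always goes through, while 
guard.authorize(vm, "stop") raises LockedError unless a valid token is 
presented. Where such a check would actually sit inside the mgmt server is the 
part I’d most like input on.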

I understand that in a larger environment there’s too much happening and this 
could backfire, but for our customers with legacy non-cloud architectures, 
stability is hugely important, and anything we can do to help with that is 
worthwhile. Maybe a “phase 2” of this implementation could add granular controls 
to specify what can and cannot happen during a “production lock”...

I’m looking to gauge interest in something like this and to gather ideas and 
suggestions. Unfortunately, it just jumped to pretty much the top of my priority 
list...

John
1: I’d rather not say which at this point.
