[msmom] Cluster Maintenance EN-Mass

Andrew Kunz Thu, 04 Sep 2014 14:21:52 -0700
Some of this is based on past work from the community (Derek Harkin):
http://derekhar.blogspot.ca/2008/02/initiate-maintenance-mode-from-agent-no.html
 
We have a script that logs an event on a computer and a rule that watches for 
that event and generates an informational alert. Orchestrator watches for that 
alert and launches a workflow that parses the alert description and passes the 
machine name and interval to a maintenance mode runbook, which of course puts 
the machine into maintenance mode :)
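
To make the parsing step concrete, here is a rough sketch in Python rather than in the actual runbook activities. The description format ("machine=...; minutes=...") and the names MaintenanceRequest and parse_alert_description are made up for illustration. The real event text and workflow are not shown in this post.

```python
import re
from typing import NamedTuple


class MaintenanceRequest(NamedTuple):
    """What the workflow hands to the maintenance mode runbook."""
    machine: str
    minutes: int


# Assumed (hypothetical) description format:
#   "MaintenanceMode: machine=<name>; minutes=<n>"
_PATTERN = re.compile(
    r"machine=(?P<machine>[\w.-]+)\s*;\s*minutes=(?P<minutes>\d+)",
    re.IGNORECASE,
)


def parse_alert_description(description: str) -> MaintenanceRequest:
    """Extract machine name and interval from the alert description."""
    match = _PATTERN.search(description)
    if match is None:
        raise ValueError(f"unrecognised alert description: {description!r}")
    minutes = int(match.group("minutes"))
    if minutes <= 0:
        raise ValueError("maintenance interval must be positive")
    return MaintenanceRequest(match.group("machine"), minutes)


if __name__ == "__main__":
    print(parse_alert_description("MaintenanceMode: machine=SQLNODE01; minutes=90"))
```

Rejecting anything that does not match keeps a malformed event from putting the wrong machine into maintenance mode.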
 
It's not used on many machines, and I have concerns about how it would scale, 
so consider that a discussion point.
 
I've been playing with Jeremy's maintmode script 
http://gallery.technet.microsoft.com/scriptcenter/17eebf90-f6c7-46ba-9229-2bbde819aab2#content
and I have been thinking a bit.
 
We have probably close to 10,000 agents spread across 4 management groups. 
Our Netcool/Omnibus enterprise console has had some code put into it so that 
our patching tool can send it messages telling it to ignore alerts from hosts 
for X amount of time. (There is logic in it that monitors stateful alerts and, 
if they are not resolved, sends them after the patching cycle, but rule alerts 
are dropped.) It only knows about computers, not cluster names, resource 
groups, etc. Nobody tells SCOM what is going on.
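
For anyone who wants the suppression behaviour spelled out, here is a minimal Python sketch of the logic as described above. It is not the Omnibus code. The class and method names (Console, suppress, receive, close_expired) are hypothetical, and time is a plain number of seconds to keep the example simple.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    host: str          # source name as the console sees it
    name: str
    stateful: bool     # True for monitor (stateful) alerts, False for rule alerts
    resolved: bool = False


@dataclass
class SuppressionWindow:
    ends_at: float
    held: List[Alert] = field(default_factory=list)


class Console:
    """Toy model of a console that ignores hosts while they are being patched."""

    def __init__(self) -> None:
        self.windows: Dict[str, SuppressionWindow] = {}
        self.forwarded: List[Alert] = []

    def suppress(self, host: str, now: float, minutes: int) -> None:
        # Message from the patching tool: ignore this host for X minutes.
        self.windows[host.lower()] = SuppressionWindow(now + minutes * 60)

    def receive(self, alert: Alert, now: float) -> None:
        window = self.windows.get(alert.host.lower())
        if window is None or now >= window.ends_at:
            self.forwarded.append(alert)   # no active window: pass through
        elif alert.stateful:
            window.held.append(alert)      # hold until the patch cycle ends
        # rule alerts inside the window are dropped

    def close_expired(self, now: float) -> None:
        # After the window, send held stateful alerts that are still unresolved.
        for host, window in list(self.windows.items()):
            if now >= window.ends_at:
                self.forwarded.extend(a for a in window.held if not a.resolved)
                del self.windows[host]
```

Because the windows are keyed on computer names only, an alert whose source is a cluster name never matches a window and goes straight through.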
 
We have about 600 clusters. Let's just say they are not patched using best 
practices, and a significant number of them do not fail over gracefully; it is 
pretty much brute force. Any alerts from the hosts themselves are masked by the 
Omnibus logic, but cluster alerts flow through quite well (several hundred 
during a patch cycle).
 
I'm pondering having our patching tool write the event and using Jeremy's 
script within Scorch to put the clusters into maintenance mode (MM).
 
Does anyone have any thoughts or experiences on this? It is not uncommon for 
us to patch over 400 servers in a cycle, hence the scale concern.