Andreas,

On Tue, May 17, 2011 at 7:57 AM, Andreas Ericsson <a...@op5.se> wrote:
>> Any plans to detach notification attributes from service / host
>> definitions in 4.x and make them their own top-level configuration
>> class like escalations to make it easier to scale notification
>> definitions for large projects?
>>
>
> Not really. What would such an object look like? How would it add
> additional benefit compared to using templates for hosts and services?
> I think if I could just see some sort of example definition of it I'd
> get an inkling of why some seem to think it's such a great idea. Right
> now, I see no additional benefit to it.
It would look just like an escalation. What doesn't work well for large configurations, where notification policies are stuck into host and service objects, is the following scenario (which is the one we are in at work, by design):

* Multiple configuration editors who own various parts of the Nagios configuration tree.
  - In our case this used to be one big tree; now we have set up separate trees for separate projects.
  - We have about 20-30 people who can edit their project-specific configurations.
* A set of services that are global in nature (service -> hostgroup -> host).
  - These provide the baseline monitoring required by all projects, using standards established by multiple organizations in our company.
  - In our example, this is base host monitoring with an SNMP agent (6 services across every host).
  - We have other global services as well, and a core team who develop, maintain and augment both our distributed Nagios software and these global services and configurations.
* A set of services that are specific to each project using our distributed variant of Nagios, managed by subject matter experts on each team.

With this scenario, how do we let each group that is responsible for hosts carrying these global services create individually tailored notification policies, when there is only one notification policy per service? Here is what we do:

* We configure our base services and hosts to 'notify' on every state change using the command name do_nothing.
* We created a custom patch so that when the string 'do_nothing' is seen in the command name, the state change only increments the notification count; it does not trigger any external command to run.
* We created a patch (partial: no serialization to disk) for the escalation logic that tracks in memory when a fault escalation was sent, so that OK escalations are only sent in response to something that was in a fault state. We are working on completing this patch so that the state is saved across restarts.
* We have all groups use escalations to define their notification policies. The service and host notification commands then trigger our distributed pollers to send escalation requests to a network-based notification service we run, which turns those requests into email, SMS, SNMP traps, etc. without our having to reconfigure Nagios for every notification transport or method change.

Yeah, it is very ugly. And why? Because with one notification policy per service, notifications don't scale well once you take advantage of service -> hostgroup -> host mappings, which are a critical pattern to use when scaling a configuration.

We have over 9000 hosts being monitored by our distributed framework (and growing), with around 30 configuration editors and 120+ users. Our distributed framework used to be centralized, a "one project for all" setup, but it is now a cluster of distributed setups, one per project, which is scaling nicely. Our largest distributed installations have 3900 and 5100 hosts respectively; we have 4 other distributed instances that are just ramping up and only have a few dozen hosts apiece at this point.

So while this is ugly, it works! All editors can define escalation objects that take into account both their individual needs for global service notifications and any project-specific notifications. By putting project-specific hosts in project-specific host groups, most groups need only two escalation definitions per project: one for hosts, one for services.

If all notifications were just done through an escalation-like configuration object, life for a big project would be much easier:

1) Having notifications clearly separated as their own configuration template in the Nagios DSL makes it much less confusing for people new to Nagios to understand where to configure notifications.
2) The configuration flexibility of the escalation template makes it very easy to work with for a large configuration.
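To make the workaround above concrete, here is a minimal Python sketch that models it. It is not Nagios source code and not our actual patches: the classes, the `notify` function and the hostgroup names are hypothetical. It assumes, as with Nagios escalations, that a policy applies from `first_notification` to `last_notification` (0 meaning no upper limit), that the base `do_nothing` command only bumps the notification number, and that an in-memory record of which policies received a fault notification gates the OK notifications.

```python
from dataclasses import dataclass, field

# Hypothetical model of the escalation-based workaround.
# Names and structure are illustrative assumptions, not Nagios internals.


@dataclass
class EscalationPolicy:
    """One group's notification policy, bound to a hostgroup."""
    name: str
    hostgroup: str
    first_notification: int = 1
    last_notification: int = 0  # 0 = no upper bound, as in Nagios escalations

    def covers(self, number: int) -> bool:
        """True if this policy applies to the given notification number."""
        return number >= self.first_notification and (
            self.last_notification == 0 or number <= self.last_notification
        )


@dataclass
class ServiceState:
    host: str
    hostgroups: set
    service: str
    notification_number: int = 0
    # Policies that were sent a fault notification since the last recovery.
    # Kept in memory only, like the partial patch (lost on restart).
    faulted_policies: set = field(default_factory=set)


def notify(svc, state, command, policies):
    """Handle one state change for svc; return the (recipient, state) pairs sent."""
    sent = []
    if state == "OK":
        # Recovery: only policies that actually saw the fault hear about it.
        for p in policies:
            if p.name in svc.faulted_policies:
                sent.append((p.name, state))
        svc.faulted_policies.clear()
        svc.notification_number = 0
        return sent
    # Every fault "notifies", so the counter that escalations key off advances.
    svc.notification_number += 1
    if command != "do_nothing":
        # A real notification command would run here; do_nothing skips it.
        sent.append((command, state))
    for p in policies:
        if p.hostgroup in svc.hostgroups and p.covers(svc.notification_number):
            sent.append((p.name, state))
            svc.faulted_policies.add(p.name)
    return sent


# Usage: a global SNMP service on a host that also belongs to a project group.
policies = [
    EscalationPolicy("core-team", hostgroup="all-snmp-hosts"),
    EscalationPolicy("proj-a-oncall", hostgroup="proj-a-hosts", first_notification=2),
]
svc = ServiceState("web01", {"all-snmp-hosts", "proj-a-hosts"}, "CPU Load")
notify(svc, "CRITICAL", "do_nothing", policies)  # only core-team is notified
notify(svc, "OK", "do_nothing", policies)        # only core-team gets the OK
```

The design point is the last loop in `notify`: any number of groups can attach their own policies to the same global service through their own hostgroups, which a single per-service notification policy cannot express. The recovery branch shows why the OK tracking matters: proj-a-oncall never received the fault, so it is not told about the recovery either.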
Our global and project-specific scenario, with all the notification changes we made, is also serving us very well as we grow. Notifications as separate objects would let us back out a number of patches, would really simplify our configuration, and would let our pollers run hotter.

- Max

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users