[Puppet Users] Why is it so hard to make a sane nagios server config?

Nick Moffitt Fri, 11 Mar 2011 02:35:14 -0800

I've tried to achieve my overall goals with several different features
of Puppet, but I've hit a bit of a wall here.  I think it's time for me
to explain what I'm trying to accomplish:


        I want the enabling of a service in my manifests to configure
        the monitoring of that service by a nagios server, without
        needless repetition.

Let me explain how my non-automated nagios3 server is configured:

        Each service is declared once per type (or type+role, but there
        are very few types distinguished by role, so we'll ignore that
        for now).  Services are members of hostgroups and servicegroups,
        and hosts join hostgroups in order to activate monitoring.
        Servicegroups are mostly used for sorting in the Web UI.
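
For illustration only, here is roughly what that hand-built layout would
look like if written with Puppet's native nagios types.  The names
'ssh-servers', 'remote-access', 'generic-service' and 'generic-host' are
assumptions made up for the sketch, not taken from my real config:

```puppet
# Declared once on the Nagios server: the group, the service bound to
# the group, and a servicegroup used only for sorting in the Web UI.
nagios_hostgroup { 'ssh-servers':
  alias => 'SSH servers',
}

nagios_servicegroup { 'remote-access':
  alias => 'Remote access',
}

nagios_service { 'check_ssh':
  hostgroup_name      => 'ssh-servers',    # applies to every member host
  service_description => 'SSH',
  check_command       => 'check_ssh',
  use                 => 'generic-service', # assumed template name
  servicegroups       => 'remote-access',
}

# Exported by each node: a host activates monitoring by joining groups.
@@nagios_host { $::fqdn:
  address    => $::ipaddress,
  use        => 'generic-host',            # assumed template name
  hostgroups => 'ssh-servers',
}
```

Note that the full list of hostgroups has to be known in the single
nagios_host resource, so a module cannot simply add one more group from
elsewhere in the catalog.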

Now let's look at how Puppet "wants" to configure Nagios:

        Each host exports a unique service for each of the checks it
        should be subject to.
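
A minimal sketch of that per-host pattern, assuming stored configs are
enabled and that a 'generic-service' template exists on the server (the
resource title and target path are illustrative, not from a real
manifest):

```puppet
# On every monitored node: export one service object per check.
@@nagios_service { "check_ssh_${::fqdn}":
  host_name           => $::fqdn,
  service_description => 'SSH',
  check_command       => 'check_ssh',
  use                 => 'generic-service', # assumed template name
  target              => '/etc/nagios3/conf.d/nagios_services.cfg',
}

# On the Nagios server: collect whatever every node exported.
Nagios_service <<| |>>
```

Every node contributes its own copy of each service, so the collected
file grows with (number of nodes) x (checks per node).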

Holy cow!  So if I have 25 checks per machine, and a thousand nodes,
that's 25,000 entries!  We're talking about a nagios_services.cfg
measured in the tens of *megabytes*!  I'm a little stunned by this
pattern.

I had created a system whereby nodes exported concat::fragments
expressing desired membership in a hostgroup, and defines that created
services, hostgroups, and servicegroups only if they hadn't already been
made.  All this ground to a halt: with only 50 test nodes, the
compilation of the nagios server's catalog took 3 minutes on a
reasonably spec'd machine.  I have to assume that the AST's mass of
defines-of-defines was partly to blame.

I'd like to make it as simple as possible for my module writers: "Just
use the nagios::nrpe_check define.  That'll make the nrpe config locally
and export everything needed to ensure this host is checked for your
module's behaviors!" Forcing them to make a service check *here* and a
hostgroup *there*, and then to remember to add the hostgroup to the node
that used the class *there*… well, it just opens up too many
opportunities for human fallibility.
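
To make the wish concrete, a hypothetical sketch of what such a define
might look like.  The $command parameter, the nrpe.d path, the
'nagios-nrpe-server' service and the 'check_nrpe_1arg' command are all
assumptions for illustration.  It also uses the one-service-per-host
export, so it shows the interface I want rather than a fix for the
scaling worry:

```puppet
# Hypothetical define: one call sets up the local NRPE command and
# exports the matching service for the Nagios server to collect.
define nagios::nrpe_check ($command) {
  # Local NRPE command definition on the monitored host.
  file { "/etc/nagios/nrpe.d/${name}.cfg":
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => "command[${name}]=${command}\n",
    # Assumes Service['nagios-nrpe-server'] is declared elsewhere.
    notify  => Service['nagios-nrpe-server'],
  }

  # Exported service; still one entry per host per check.
  @@nagios_service { "${name}_${::fqdn}":
    host_name           => $::fqdn,
    service_description => $name,
    check_command       => "check_nrpe_1arg!${name}", # assumed command
    use                 => 'generic-service',         # assumed template
  }
}

# What a module writer would then write ('myapp' is a made-up example):
nagios::nrpe_check { 'check_myapp':
  command => '/usr/lib/nagios/plugins/check_procs -c 1: -C myapp',
}
```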

So what are my options here?  I don't want to probe the hosts to
determine what should be monitored, because the whole *point* of
monitoring is to alert you when your live system state isn't what you
asked for.  I also worry that this Cartesian product of everything by
everyone will not scale acceptably into the future.

Fingers crossed that I'm just being dense and missed something big!

-- 
"Ill-informed qmail-bashing is better than no
qmail-bashing at all."
        --Don Marti
