Re: [Puppet Users] Why is it so hard to make a sane nagios server config?

Martijn Grendelman Fri, 11 Mar 2011 03:46:17 -0800

Hi Nick,

> I've tried to achieve my overall goals with several different features
> of Puppet, but I've hit a bit of a wall here.  I think it's time for me
> to explain what I'm trying to accomplish:
> 
>       I want the enabling of a service in my manifests to configure
>       the monitoring of that service by a nagios server, without
>       needless repetition.
> 
> Let me explain how my non-automated nagios3 server is configured:
> 
>       Each service is declared once per type (or type+role, but there
>       are very few types distinguished by role, so we'll ignore that
>       for now).  Services are members of hostgroups and servicegroups,
>       and hosts join hostgroups in order to activate monitoring.
>       Servicegroups are mostly used for sorting in the Web UI.
> 
> Now let's look at how Puppet "wants" to configure Nagios:
> 
>       Each host exports a unique service for each of the checks it
>       should be subject to.
> 
> Holy cow!  So if I have 25 checks per machine, and a thousand nodes,
> that's 25,000 entries!  We're talking about a nagios_services.cfg
> measured in the tens of *megabytes*!  I'm a little stunned by this
> pattern.
> 
> I had created a system whereby nodes exported concat::fragments
> expressing desired membership in a hostgroup, and defines that created
> services, hostgroups, and servicegroups only if they hadn't already been
> made.  All this ground to a halt: with only 50 test nodes, the
> compilation of the nagios server's catalog took 3 minutes on a
> reasonably spec'd machine.  I have to assume that the AST's mass of
> defines-of-defines was partly to blame.
> 
> I'd like to make it as simple as possible for my module writers: "Just
> use the nagios::nrpe_check define.  That'll make the nrpe config locally
> and export everything needed to ensure this host is checked for your
> module's behaviors!" Forcing them to make a service check *here* and a
> hostgroup *there* and don't forget to add the hostgroup to the node that
> used the class *there*… well it just opens up too many opportunities for
> human fallibility.
> 
> So what are my options here?  I don't want to probe the hosts to
> determine what should be monitored, because the whole *point* of
> monitoring is to alert you when your live system state isn't what you
> asked for.  I also worry that this Cartesian product of everything by
> everyone will not scale acceptably into the future.
> 
> Fingers crossed that I'm just being dense and missed something big!


I'm afraid not.

Most of this past week I spent building my Puppet-induced Nagios
configuration, and after exploring my options, I decided that the
hostgroup-based configuration you describe is the only sensible way.

I did exactly what you did: I use exported concat fragments to collect
each host's hostgroups on the puppetmaster, and then use generate() to
populate the hostgroups parameter of that host's nagios_host resource.
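
A minimal sketch of that pattern follows. It is an illustration under
stated assumptions, not a copy of working manifests: the define and class
names, the file paths, the node name and the helper script are all
hypothetical. It assumes a concat module that provides concat and
concat::fragment, and a helper that prints a host's hostgroups as a
comma-separated list with no trailing newline and exits 0 (generate()
fails the compile on a non-zero exit).

```puppet
# Hypothetical define: a module calls this on a monitored node to request
# membership in the hostgroup given as the resource title.
define nagios::hostgroup_member() {
  # Exported fragment: one line per hostgroup, in a per-host file.
  @@concat::fragment { "nagios_hostgroup_${::fqdn}_${name}":
    target  => "/var/lib/nagios-puppet/hostgroups/${::fqdn}",
    content => "${name}\n",
    tag     => 'nagios_hostgroup_member',
  }
}

# Hypothetical class included on every monitored node.
class nagios::monitored {
  # Export the per-host target file that the fragments above attach to.
  @@concat { "/var/lib/nagios-puppet/hostgroups/${::fqdn}":
    tag => 'nagios_hostgroup_file',
  }

  # generate() runs on the puppetmaster at compile time. The helper path
  # is an assumption; it reads the collected file for this host.
  $hostgroups = generate('/usr/local/bin/nagios-hostgroups', $::fqdn)

  @@nagios_host { $::fqdn:
    address    => $::ipaddress,
    use        => 'generic-host',
    hostgroups => $hostgroups,
  }
}

# On the puppetmaster's own node: build the per-host hostgroup files.
node 'puppetmaster.example.com' {
  Concat <<| tag == 'nagios_hostgroup_file' |>>
  Concat::Fragment <<| tag == 'nagios_hostgroup_member' |>>
}

# On the Nagios server: collect the exported hosts.
node 'nagios.example.com' {
  Nagios_host <<| |>>
}
```

A module would then declare something like
nagios::hostgroup_member { 'webservers': } and leave the rest to the
collectors.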

Catalog runs on the Nagios servers typically take less than a minute,
but I only have about 50 hosts defined so far. I hope that this scheme
will be sustainable.

The biggest downside of my system so far is that, in a worst-case
scenario, it takes 90 minutes for a configuration change to propagate
to Nagios.

Regards,
Martijn.
