[Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-22 Thread C. Bensend

Hey folks,

   I'm continuing to iron out the wrinkles with 3.5.1 and distributed
monitoring.  I'm using mod_gearman to submit and receive events from
two distributed pollers.

   Every now and again, I'll get something like this in the log on the
centralized collecting machine:

CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
you're trying to run actually exists. (worker: collector.domain.org)

   To me, that suggests that the collector system didn't get a result
for a host or service in a timely manner from one of the polling
systems, and so it attempted to run an active check itself.  However,
it doesn't seem to be able to, and I don't know why.
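   For reference, as far as I understand it, 127 is the exit status a
POSIX shell returns when it cannot find the command it was asked to run,
which is presumably why the message talks about a missing plugin.  A
minimal sketch that reproduces that status locally (the plugin path is
deliberately bogus and purely illustrative):

```shell
# Run a command that does not exist through sh and capture its exit status.
# /nonexistent/check_dummy is a made-up path used only for illustration.
rc=0
sh -c '/nonexistent/check_dummy' 2>/dev/null || rc=$?
echo "exit status: $rc"
```

   This only shows where the number 127 comes from; it does not say which
process on the collector tried to run the plugin.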

   The collector has the same value for $USER1$, and it has the same
set of plugins installed on it:

On the collector:

grep USER1 etc/resource.cfg
$USER1$=/usr/local/nagios/libexec

On the two pollers:

$USER1$=/usr/local/nagios/libexec
$USER1$=/usr/local/nagios/libexec

   The plugins are installed in identical locations on all three systems;
that's enforced via Puppet.  The 'nagios' user can find and run them on
the collector:

/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.13
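   To go beyond spot-checking a single plugin, something along these
lines could be run as the 'nagios' user to list any check_* file that
user cannot execute.  This is only a sketch: the function name is made
up, and it assumes the plugins all follow the check_* naming pattern.

```shell
# Print every check_* file in the given directory that the current user
# cannot execute. The function name is hypothetical; $1 is the plugin dir.
find_unrunnable_plugins() {
    dir=$1
    for plugin in "$dir"/check_*; do
        # Skip the unexpanded glob when the directory has no check_* files.
        [ -e "$plugin" ] || continue
        [ -x "$plugin" ] || echo "$plugin"
    done
}

# Path taken from $USER1$ in resource.cfg above.
find_unrunnable_plugins /usr/local/nagios/libexec
```

   No output means every matching file is executable by whoever ran it.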

   Now, because this is a distributed setup, the collector system is
not configured to run active checks:

grep ^execute etc/nagios.cfg
execute_service_checks=0
execute_host_checks=0

   ... but *obviously* it's trying to.  Is it failing because it's
configured to not run them?  If that's the case, the error message is
not accurate and should be corrected.  If that's *not* the case, why
can't my collector server run an active check when it believes it needs
to?

   I use NConf to generate my configurations, if that matters.  There are
a *lot* of hosts/services and quite a few configuration files, so I'm not
going to paste a slew of information here.  If I'm missing pertinent
information, please let me know exactly what you want to see and I'll
get it.

   I'd really appreciate a clue-by-four.  Thanks, folks!  :)

Benny


-- 
"No matter how tempted I am with the prospect of unlimited power, I
will not consume any energy field bigger than my head."
  -- #22 on Peter Anspach's Evil Overlord list


--
Introducing Performance Central, a new site from SourceForge and 
AppDynamics. Performance Central is your source for news, insights, 
analysis and resources for efficient Application Performance Management. 
Visit us today!
http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Alerting

2013-08-22 Thread Claudio Kuenzler
On Thu, Aug 22, 2013 at 1:26 AM, Charles Rice wrote:
> you need to put in the config files of the nodes connected to the switch
> that the switch is a parent device. I do not have the syntax in front of me,
> but I think it is just
> parent

It's "parents", just for the sake of completeness.
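
In a host definition the directive would look roughly like the fragment
below. This is only a sketch: the host names, the address, and the
generic-host template are made-up placeholders, and the snippet simply
writes the sample definition to a scratch file and prints it.

```shell
# Write a sample host definition to a scratch file (all names are made up).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
define host {
    use        generic-host   ; assumes a template with this name exists
    host_name  web01
    address    192.0.2.10
    parents    switch01       ; must match the host_name of the switch
}
EOF
cat "$cfg"
```

With a parent set, Nagios can report web01 as UNREACHABLE instead of
DOWN when switch01 itself is down.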
