Hey folks,
I'm continuing to iron out the wrinkles with 3.5.1 and distributed
monitoring. I'm using mod_gearman to submit and receive events from
two distributed pollers.
Every now and again, I'll get something like the following in the log
on the centralized collecting machine:
CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
you're trying to run actually exists. (worker: collector.domain.org)
To me, that suggests that the collector system didn't get a result
for a host or service in a timely manner from one of the polling
systems, and so it attempted to run an active check itself. However,
it doesn't seem to be able to, and I don't know why.
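For reference, 127 is the exit status a POSIX shell reports when the command it was asked to run cannot be found. The sketch below reproduces that outside of Nagios. The plugin path is hypothetical and deliberately nonexistent, and this only illustrates where a 127 can come from. It does not establish that this is what is happening on the collector.

```shell
# Hypothetical plugin path, chosen so that it does not exist.
missing_plugin="/nonexistent/libexec/check_dummy"

# Run it through a shell, the way a check command line would be run,
# and capture the exit status without aborting on failure.
rc=0
sh -c "$missing_plugin" 2>/dev/null || rc=$?

# A shell reports 127 for "command not found"
# (126 would mean "found but not executable").
echo "exit status: $rc"
```

If the same experiment against a real plugin path returns 126 rather than 127, that would point to permissions rather than a missing file.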
The collector has the same value for $USER1$, and it has the same
set of plugins installed on it:
On the collector:
grep USER1 etc/resource.cfg
$USER1$=/usr/local/nagios/libexec
On the two pollers:
$USER1$=/usr/local/nagios/libexec
$USER1$=/usr/local/nagios/libexec
The plugins are installed in identical locations on all three systems;
that's enforced via Puppet. The 'nagios' user can find and run them on
the collector:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.13
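The manual check above could be scripted so it can be repeated on each machine. This is only a rough sketch: `check_plugin` is a helper name made up for illustration, and the `etc/resource.cfg` path is relative to the Nagios install directory, as in the grep above. It reads the `$USER1$` value from the resource file and tests whether the named plugin is executable by whoever runs the script, so it would need to be run as the 'nagios' user to be meaningful.

```shell
# check_plugin is a hypothetical helper, not part of Nagios or mod_gearman.
# $1: path to resource.cfg, $2: plugin file name
check_plugin() {
    # Take the first uncommented $USER1$=... line and keep only the value.
    user1=$(sed -n 's/^\$USER1\$=//p' "$1" 2>/dev/null | head -n 1)
    # Succeed only if $USER1$ was found and the plugin is executable there.
    [ -n "$user1" ] && [ -x "$user1/$2" ]
}

# Assumed to be run from the Nagios install directory.
if check_plugin etc/resource.cfg check_nrpe; then
    echo "check_nrpe: found and executable"
else
    echo "check_nrpe: missing, not executable, or \$USER1\$ not set"
fi
```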
Now, because this is a distributed setup, the collector system is
not configured to run active checks:
grep ^execute etc/nagios.cfg
execute_service_checks=0
execute_host_checks=0
... but *obviously* it's trying to. Is it failing because it's
configured to not run them? If that's the case, the error message is
not accurate and should be corrected. If that's *not* the case, why
can't my collector server run an active check when it believes it needs
to?
I use NConf to generate my configurations, if that matters. There are
a *lot* of hosts/services and quite a few configuration files, so I'm not
going to paste a slew of information here. If I'm missing pertinent
information, please let me know exactly what you want to see and I'll
get it.
I'd really appreciate a clue-by-four. Thanks, folks! :)
Benny
--
"No matter how tempted I am with the prospect of unlimited power, I
will not consume any energy field bigger than my head."
-- #22 on Peter Anspach's Evil Overlord list