subject:"\[Nagios\-users\] Distributed monitoring"

Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread C. Bensend


I'm continuing to iron out the wrinkles with 3.5.1 and distributed
 monitoring.  I'm using mod_gearman to submit and receive events from
 two distributed pollers.

Every now and again, I'll get something similar in the log on the
 centralized collecting machine:

 CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
 youre trying to run actually exists. (worker: collector.domain.org)

To me, that suggests that the collector system didn't get a result
 for a host or service in a timely manner from one of the polling
 systems, and so it attempted to run an active check itself.  However,
 it doesn't seem to be able to, and I don't know why.

The collector has the same value for $USER1$, and it has the same
 set of plugins installed on it:

 On the collector:

 grep USER1 etc/resource.cfg
 $USER1$=/usr/local/nagios/libexec

 On the two pollers:

 $USER1$=/usr/local/nagios/libexec
 $USER1$=/usr/local/nagios/libexec

The plugins are installed in identical locations on all three systems,
 that's enforced via Puppet.  The 'nagios' user can find and run them on
 the collector:

 /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
 NRPE v2.13

Now, because this is a distributed setup, the collector system is
 not configured to run active checks:

 grep ^execute etc/nagios.cfg
 execute_service_checks=0
 execute_host_checks=0

... but *obviously* it's trying to.  Is it failing because it's
 configured to not run them?  If that's the case, the error message is
 not accurate and should be corrected.  If that's *not* the case, why
 can't my collector server run an active check when it believes it needs
 to?

I use NConf to generate my configurations, if that matters.  There are
 a *lot* of hosts/services and quite a few configuration files, so I'm not
 going to paste a slew of information here.  If I'm missing pertinent
 information, please let me know exactly what you want to see and I'll
 get it.

No one has an idea about this?  And no, Andreas, I can't move to
4.0 yet.  ;)

Thanks!

Benny


-- 
No matter how tempted I am with the prospect of unlimited power, I
will not consume any energy field bigger than my head.
  -- #22 on Peter Anspach's Evil
 Overlord list


--
Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
Discover the easy way to master current and previous Microsoft technologies
and advance your career. Get an incredible 1,500+ hours of step-by-step
tutorial videos with LearnDevNow. Subscribe today and save!
http://pubads.g.doubleclick.net/gampad/clk?id=58040911iu=/4140/ostg.clktrk
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread Justin Pryzby

Do you get many of those error messages in the logs at once, or just
one at a time?

Only one thought: what are the permissions on the file that defines your
$USER*$ variables? Nagios on my systems calls setuid() to a non-root user
after startup, and if it gets a SIGHUP to reload its config but can't read
the file defining $USER*$, it will act strangely.

Justin


Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread C. Bensend


> Do you get many of those error messages in the logs at once, or just
> one at a time?
>
> Only one thought: what are the permissions on your $USER$ variables?
> Nagios on my systems setuid() to nonroot after startup, and if it gets
> SIGHUP to reload config, but can't read the file defining $USER*$,
> will act strangely.

Just one at a time, seemingly randomly.  A host here, a service there,
several times a day.  They always almost immediately recover, but I
don't understand why my centralized collector seems to have this issue.

Nagios runs as the nagios user, which can read the resource.cfg file
fine:

ls -ld . ; ls -l nagios-hostname.cfg resource.cfg
drwxrwx--- 6 root nagios 4096 Aug 27 16:02 .
-rw-r--r-- 1 root root   47606 Jul  1 11:18 nagios-hostname.cfg
-rw-r----- 1 root nagios  2400 Mar 19 11:25 resource.cfg

Thanks!



Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread Sven Nierlein

On 8/22/13 13:51, C. Bensend wrote:
> CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
> youre trying to run actually exists. (worker: collector.domain.org)

Hi,

if this is the collector host, why does it have a mod-gearman worker
installed? If Nagios had run the check by itself, there would be no hint
about the worker in the error. So it seems there is a worker running on
your collector host which grabs some checks but isn't able to execute
them.
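
For background, 127 is the conventional exit status a POSIX shell reports when it cannot find the command it was asked to run, while Nagios plugins are expected to exit with 0 to 3. A minimal Python sketch of that behaviour follows. It assumes /bin/sh is available, the plugin path is made up, and it makes no claim about how mod_gearman itself launches plugins.

```python
import subprocess

# Exit codes a Nagios plugin is expected to return.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_through_shell(command_line):
    """Run a command line via /bin/sh and return its exit status."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command_line],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def describe(exit_code):
    """Map an exit status to a plugin state, or flag it as out of bounds."""
    if exit_code in PLUGIN_STATES:
        return PLUGIN_STATES[exit_code]
    if exit_code == 127:
        return "out of bounds (shell could not find the command)"
    return "out of bounds"


if __name__ == "__main__":
    # Hypothetical path that does not exist on the machine running this.
    code = run_through_shell("/no/such/dir/check_nrpe -H 127.0.0.1")
    print(code, describe(code))
```

Running the real command line as the same user and with the same environment as the worker is usually the quickest way to see which part of it cannot be found.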

Regards,
  Sven


-- 
Sven Nierlein            sven.nierl...@consol.de
ConSol* GmbH             http://www.consol.de
Franziskanerstrasse 38   Tel.: 089/45841-439
81669 Muenchen           Fax.: 089/45841-111



Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread C. Bensend


> On 8/22/13 13:51, C. Bensend wrote:
>> CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
>> youre trying to run actually exists. (worker: collector.domain.org)
>
> Hi,
>
> if this is the collector host, why does it have a mod-gearman worker
> installed? If nagios would have run the check by itself, there would be
> no hint about the worker in the error. So it seems like there is a
> worker started on your collector host which then grabs some checks but
> isn't able to execute them.

Oh ho!  I have multiple *gearman* processes running:

ps axuww | grep gearman
gearmand  5662  0.7  0.1 404672  2496 ?  Ssl  Aug17 118:29 /usr/sbin/gearmand -d -l /var/log/gearmand/gearmand.log
nagios    5712  0.0  0.0  38024   640 ?  Ss   Aug17   1:03 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid
nagios   25919  0.0  0.1 137492  3016 ?  S    07:38   0:00 /usr/bin/mod_gearman_worker -d --config=/etc/mod_gearman/mod_gearman_worker.conf --pidfile=/var/mod_gearman/mod_gearman_worker.pid

.. etc ..

Are you saying I just need gearmand running on the collector?  I'm
quite new to gearman, so I might have misunderstood which parts are
necessary where.  I can easily shut down the mod_gearman_worker
service, I just need to understand the consequences.

I assumed that this was a Nagios error - perhaps I just have my
gearman setup configured wrong.

Benny



Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread Sven Nierlein

On 8/28/13 14:43, C. Bensend wrote:
> Are you saying I just need gearmand running on the collector?

Well, I assumed so. You are the only one who can really tell.
You will need a worker on each host that should run checks. If your
collector should not run any checks, then no worker is necessary.

See http://labs.consol.de/nagios/mod-gearman/#_common_scenarios for a list
of common setups.
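
As a toy picture of why the failures look random: every worker attached to the same queue is eligible to take any job, so one worker that cannot execute checks produces scattered failures rather than a total outage. The Python sketch below is only an illustration under assumptions. The round-robin hand-out, the worker names poller1 and poller2, and the can_run_plugins flag are invented here and do not reflect how gearmand actually schedules jobs.

```python
from itertools import cycle


def run_job(worker, job):
    """Return (job, worker name, exit code) for one simulated check."""
    # A worker that cannot execute the plugin reports 127, as in the log line.
    exit_code = 0 if worker["can_run_plugins"] else 127
    return job, worker["name"], exit_code


def dispatch(jobs, workers):
    """Hand each job to the next worker in turn (toy round-robin)."""
    turn = cycle(workers)
    return [run_job(next(turn), job) for job in jobs]


if __name__ == "__main__":
    workers = [
        {"name": "poller1", "can_run_plugins": True},
        {"name": "poller2", "can_run_plugins": True},
        # The stray worker on the collector that cannot execute checks.
        {"name": "collector.domain.org", "can_run_plugins": False},
    ]
    jobs = ["host-a/PING", "host-b/PING", "host-c/SSH", "host-d/HTTP"]
    for job, name, code in dispatch(jobs, workers):
        print(f"{job}: exit {code} (worker: {name})")
```

In this model only the jobs that happen to land on the collector's worker fail, which matches the "a host here, a service there" pattern.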

  Sven


Re: [Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-28 Thread C. Bensend


> On 8/28/13 14:43, C. Bensend wrote:
>> Are you saying I just need gearmand running on the collector?
>
> Well, I assumed it. You are the only one which really can tell that.
> You will need a worker on each host which should run checks. If your
> collector should not run any checks, then no worker is necessary.
>
> See http://labs.consol.de/nagios/mod-gearman/#_common_scenarios for a list
> of common setups.

OK, yes, I grok that.  I guess I would want the collector to be *able*
to run checks, if it doesn't get timely information from the pollers.
I'm assuming that's why it's even trying in the first place - it
doesn't see a result in a timely manner, so it thinks it should run
one.

Which circles back to my original question - why can't it run the
check?  Why isn't it finding what it needs to find?  The workers
are running as the nagios user, and I don't see anything that appears
pertinent in the mod_gearman_worker.conf file...  What am I missing?
Neither the gearmand.log nor the mod_gearman_worker.log files seem
to have any complaints (but I haven't bumped up the debug on them yet).

Thanks so much for your help!

Benny



[Nagios-users] Distributed monitoring: central collector doesn't seem to be able to run active checks

2013-08-22 Thread C. Bensend


Hey folks,

   I'm continuing to iron out the wrinkles with 3.5.1 and distributed
monitoring.  I'm using mod_gearman to submit and receive events from
two distributed pollers.

   Every now and again, I'll get something similar in the log on the
centralized collecting machine:

CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
youre trying to run actually exists. (worker: collector.domain.org)

   To me, that suggests that the collector system didn't get a result
for a host or service in a timely manner from one of the polling
systems, and so it attempted to run an active check itself.  However,
it doesn't seem to be able to, and I don't know why.

   The collector has the same value for $USER1$, and it has the same
set of plugins installed on it:

On the collector:

grep USER1 etc/resource.cfg
$USER1$=/usr/local/nagios/libexec

On the two pollers:

$USER1$=/usr/local/nagios/libexec
$USER1$=/usr/local/nagios/libexec

   The plugins are installed in identical locations on all three systems,
that's enforced via Puppet.  The 'nagios' user can find and run them on
the collector:

/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.13

   Now, because this is a distributed setup, the collector system is
not configured to run active checks:

grep ^execute etc/nagios.cfg
execute_service_checks=0
execute_host_checks=0

   ... but *obviously* it's trying to.  Is it failing because it's
configured to not run them?  If that's the case, the error message is
not accurate and should be corrected.  If that's *not* the case, why
can't my collector server run an active check when it believes it needs
to?

   I use NConf to generate my configurations, if that matters.  There are
a *lot* of hosts/services and quite a few configuration files, so I'm not
going to paste a slew of information here.  If I'm missing pertinent
information, please let me know exactly what you want to see and I'll
get it.

   I'd really appreciate a clue-by-four.  Thanks, folks!  :)

Benny



[Nagios-users] Distributed monitoring: v3.4.1 not translating host states like it should

2012-10-30 Thread C. Bensend


Hey folks,

   I am in the process of implementing a distributed monitoring
architecture, and I'm having some problems with host state.  Here
are the specs:

Nagios v3.4.1
RHEL 6.3
Using NSCA to send results to passive collector

   Yes, I have 'translate_passive_host_checks' set on the collector.  :)

   So, the system is up and running, and I do see host alerts in
/var/log/messages on the collector.  However, in the web interface,
all hosts remain up.  I can go into the host details for a host
that's offline because of Sandy, and it reports a host status of
UP, with the status information PING CRITICAL - Packet loss 100%.

   Obviously, the host states coming from the passive monitors are
not being translated.

   Active host and service checks are disabled on the collector,
and enabled on the monitors.  Passive host and service checks are
enabled everywhere, and the collector *is* receiving them.
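
When debugging this kind of symptom, it may help to look at exactly what the collector is being handed. In a passive host check result, the state travels as a numeric code (0 = UP, 1 = DOWN, 2 = UNREACHABLE), separately from the plugin output text. The Python sketch below formats such a result both as send_nsca input and as the PROCESS_HOST_CHECK_RESULT external command. The host name and timestamp are made up, and nothing in this thread confirms that the return code is the cause here.

```python
HOST_STATES = {0: "UP", 1: "DOWN", 2: "UNREACHABLE"}


def nsca_host_line(host_name, return_code, plugin_output):
    """One passive host check result in send_nsca's tab-separated input format."""
    return f"{host_name}\t{return_code}\t{plugin_output}\n"


def external_command(timestamp, host_name, return_code, plugin_output):
    """The same result as a Nagios external command line."""
    return (
        f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
        f"{host_name};{return_code};{plugin_output}"
    )


if __name__ == "__main__":
    text = "PING CRITICAL - Packet loss 100%"
    # Same text, different codes: the code, not the text, carries the state.
    for code in (0, 1):
        line = external_command(1351555200, "web01", code, text)
        print(HOST_STATES[code], "<-", line)
```

Comparing the code in the submitted lines with the state shown in the web interface narrows down whether the problem is in what the monitors send or in how the collector handles it.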

   I'd appreciate it if someone can help me out here...  I'll
provide whatever details are necessary...

Thanks much!

Benny


-- 
Unless you're a lawyer, you don't understand Oracle licensing.
That applies equally to Oracle employees as well as customers.
  -- Me, 2012-05-10




Re: [Nagios-users] Distributed Monitoring

2011-04-05 Thread Gerheim

Hi

Thanks Dan!
I'm reading about check_mk with Livestatus and I think it'll help me.

--
Wallace Gerheim

[Nagios-users] Distributed Monitoring

2011-03-30 Thread Gerheim

Hello folks,

I'm new to Nagios and to the nagios-users mailing list.
I was looking for addons for Nagios, but I didn't find one which meets my needs.
Let me explain my scenario. If someone could help me, I'll be grateful.

I have one central Nagios with Centreon. I have another Nagios (worker),
also with Centreon, on which I want to configure some hosts and hostgroups
that I don't want to configure on the central Nagios. But I still want to
monitor the worker's hosts and hostgroups from the central Nagios.

In summary, when I change hosts on the workers, I don't want to have to
add them to the central Nagios.
I've found NSCA, DNX and Gearman, but they don't help.

Could anyone help?
Thanks!

*Wallace Knopp de Menezes Gerheim*

Re: [Nagios-users] Distributed Monitoring

2011-03-30 Thread Assaf Flatto

Hello and Welcome.

For distributed Nagios, you want to start here:
http://nagios.sourceforge.net/docs/3_0/distributed.html

The main point you need to know is that, in order for the central Nagios
to see the hosts monitored by the worker, the definitions of the remotely
monitored hosts also need to be on the central Nagios (defined as passive
checks and hosts).

Nagios currently has no way of detecting the child hosts of the worker and
adding them to its configuration; therefore they have to be defined on
both servers (with minor changes: active/passive).
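
As a rough illustration of "defined on both servers, with minor changes", the Python sketch below renders the same host twice, once for the worker and once for the central server. The host name, the address, and the generic-host template are placeholders. active_checks_enabled and passive_checks_enabled are standard host directives, but a real definition will usually need more than this.

```python
def render_host(host_name, address, central):
    """Render a minimal Nagios host definition for the central server or a worker."""
    # The central server only accepts passive results; the worker checks actively.
    active = 0 if central else 1
    lines = [
        "define host {",
        "    use                     generic-host",  # placeholder template name
        f"    host_name               {host_name}",
        f"    address                 {address}",
        f"    active_checks_enabled   {active}",
        "    passive_checks_enabled  1",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_host("web01", "192.0.2.10", central=False))
    print(render_host("web01", "192.0.2.10", central=True))
```

Generating both variants from one source is one way to keep the two servers from drifting apart.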

Assaf


Re: [Nagios-users] Distributed Monitoring

2011-03-30 Thread Steve Wilson


At $dayjob I discovered the same problem. However, we also had another
problem: a lot of the servers we want to monitor are in walled gardens
and only have http(s) access to the internet.

Our solution was to semi-create our own uploader mechanism. We use nsca
on the receiver side and a custom submit_check_results that, instead of
piping the results through send_nsca, writes them into a data file for
the time/service, which gets stored for scheduled upload via cron.

This scheduled uploader keeps track of the hosts/localhost.cfg file and,
if it detects a change, adds that to be uploaded too. These files are then
bzipped and uploaded via curl to a PHP script on the Nagios server that
knows how to save them for use.

The Nagios server then has a scheduled process that first looks to see
if there's a new config file for hosts, moves it into etc/hosts/*.cfg
(separate files per host) and reloads Nagios. It then takes the contents
of the remaining Nagios data files and pipes them through nsca.

Although this scenario is far from perfect, and we've only been working
on it for 2 weeks on and off, it seems to suit our needs.

Maybe a config collector plugin, something like the Cisco router config
one, could be modified to store the Nagios host config and pass it through
nsca, although I'm not sure how you'd process that on the server to update
the host config there.
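
A rough Python sketch of the spooling side of this idea follows. It is an illustration only, not the actual scripts described here: the function names, the spool directory layout, and the per-minute file naming are all invented. The only format borrowed from NSCA is the tab-separated service check line that its client reads on standard input.

```python
import os
import time


def format_result(host, service, return_code, output):
    """One service check result in the tab-separated NSCA client input format."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def spool_result(spool_dir, host, service, return_code, output, now=None):
    """Append a result to a per-minute spool file and return the file's path."""
    now = time.time() if now is None else now
    # Hypothetical naming scheme: one file per minute (UTC).
    stamp = time.strftime("%Y%m%d%H%M", time.gmtime(now))
    path = os.path.join(spool_dir, f"results-{stamp}.dat")
    os.makedirs(spool_dir, exist_ok=True)
    with open(path, "a", encoding="utf-8") as spool:
        spool.write(format_result(host, service, return_code, output))
    return path
```

A cron job could then compress and upload the finished files, and the receiving side could feed each line to the NSCA client unchanged.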

--

Steve Wilson

Re: [Nagios-users] Distributed Monitoring

2011-03-30 Thread Daniel Wittenberg

My vote is to look at Multisite and livestatus from check_mk project.

Dan


Re: [Nagios-users] Distributed Monitoring

2011-03-30 Thread Romain Le Merlus

Hi Wallace,

On Wed, Mar 30, 2011 at 3:00 PM, Gerheim wallacegerh...@gmail.com wrote:

> I'm new to Nagios and to the nagios-users mailing list.
> I was looking for addons for Nagios, but I didn't find one which meets my needs.
> Let me explain my scenario. If someone could help me, I'll be grateful.
> I have one central Nagios with Centreon. I have another Nagios (worker),
> also with Centreon, on which I want to configure some hosts and hostgroups
> that I don't want to configure on the central Nagios. But I still want to
> monitor the worker's hosts and hostgroups from the central Nagios.
> In summary, when I change hosts on the workers, I don't want to have to
> add them to the central Nagios.


The standard distributed configuration for Centreon is to have a central web
server (Centreon and Nagios) and pollers (Nagios).
The whole configuration is defined on the central server, and you associate
each host with the Nagios engine that should monitor it.

Then, with the access control list feature, you can easily choose whether to
display each hostgroup or host on the web side for each connected user.

Best regards.
-- 
Romain LE MERLUS

Re: [Nagios-users] Distributed Monitoring

2011-03-30 Thread Gerheim

Thanks, everyone, for the replies.

I think I wasn't clear in my explanation...

My central Nagios has a lot of objects (including the Nagios client objects).
I want to manage only a few of them on each Nagios client. I'm looking to
decentralize the scope of the clients.
For example: I have 3 clients. Each one is responsible for monitoring some
hosts. When they do that, they return the results to the central server.
The clients have Nagios and Centreon too.

Thanks

P.S.: Sorry if I wasn't clear. I'm learning English.

--
Wallace

[Nagios-users] Distributed monitoring and dependencies

2011-02-16 Thread Henti Smith

Good day all.

I've taken over ownership of a very hacked and mutilated nagios
installation in house, and I'm busy building a migration plan and
designing the new nagios instance.

I have some questions which the documentation is not making apparent,
likely due to my lack of understanding nagios, not the documentation.

We have 3 physical locations which will be monitored, which will
likely increase, and I'm looking at a distributed monitoring setup as
described in the documentation here :

http://nagios.sourceforge.net/docs/3_0/distributed.html

Now the documentation mentions :

The purpose of the central server is to simply listen for service
check results from one or more distributed servers. Even though
services are occassionally actively checked from the central server,
the active checks are only performed in dire circumstances, so lets
just say that the central server only accepts passive check for now

We are also looking at using dependencies between hosts and services
across all locations, which according to the documentation and my
understanding of it, might be a problem.

Execution dependencies are used to restrict when active checks of a
service can be performed. Passive checks are not restricted by
execution dependencies

Unfortunately the check scheduling logic link is still in TODO status,
so I cannot explore further.

Is my understanding correct ?

If not, can you use distributed monitoring and host and service dependencies ?

As a final question: I'd like to be able to monitor a single host from
the different locations, so that I can identify links going down using
the above configuration. Would I have to configure the host 3 times,
once for each Nagios server, or is there a different way?

Regards
Henti


Re: [Nagios-users] Distributed Monitoring best practices

2010-05-18 Thread Enrico Zimol

On 18 May 2010 15:21, Christoph Kluenter c...@iphh.net wrote:

 I am thinking about testing DNX ( dnx.sf.net )
 But since one can't  define which check will run on which
 node, we would have to reconfigure a lot of firewalls.
 Would dnx be worth this hassle ? Any experiences ?

I'm interested in this too.
Any suggestions for completely centralized monitoring?

Thanks


[Nagios-users] Distributed monitoring

2010-05-06 Thread Enrico Zimol

Wrong thread before, sorry.

On 6 May 2010 11:55, Enrico Zimol lomiz.m...@gmail.com wrote:
 Hi all,
 I'm a newbie with Nagios and I'm writing to ask for suggestions
 about how to structure my monitoring setup.
 I have to monitor Linux servers for about 15 to 20 customers, with 1 to 5
 servers per customer.
 We don't have a VPN to the customers, so these servers are all behind NAT.
 That isn't a problem because we administer the firewalls
 (other Linux servers), so we can manage any kind of DNAT and filter
 rule.

 I read in the official documentation that the NSCA addon is suggested for
 distributed monitoring, but we chose the NRPE addon for several
 reasons:
 -the customer requires it
 -the number of monitored servers per customer will never grow
 -the services to monitor are the same on every server (hardware/software
 RAID, disk usage, etc.)
 -we need a completely centralized monitoring structure

 For the last point I thought of using the arguments option in NRPE (yes,
 I read the SECURITY document).
 Besides, to solve the NAT problem with NRPE I'll do DNAT on the
 firewall and use the port parameter of the check_nrpe plugin (are there
 any problems with doing that? I did a few tests but I'd prefer confirmation).


 To manage this structure I need a well-organized config file
 structure on the Nagios server.

 I thought of structuring it like this:

 obj--|
        |--templatelinuxserversgeneral.cfg
        |
        |--customer_1_directory|-templateserver.cfg
        |                       |-server1.cfg
        |                       |-server2.cfg
        |                       |-servern.cfg
        |
        |--customer_2_directory|-templateserver.cfg
                                |-server1.cfg
                                |-servern.cfg


 Where:
 -templatelinuxserversgeneral.cfg is a very basic server template
 -customer_1_directory contains one file for each of the customer's servers
 -templateserver.cfg uses templatelinuxserversgeneral and adds
 more specific common variables for that customer's servers, like the
 public IP address, which is the same for all of the customer's servers
 -servern.cfg contains some very server-specific variables
 like the NRPE port (see above).

 What do you think?
 How can I organize that service-server combination?


 Thanks so much

 P.S. Sorry for my bad English

 --
 Enrico Zimol
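
A minimal sketch of the DNAT approach described in this message, assuming each customer's firewall forwards a distinct public port to the NRPE port of one internal server. The IP address, the port numbers, the NRPE command names, and the nrpe_cmd helper are all made up for illustration, and the plugin path is the assumed default of a source install; -H, -p, and -c are standard check_nrpe options. The helper only prints the command line, so it can be inspected before being placed in a command definition.

```shell
#!/bin/sh
# Assumed default plugin path of a source install; adjust as needed.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# nrpe_cmd PUBLIC_IP DNAT_PORT NRPE_COMMAND
# Prints the check_nrpe invocation for one server behind the firewall.
nrpe_cmd() {
    printf '%s -H %s -p %s -c %s\n' "$CHECK_NRPE" "$1" "$2" "$3"
}

# Hypothetical customer: one public IP, two servers on different DNAT ports.
nrpe_cmd 203.0.113.10 5666 check_disk
nrpe_cmd 203.0.113.10 5667 check_raid
```

Because all of a customer's servers share one public address, the port is the only per-server value, which matches the idea of keeping it in the per-server config file.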



Re: [Nagios-users] Distributed Monitoring Disabling Checks

2009-11-27 Thread Gael Cheron

Hi,

I'm facing the same issue, and I'd like to find a way to turn off the
monitoring on the remote host from the web interface on the main server,
because operators are not able to log on to the remote monitoring servers.

Does anyone know of an add-on able to do this? I'm thinking of
Centreon, but I haven't tested it yet.

Regards,
Gaël.


2009/11/21 Andrew Libby ali...@xforty.com


 From my knowledge, you'll either need to log in to the
 server running the remote nagios instance and disable checks
 in the configuration, or turn notifications off at the
 instance running the web interface.  Depending on your
 needs, it might seem a decent fit to simply turn off
 notifications yet allow the monitoring to continue.

 Andy


 Glynne Jones wrote:
  Hi,
 
  I'm in the process of setting up a distributed monitoring system and have
 hit on an issue where one of my operators wants to disable a check in the
 web interface but the check is actually being run on the remote distributed
 system.
 
  How can I disable the check on the remote host in this example?
 
  Thanks,
 
  Glynne
 
 
 
 

 --

 ===
 xforty technologies
 Andrew Libby
 ali...@xforty.com
 http://xforty.com
 ===






Re: [Nagios-users] Distributed Monitoring Disabling Checks

2009-11-27 Thread Romain Le Merlus

Hi Gael,

On Fri, Nov 27, 2009 at 11:05 AM, Gael Cheron gael.che...@free.fr wrote:

 I'm facing the same issue, and I'd like to find a way to turn off the
 monitoring on the remote host from the web interface on the main server,
 because operators are not able to log on to the remote monitoring servers.
 Does anyone know of an add-on able to do this? I'm thinking of
 Centreon, but I haven't tested it yet.


You are right, you can do that through the Centreon web interface.
External commands are governed by ACLs, so each user is either allowed to
submit them or not.
They are then sent to the Nagios core, regardless of whether Nagios is
installed on the Centreon server or on a remote poller.
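
For reference, a rough sketch of what such an external command looks like when it reaches a Nagios core. DISABLE_SVC_CHECK is a standard Nagios external command; the host and service names, the helper name, and the command-file path in the comment are assumptions for illustration, and how the line is transported to the poller (Centreon, ssh, or something else) is outside the sketch.

```shell
#!/bin/sh
# disable_svc_check_line HOST SERVICE [EPOCH]
# Prints one line in Nagios external-command format:
#   [timestamp] DISABLE_SVC_CHECK;<host_name>;<service_description>
disable_svc_check_line() {
    now="${3:-$(date +%s)}"
    printf '[%s] DISABLE_SVC_CHECK;%s;%s\n' "$now" "$1" "$2"
}

# Hypothetical host and service; this only prints the line.
disable_svc_check_line web01 HTTP

# On the poller that actually runs the check, the line would be written to
# that instance's command file, for example (assumed default path):
#   disable_svc_check_line web01 HTTP > /usr/local/nagios/var/rw/nagios.cmd
```

The command has to reach the instance that executes the check; sending it to the central server would only stop checks that the central server itself schedules.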

Best regards.
-- 
Romain LE MERLUS | Project Director

rlemer...@merethis.com
Tel. +33 (0)1 49 69 97 12
Mob. +33(0)6 85 05 02 82

MERETHIS is the publisher of the Centreon software.

Re: [Nagios-users] Distributed Monitoring Disabling Checks

2009-11-21 Thread Andrew Libby


From my knowledge, you'll either need to log in to the
server running the remote nagios instance and disable checks
in the configuration, or turn notifications off at the
instance running the web interface.  Depending on your
needs, it might seem a decent fit to simply turn off
notifications yet allow the monitoring to continue.

Andy


Glynne Jones wrote:
 Hi,
 
 I'm in the process of setting up a distributed monitoring system and have hit 
 on an issue where one of my operators wants to disable a check in the web 
 interface but the check is actually being run on the remote distributed 
 system.
 
 How can I disable the check on the remote host in this example?
 
 Thanks,
 
 Glynne
 
 
 

-- 

===
xforty technologies
Andrew Libby
ali...@xforty.com
http://xforty.com
===



[Nagios-users] Distributed Monitoring Disabling Checks

2009-11-20 Thread Glynne Jones

Hi,

I'm in the process of setting up a distributed monitoring system and have hit 
on an issue where one of my operators wants to disable a check in the web 
interface but the check is actually being run on the remote distributed system.

How can I disable the check on the remote host in this example?

Thanks,

Glynne




[Nagios-users] Distributed monitoring question: obsess process flow

2009-11-05 Thread Jonathan Bayles

Sorry list, I treated my last post as an email and did not observe proper 
list-fu!


Marc,

Thank you for pointing me to the distributed doc and for your explanations. I 
feel I am very close. Right now if I run:

/usr/local/nagios/libexec/eventhandlers/submit_check_result Athena 'PING' OK 
'qwrweewfkljewfglkjwegjlwejglwjeglkjwegkwleg'

from the distributed server, the result shows up correctly on the master 
server. I take this to mean that the remote server is configured correctly, or 
close enough, and that the NSCA pipe works.

However, even with obsess and ocsp_command set properly in the nagios.cfg 
file on the distributed server, the check results don't seem to travel 
across. Do I need to set up a different service class with specific options to 
push the data across the NSCA pipe? What is the process flow of obsess?
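
As the distributed monitoring documentation describes it, Nagios runs the ocsp_command after each service check only when obsess_over_services is enabled in the main config file and the service itself has obsess_over_service enabled. A small sketch for checking the two main-config directives follows; the check_obsess helper, its messages, and the path in the usage comment are assumptions, and the per-service setting is not covered.

```shell
#!/bin/sh
# check_obsess MAIN_CONFIG
# Reports whether the two main-config directives needed for OCSP are present.
check_obsess() {
    cfg="$1"
    if ! grep -Eq '^[[:space:]]*obsess_over_services[[:space:]]*=[[:space:]]*1' "$cfg"; then
        echo "obsess_over_services is not set to 1"
        return 1
    fi
    if ! grep -Eq '^[[:space:]]*ocsp_command[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
        echo "ocsp_command is not set"
        return 1
    fi
    echo "main config looks ready for OCSP"
}

# Usage on the distributed server (assumed default path):
#   check_obsess /usr/local/nagios/etc/nagios.cfg
```

If both directives are present, the next things to look at are the command definition that ocsp_command names and whether a service template sets obsess_over_service to 0.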


Re: [Nagios-users] Distributed monitoring solution - some questions

2009-09-28 Thread Simone Felici

Marc Powell wrote on 25/09/2009 14.14:

 
 It sounds like you're looking for Freshness Checks. It's discussed in  
 the Distributed Monitoring documentation.
 

Thanks Marc,
In the meantime I've read the documentation more carefully; the freshness 
threshold does the trick.

Thanks!

Simon
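
For readers landing here: a freshness check on the central server fires when no passive result has arrived within the service's freshness_threshold. A rough sketch of that comparison follows; it ignores the extra allowances Nagios itself applies, and the is_stale helper and the sample numbers are illustrative only.

```shell
#!/bin/sh
# is_stale LAST_RESULT_EPOCH THRESHOLD_SECONDS NOW_EPOCH
# Succeeds (exit status 0) when the last result is older than the threshold.
is_stale() {
    age=$(( $3 - $1 ))
    [ "$age" -gt "$2" ]
}

# Sample: results expected every 300 s, threshold padded to 360 s,
# last result received 400 s ago.
if is_stale 1000 360 1400; then
    echo "stale: the central server would run the service's check_command"
else
    echo "fresh"
fi
```

Padding the threshold beyond the poller's check interval leaves room for check latency and transport delay before the central server declares a result stale.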


[Nagios-users] Distributed Monitoring - Freshness and Latency

2009-07-07 Thread Harald Böhmecke

Hi,

I am having a problem with latency.

Here is my technical information:

All OS: Ubuntu 9.03 Server, Nagios 3.0.6

Main server: DL-380, 4 GB RAM, quad-core 3.0 GHz



My distributed Nagios3 Satellites are reporting Latency, although no
CPU, Mem, Disk or other peaks are evident:

Hosts:                 Min.      Max.       Average
Check Execution Time:  0.03 sec  4.22 sec   1.009 sec
Check Latency:         0.01 sec  22.67 sec  3.166 sec

Services:              Min.      Max.       Average
Check Execution Time:  0.04 sec  0.21 sec   0.099 sec
Check Latency:         0.01 sec  20.18 sec  2.957 sec

Now I am just starting to configure this Satellite, so I only have 29
Hosts and 153 Services.

Due to the latency on the Satellite, the freshness checks on the Main
Nagios are triggering, and this is not good, because it makes a mess of
the graphs from my SNMP bandwidth plugins.

Attached is the output of nagios3 -s /etc/nagios3/nagios.cfg


I tried debugging but everything looks OK... but what do I know, right?

Any help is greatly appreciated!!


Regards,

Harald 

Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
Last Modified: 12-01-2008
License: GPL

Timing information on object configuration processing is listed
below.  You can use this information to see if precaching your
object configuration would be useful.

Object Config Source: Config files (uncached)

OBJECT CONFIG PROCESSING TIMES  (* = Potential for precache savings with -u option)
----------------------------------
Read:                 0.009900 sec
Resolve:              0.000208 sec  *
Recomb Contactgroups: 0.000020 sec  *
Recomb Hostgroups:    0.000003 sec  *
Dup Services:         0.001928 sec  *
Recomb Servicegroups: 0.000006 sec  *
Duplicate:            0.000007 sec  *
Inherit:              0.000083 sec  *
Recomb Contacts:      0.000001 sec  *
Sort:                 0.000000 sec  *
Register:             0.001701 sec
Free:                 0.000185 sec

TOTAL:                0.014044 sec  * = 0.002258 sec (16.08%) estimated savings


RETENTION DATA TIMES
----------------------------------
Read and Process:     0.009276 sec

TOTAL:                0.009276 sec


Timing information on configuration verification is listed below.

CONFIG VERIFICATION TIMES  (* = Potential for speedup with -x option)
----------------------------------
Object Relationships: 0.000556 sec
Circular Paths:       0.000011 sec  *
Misc:                 0.000302 sec

TOTAL:                0.000869 sec  * = 0.000011 sec (1.3%) estimated savings


EVENT SCHEDULING TIMES
-------------------------------------
Get service info:        0.000955 sec
Get host info info:      0.000003 sec
Get service params:      0.000023 sec
Schedule service times:  0.000986 sec
Schedule service events: 0.000206 sec
Get host params:         0.000002 sec
Schedule host times:     0.000132 sec
Schedule host events:    0.000042 sec

TOTAL:                   0.002349 sec


Projected scheduling information for host and service checks
is listed below.  This information assumes that you are going
to start running Nagios with your current config files.

HOST SCHEDULING INFORMATION
---
Total hosts: 29
Total scheduled hosts:   29
Host inter-check delay method:   SMART
Average host check interval: 300.00 sec
Host inter-check delay:  10.34 sec
Max host check spread:   30 min
First scheduled check:   Thu Jan  1 01:00:00 1970
Last scheduled check:Thu Jan  1 01:00:00 1970


SERVICE SCHEDULING INFORMATION
---
Total services: 153
Total scheduled services:   153
Service inter-check delay method:   SMART
Average service check interval: 271.76 sec
Inter-check delay:  1.78 sec
Interleave factor method:   SMART
Average services per host:  5.28
Service interleave factor:  6
Max service check spread:   30 min
First scheduled check:  Tue Jul  7 18:45:37 2009
Last scheduled check:   Tue Jul  7 18:49:33 2009


CHECK PROCESSING INFORMATION

Check result reaper interval:   10 sec
Max concurrent service checks:  Unlimited


PERFORMANCE SUGGESTIONS
---
I have no suggestions - things look okay.


Re: [Nagios-users] Distributed Monitoring - Freshness and Latency

2009-07-07 Thread Harald Böhmecke

I found the problem: all the international SNMP bandwidth checks run by
check_snmp_int.pl are giving the Nagios Satellites latencies of between
5 and 17 seconds, which sends the whole check queue sky high.

I'll just have to go back to MRTG :(

Or does anyone have a good SNMP option for bandwidth usage monitoring
with pnp4nagios compatibility?


Regards,

Harald


-Original Message-
From: Harald Böhmecke harald.boehme...@bertelsmann.de
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring - Freshness and Latency
Date: Tue, 07 Jul 2009 19:44:29 +0200

Hi,

I am having a problem with latency.

Here is my technical information:

All OS: Ubuntu 9.03 Server, Nagios 3.0.6

Main server: DL-380, 4 GB RAM, quad-core 3.0 GHz



My distributed Nagios3 Satellites are reporting Latency, although no
CPU, Mem, Disk or other peaks are evident:

Hosts:                 Min.      Max.       Average
Check Execution Time:  0.03 sec  4.22 sec   1.009 sec
Check Latency:         0.01 sec  22.67 sec  3.166 sec

Services:              Min.      Max.       Average
Check Execution Time:  0.04 sec  0.21 sec   0.099 sec
Check Latency:         0.01 sec  20.18 sec  2.957 sec

Now I am just starting to configure this Satellite, so I only have 29
Hosts and 153 Services.

Due to the latency on the Satellite, the freshness checks on the Main
Nagios are triggering, and this is not good, because it makes a mess of
the graphs from my SNMP bandwidth plugins.

Attached is the output of nagios3 -s /etc/nagios3/nagios.cfg


I tried debugging but everything looks OK... but what do I know, right?

Any help is greatly appreciated!!


Regards,

Harald 



[Nagios-users] Distributed Monitoring Parents

2009-07-05 Thread Harald Böhmecke



Hi all,

I currently have 1 Master Nagios Server and 4 Nagios Satellites which do the 
hard work.

I have defined all Parents (dependencies) on the Master Server.

Do I also need to define the Parents on the Satellites? Or will the Master 
Server (the one sending out Notifications) automatically define the Unreachable 
hosts by itself?


Regards,

Harald

Re: [Nagios-users] Distributed Monitoring Parents

2009-07-05 Thread Steve Shipway

If you want the satellites to suppress host/service checks when hosts are 
unreachable, then yes.
Otherwise, your central Nagios master will correctly suppress notifications (as 
it knows about the dependencies, and the satellites don't do notifications).

On our system, I've defined the dependencies on the satellites as well because I 
want to suppress checks of unreachable hosts (with Nagios 2.x they cause horrible 
latencies when a sector drops out).  It's a bit messy though, as it requires 
host checks to be done on both master and satellite.

Steve


From: Harald Böhmecke [mailto:harald.boehme...@bertelsmann.de]
Sent: Sunday, 5 July 2009 11:54 p.m.
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring Parents

Hi all,

I currently have 1 Master Nagios Server and 4 Nagios Satellites which do the 
hard work.

I have defined all Parents (dependencies) on the Master Server.

Do I also need to define the Parents on the Satellites? Or will the Master 
Server (the one sending out Notifications) automatically define the Unreachable 
hosts by itself?

Regards,

Harald

[Nagios-users] Distributed Monitoring Central Server no status changes

2009-02-25 Thread Paul Landauer

Using nagios 3.0.5
Distributed Monitoring setup
Hosts and Services show updated status information but the status of the
host or service does not change from up on the central server.  Status
on the distributed servers is reflected correctly in the web interface.

Why might this be?

thanks,
Paul



Re: [Nagios-users] Distributed Monitoring Central Server no status changes

2009-02-25 Thread Marc Powell


On Feb 25, 2009, at 11:51 AM, Paul Landauer wrote:

 Using nagios 3.0.5
 Distributed Monitoring setup
 Hosts and Services show updated status information but the status of  
 the
 host or service does not change from up on the central server.   
 Status
 on the distributed servers is reflected correctly in the web  
 interface.

 Why might this be?

Is the status that the central server receives from the distributed  
server always 'up', or is it not receiving or processing the updates  
at all?

Some things that will help get a better answer are --

- information about how you've architected your distributed setup (i.e  
are you using 2+ nagios instances with NSCA transporting between them,  
implemented as documented?)
- example host and service definitions from both servers (complete  
definitions please)
- example status information from both servers for an affected service
- related nagios.log information from both servers
- the contents of your check result submission script if it's not  
exactly like the documented one.

Running nagios and/or NSCA in debug mode on the central server might  
provide additional information.

--
Marc



Re: [Nagios-users] Distributed Monitoring Central Server no status changes

2009-02-25 Thread Marc Powell

Hi Paul,

Please always respond on list so that others now, and in the future,  
can learn from your experience and so that you can benefit from the  
experience of others on the list. More below...

On Feb 25, 2009, at 12:54 PM, Paul Landauer wrote:

 On Wed, 2009-02-25 at 12:06 -0600, Marc Powell wrote:

 I'm using 2 servers following the documentation at
 http://nagios.sourceforge.net/docs/3_0/distributed.html

Thanks.

 - example host and service definitions from both servers (complete
 definitions please)
 Definitions are the same on both servers.
 Example host definition:
 define host{
   use                     generic-host
   host_name               surf
   alias                   Surf Control
   address                 ip_address_of_surf_is_here
   max_check_attempts      5
   check_command           check-host-alive
   check_interval          5
   retry_interval          1
   check_period            24x7
   contact_groups          admins
   notification_interval   30
   notification_period     24x7
   notification_options    d,u,r
   }

 Example Service Definitions (surf is a member of  
 sunrise_windows_servers):
 define service{
   use generic-service
   hostgroup_name  sunrise_windows_servers
   service_description NSClient++ Version
   check_command   check_nt!CLIENTVERSION
   }

For future reference, these are not 'complete' since you use  
templates. There's lots of important information within those  
templates that's needed when troubleshooting as well. I expect that  
the definitions are indeed different between the servers when you take  
the templates into account otherwise your central server is doing  
active checks of the services in addition to receiving the passive  
checks, overwriting their results. (I don't think this is the problem).

 - related nagios.log information from both servers
 I included excerpts that I thought applied.  If you'd like the whole
 log, let me know.
 Nagios.log for Distributed server:
 [1235575724] SERVICE ALERT: surf;Explorer;CRITICAL;HARD;3;Explorer.exe: not running
 [1235575724] SERVICE NOTIFICATION: nagiosadmin;surf;Explorer;CRITICAL;notify-service-by-email;Explorer.exe: not running

 Nagios.log for Central Server:
 [1235575777] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;surf;Explorer;0;Explorer.exe: not running
 [1235575778] PASSIVE SERVICE CHECK: surf;Explorer;0;Explorer.exe: not running

This is interesting and useful. As you can see, on your distributed  
server the state is CRITICAL (numeric code 2; the 3 in that log line is  
the check attempt number), but by the time NSCA dumps it into the  
command pipe on the central server, it has been translated to 0 (OK) by  
something in the process. This could be because nagios isn't passing  
the correct status code to your submission script, your submission  
script is not interpreting or passing it to send_nsca correctly, or  
nsca on the receiving side isn't reading it correctly.

 - the contents of your check result submission script if it's not
 exactly like the documented one.
 printfcmd=/usr/bin/printf

 NscaBin=/usr/bin/send_nsca
 NscaCfg=/etc/nagios/send_nsca.cfg
 NagiosHost=I_have_the_ip_address_of_my_central_server_here

 # Fire the data off to the NSCA daemon using the send_nsca script
 $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" | $NscaBin -H $NagiosHost -p 5721 -c $NscaCfg

To say whether this is correct or not I'd have to see your OCSP  
command definition. If you're using the $SERVICESTATE$ macro, then  
this is broken. send_nsca expects a numeric state code but  
$SERVICESTATE$ provides a grammatical code (OK, CRITICAL, etc).  
Normally that needs to be translated to the proper numeric by the  
submission script first but you can also use the $SERVICESTATEID$  
macro instead to get the numeric code. My bets are on this being the  
problem.
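
A minimal sketch of the translation step described here, assuming the submission script receives the state as its third argument. The state_to_code and nsca_line helpers are illustrative names, not part of the documented script, and mapping unexpected input to 3 (UNKNOWN) is a choice made for this sketch. The result is printed rather than piped into send_nsca so the line can be inspected.

```shell
#!/bin/sh
# state_to_code STATE
# Maps a $SERVICESTATE$ string to the numeric code send_nsca expects;
# numeric input (as from $SERVICESTATEID$) is passed through unchanged.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        UNKNOWN)  echo 3 ;;
        [0-3])    echo "$1" ;;
        *)        echo 3 ;;   # anything unexpected is reported as UNKNOWN
    esac
}

# nsca_line HOST SERVICE STATE OUTPUT
# Builds the tab-separated line that would be piped into send_nsca.
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

nsca_line surf Explorer CRITICAL 'Explorer.exe: not running'
```

With either macro in the OCSP command definition, the third field of the line is numeric by the time it reaches send_nsca.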

 Running nagios and/or NSCA in debug mode on the central server might
 provide additional information.
 Let me know if you still want this to be done.

Running NSCA in debug to see if it's receiving the 0 status code from  
the distributed machine would further narrow down the source of the  
problem.

--
Marc



Re: [Nagios-users] Distributed Monitoring Central Server no status changes

2009-02-25 Thread Paul Landauer


 
 On Feb 25, 2009, at 12:54 PM, Paul Landauer wrote:
 
  On Wed, 2009-02-25 at 12:06 -0600, Marc Powell wrote:
 
  I'm using 2 servers following the documentation at
  http://nagios.sourceforge.net/docs/3_0/distributed.html
 
 Thanks.
 
  - example host and service definitions from both servers (complete
  definitions please)
  Definitions are the same on both servers.
  Example host definition:
  define host{
  use                     generic-host
  host_name               surf
  alias                   Surf Control
  address                 ip_address_of_surf_is_here
  max_check_attempts      5
  check_command           check-host-alive
  check_interval          5
  retry_interval          1
  check_period            24x7
  contact_groups          admins
  notification_interval   30
  notification_period     24x7
  notification_options    d,u,r
  }
 
  Example Service Definitions (surf is a member of  
  sunrise_windows_servers):
  define service{
  use generic-service
  hostgroup_name  sunrise_windows_servers
  service_description NSClient++ Version
  check_command   check_nt!CLIENTVERSION
  }
 
 For future reference, these are not 'complete' since you use  
 templates. There's lots of important information within those  
 templates that's needed when troubleshooting as well. I expect that  
 the definitions are indeed different between the servers when you take  
 the templates into account otherwise your central server is doing  
 active checks of the services in addition to receiving the passive  
 checks, overwriting their results. (I don't think this is the problem).
 
  - related nagios.log information from both servers
  I included excerpts that I thought applied.  If you'd like the whole
  log, let me know.
  Nagios.log for Distributed server:
  [1235575724] SERVICE ALERT: surf;Explorer;CRITICAL;HARD;3;Explorer.exe: not running
  [1235575724] SERVICE NOTIFICATION: nagiosadmin;surf;Explorer;CRITICAL;notify-service-by-email;Explorer.exe: not running
 
  Nagios.log for Central Server:
  [1235575777] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;surf;Explorer;0;Explorer.exe: not running
  [1235575778] PASSIVE SERVICE CHECK: surf;Explorer;0;Explorer.exe: not running
 
 This is interesting and useful. As you can see, on your distributed
 server, the status is CRITICAL (state ID 2; the 3 in that log line is
 the check attempt number) but by the time NSCA dumps it into the
 command pipe on the central server, that has been translated to 0
 (OK) by something in the process. This could be because nagios isn't
 passing the correct status code to your submission script, your
 submission script is not interpreting or passing it to send_nsca
 correctly, or nsca on the receiving side isn't reading it correctly.
 
  - the contents of your check result submission script if it's not
  exactly like the documented one.
  printfcmd=/usr/bin/printf
 
  NscaBin=/usr/bin/send_nsca
  NscaCfg=/etc/nagios/send_nsca.cfg
  NagiosHost=I_have_the_ip_address_of_my_central_server_here
 
  # Fire the data off to the NSCA daemon using the send_nsca script
  $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" | $NscaBin -H $NagiosHost -p 5721 -c $NscaCfg
 
 To say whether this is correct or not I'd have to see your OCSP  
 command definition. If you're using the $SERVICESTATE$ macro, then  
 this is broken. send_nsca expects a numeric state code but  
 $SERVICESTATE$ provides a grammatical code (OK, CRITICAL, etc).  
 Normally that needs to be translated to the proper numeric by the  
 submission script first but you can also use the $SERVICESTATEID$  
 macro instead to get the numeric code. My bets are on this being the  
 problem.
 
  Running nagios and/or NSCA in debug mode on the central server might
  provide additional information.
  Let me know if you still want this to be done.
 
 Running NSCA in debug to see if it's receiving the 0 status code from  
 the distributed machine would further narrow down the source of the  
 problem.
 
 --
 Marc


Marc,

You are correct sir!  I changed $SERVICESTATE$ to $SERVICESTATEID$ on
the distributed server and the central server is updating properly.  I
imagine that I'll need to use $HOSTSTATEID$ instead of $HOSTSTATE$ as
well.
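
For the host side, a comparable sketch is shown below. It assumes the three-field host result line that send_nsca accepts (host, return code, output) and that the OCHP command passes $HOSTSTATEID$; the function name format_host_result is hypothetical.

```shell
#!/bin/sh
# Sketch only: build a passive host check line and refuse non-numeric states.

format_host_result() {
    # $1=host  $2=numeric host state ID (0=UP, 1=DOWN, 2=UNREACHABLE)  $3=plugin output
    case "$2" in
        0|1|2) ;;
        *) echo "refusing non-numeric host state: $2" >&2; return 1 ;;
    esac
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}
```

Rejecting a word such as DOWN outright, rather than forwarding it, avoids the silent "everything is 0" symptom seen earlier in this thread.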

paul



Re: [Nagios-users] Distributed monitoring without direct network connection

2008-12-01 Thread Nick Lunt

 -Original Message-
 From: Andreas Ericsson [mailto:[EMAIL PROTECTED]
 Sent: 29 November 2008 14:01
 To: Nick Lunt
 Cc: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Distributed monitoring without direct
 network connection

 Nick Lunt wrote:
  Hi folks

  nagios 3.0.5 on RHEL 4u6.

  We have nagios servers all over the uk and we want to get all alerts
  from each nagios server to a central nagios server at our main offices.

  We do not have permanent network connectivity to the remote nagios
  servers so using NSCA is not an option.

  Has anyone any idea of how to overcome this problem ?

 Queue the events that were unsendable and send them when it becomes
 possible. Merlin is designed to handle frequently failing links with
 sometimes extremely long downtimes (it already does this), but it's
 not really production level stable yet, so I wouldn't recommend using
 it for this (unless you're interested in completing it yourself or
 sponsoring me or op5 to do it for you, of course).

 More about merlin at http://git.op5.org/git/nagios/merlin.git

 pnsca, another module available there, can probably be trivially
 rewritten to stash alerts and whatnot with very good performance.

  I am thinking of
  getting the remote nagios servers to send email alerts to an account on
  the central nagios server then trying to get an alert generated based on
  the contents of the email, has anyone tried this before ?

  Or does anyone have any better ideas for solving this problem ?

 That depends on what your end-goal is, really. Do you want only one server
 to send notifications, or do you want your central server to be able to
 generate reports from the data sent in from the slave systems?

 If only one server should send notifications, I'd recommend using a solution
 with lower latency than gathering everything and shipping it as an email.
 One-way UDP communication would be one solution here, I guess, but it does
 require the network to be physically present at all times (and there's no
 failure detection whatsoever, as UDP is a fire-and-forget protocol).
 Merlin would help in this case (although it can't send over UDP yet).

 If it's for reporting reasons, you'd be better off sending the logfiles as
 emails when they're being rotated and then merging them together on the
 master server. That means you can't get *accurate* reports more often than
 the logs are rotated, but since you'll need to sort-merge them anyways,
 that's still going to be a problem.
 Neither merlin nor NSCA can help here, I'm afraid, as entries in the logs
 would get completely jumbled unless you sort-merge them before
 generating reports from them.

Thanks for the detailed info Andreas. I still think the nagios event ->
email -> nagios server is the only realistic solution. It's not perfect
as mail servers can fail and mail can get delayed but it's the best we
can do at the moment.

Cheers,
Nick. 

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Distributed monitoring without direct network connection

2008-12-01 Thread Michael Gargiullo

-Original Message-
From: Nick Lunt [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 01, 2008 6:01 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring without direct
networkconnection


Thanks for the detailed info Andreas. I still think the nagios event ->
email -> nagios server is the only realistic solution. It's not perfect
as mail servers can fail and mail can get delayed but it's the best we
can do at the moment.

Cheers,
Nick. 

-

Email isn't horrible, but it's not optimal.  We wrote a script that
checks a pop account for email every morning to ensure the daily reports
arrived as expected.

-Mike


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-29 Thread Hugo van der Kooij


Nick Lunt wrote:

 It's not that the connections will be up/down it's more that they simply
 won't be there. Most of our clients are NHS (hospitals) and we  have to
 have a secure vpn connection that we dial into on an as needed basis.
 
 We currently just send alerts as emails to our support account, but the
 company is getting bigger and bigger so monitoring the support inbox is
 becoming a massive chore. We really want a central nagios server with
 the web frontend on a big flat screen on the wall :)

Set up dedicated links. With VPNs this should not cost you an arm and a leg.

You are trying to fix the wrong problem, in my view.

Hugo.

- --
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
Q: Are you sure?
A: Because it reverses the logical flow of conversation.
Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.

I am not in the office at the moment. Send any work to be translated.

Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-29 Thread Andreas Ericsson

Hugo van der Kooij wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Nick Lunt wrote:
 
 It's not that the connections will be up/down it's more that they simply
 won't be there. Most of our clients are NHS (hospitals) and we  have to
 have a secure vpn connection that we dial into on an as needed basis.

 We currently just send alerts as emails to our support account, but the
 company is getting bigger and bigger so monitoring the support inbox is
 becoming a massive chore. We really want a central nagios server with
 the web frontend on a big flat screen on the wall :)
 
 Set up dedicated links. With VPNs this should not cost you an arm and a leg.
 
 You are trying to fix the wrong problem, in my view.
 

Since the targeted organizations are hospitals, I think it's legal matters
rather than competence that make dedicated links a showstopper.

-- 
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-29 Thread Andreas Ericsson

Nick Lunt wrote:
 Hi folks
 
 nagios 3.0.5 on RHEL 4u6.
 
 We have nagios servers all over the uk and we want to get all alerts
 from each nagios server to a central nagios server at our main offices.
 
 We do not have permanent network connectivity to the remote nagios
 servers so using NSCA is not an option.
 
 Has anyone any idea of how to overcome this problem ?

Queue the events that were unsendable and send them when it becomes
possible. Merlin is designed to handle frequently failing links with
sometimes extremely long downtimes (it already does this), but it's
not really production level stable yet, so I wouldn't recommend using
it for this (unless you're interested in completing it yourself or
sponsoring me or op5 to do it for you, of course).

More about merlin at http://git.op5.org/git/nagios/merlin.git

pnsca, another module available there, can probably be trivially
rewritten to stash alerts and whatnot with very good performance.
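
The "queue what could not be sent, send it later" idea can be sketched in a few lines of shell. This is not how Merlin or pnsca work internally; it is only an illustration under simple assumptions: results are already formatted as tab-separated lines, the spool path is a placeholder, and the sender command (for example send_nsca with its usual options) reads result lines from stdin and exits non-zero when the link is down.

```shell
#!/bin/sh
# Sketch only: spool results locally and flush them when a sender succeeds.

SPOOL_FILE=${SPOOL_FILE:-/tmp/nsca-spool.txt}   # hypothetical spool location

queue_result() {
    # Append one already-formatted result line to the spool
    printf '%s\n' "$1" >> "$SPOOL_FILE"
}

flush_queue() {
    # "$@" is the sender command; it receives the whole spool on stdin.
    [ -s "$SPOOL_FILE" ] || return 0      # nothing queued
    if "$@" < "$SPOOL_FILE"; then
        : > "$SPOOL_FILE"                 # sent: empty the spool
    else
        return 1                          # link still down: keep everything
    fi
}
```

A cron job or a script run when the VPN comes up could call flush_queue with the real sender. Results appended between the send and the truncation would be lost, so a production version would need locking or a rename-then-send step.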

 I am thinking of
 getting the remote nagios servers to send email alerts to an account on
 the central nagios server then trying to get an alert generated based on
 the contents of the email, has anyone tried this before ?
 
 Or does anyone have any better ideas for solving this problem ?
 

That depends on what your end-goal is, really. Do you want only one server
to send notifications, or do you want your central server to be able to
generate reports from the data sent in from the slave systems?

If only one server should send notifications, I'd recommend using a solution
with lower latency than gathering everything and shipping it as an email.
One-way UDP communication would be one solution here, I guess, but it does
require the network to be physically present at all times (and there's no
failure detection whatsoever, as UDP is a fire-and-forget protocol).
Merlin would help in this case (although it can't send over UDP yet).

If it's for reporting reasons, you'd be better off sending the logfiles as
emails when they're being rotated and then merging them together on the
master server. That means you can't get *accurate* reports more often than
the logs are rotated, but since you'll need to sort-merge them anyways,
that's still going to be a problem.
Neither merlin nor NSCA can help here, I'm afraid, as entries in the logs
would get completely jumbled unless you sort-merge them before
generating reports from them.
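
The sort-merge step mentioned above can be sketched with the standard sort utility. The sketch assumes that every log line starts with a bracketed epoch timestamp such as [1235575724], that each input file is already in time order (as a rotated nagios.log normally is), and that a sort supporting -m and -s is available; the function name merge_nagios_logs is hypothetical.

```shell
#!/bin/sh
# Sketch only: merge already-sorted Nagios logs from several servers by timestamp.

merge_nagios_logs() {
    # -m          merge pre-sorted inputs instead of re-sorting them
    # -s          stable: ties keep the order of the input files
    # -t ']'      treat "]" as the field separator, so field 1 is "[<epoch>"
    # -k 1.2,1n   numeric key starting at the 2nd character of field 1
    sort -m -s -t ']' -k 1.2,1n "$@"
}
```

The merged stream could then be redirected into a single file on the master for the reporting tools to read.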

-- 
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231


[Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Nick Lunt

Hi folks

nagios 3.0.5 on RHEL 4u6.

We have nagios servers all over the uk and we want to get all alerts
from each nagios server to a central nagios server at our main offices.

We do not have permanent network connectivity to the remote nagios
servers so using NSCA is not an option.

Has anyone any idea of how to overcome this problem ? I am thinking of
getting the remote nagios servers to send email alerts to an account on
the central nagios server then trying to get an alert generated based on
the contents of the email, has anyone tried this before ?

Or does anyone have any better ideas for solving this problem ?

Kind Regards

Nick Lunt
Managed Services and O/S Analyst
Patech Solutions Limited
Tel: 01543 444 707
Fax: 01543 444 709
Tame House, Fradley Park, Lichfield, Staffordshire, WS13 8RZ
www.patech-solutions.com http://www.patech-solutions.com/home.htm


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Assaf Flatto


Try looking at Opsview; it does what you want, with Nagios as a component
in its setup.

http://www.opsview.org/






-- 
Assaf Flatto
SSP Ops Team
Linux System Administrator
169 Euston Road, London, NW1 2AE






Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Ton Voon

Thanks for the plug, but Opsview is not really suitable for this. We  
have distributed monitoring out-of-the-box, but it requires the  
permanent connection between slaves and the master which Nick says he  
hasn't got.


If you will have temporary connections (that go up and down), we have  
been thinking whether we can do batched results from slaves, though  
this is quite a big job as it requires changes to a lot of components  
(Nagios, NSCA, NDOutils, and our datawarehouse). Let us know if you  
fancy sponsoring this work.


Otherwise, why can't you just have notifications from the other  
servers? Do you need to correlate with your central server?


Ton


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Nick Lunt

Hi Ton

 

thanks for the info.

 

It's not that the connections will be up/down it's more that they simply
won't be there. Most of our clients are NHS (hospitals) and we  have to
have a secure vpn connection that we dial into on an as needed basis.

We currently just send alerts as emails to our support account, but the
company is getting bigger and bigger so monitoring the support inbox is
becoming a massive chore. We really want a central nagios server with
the web frontend on a big flat screen on the wall :)

 

I'm setting up a filter in postfix on the central nagios server so that
all incoming emails go through the filter, which will run a
script to call send_nsca.

 

So I'll have nagios clients -> nagios -> send_email -> primary nagios
server -> postfix -> mail filter -> send_nsca -> nagios

If I get this working I'll treat myself to curry :)
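
A mail filter along these lines might look like the sketch below. Everything about the message layout is an assumption: it expects the notification body to contain "Host:", "Service:", "State:" and "Info:" lines, which depends entirely on how the remote servers' notification commands are written. The function names are hypothetical, and the output is only the tab-separated line that would then be piped to send_nsca.

```shell
#!/bin/sh
# Sketch only: turn a notification email (on stdin) into one send_nsca input line.
# Assumed body lines: "Host: ...", "Service: ...", "State: ...", "Info: ..."

state_to_id() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unrecognised
    esac
}

mail_to_nsca_line() {
    host='' service='' state='' info=''
    while IFS= read -r line; do
        case "$line" in
            "Host: "*)    host=${line#Host: } ;;
            "Service: "*) service=${line#Service: } ;;
            "State: "*)   state=${line#State: } ;;
            "Info: "*)    info=${line#Info: } ;;
        esac
    done
    # Refuse messages that do not carry the three required fields
    [ -n "$host" ] && [ -n "$service" ] && [ -n "$state" ] || return 1
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$(state_to_id "$state")" "$info"
}
```

Because the email carries the state as a word, the filter has to translate it to a number before handing it to send_nsca, for the same reason discussed in the $SERVICESTATE$ thread in this archive.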

 

 

 


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Ton Voon



On 28 Nov 2008, at 16:00, Nick Lunt wrote:



So I'll have nagios clients -> nagios -> send_email -> primary
nagios server -> postfix -> mail filter -> send_nsca -> nagios

If I get this working I'll treat myself to curry :)


You deserve a vindaloo.

So basically, you are using email from the disparate nagios systems as
a delivery mechanism to send state change data to the master. Can't
see why that wouldn't work (it's just like getting trap or log event
data).


Using email provides you with resilience and retries, though with high
latency. Better hope there's not a network/email outage!


Ton


Re: [Nagios-users] Distributed monitoring without direct network connection

2008-11-28 Thread Brian A. Seklecki



If I get this working I'll treat myself to curry :)



Try a coconut milk + pineapple curry.  Serve with ginger salad.  Little 
closer to heaven.  ~BAS

[Nagios-users] distributed monitoring host checking question

2008-07-30 Thread Tom Ammon

Hi,

I am working on setting up a distributed monitoring system with Nagios 
(actually Groundwork). I have 3 child servers and 1 parent server, using 
NSCA to send passive check results from the children to the parent server.

My question is about how Nagios (version 2.5) will behave when an
on-demand host check needs to be run.

So for example:

Host A is configured with check_host_alive (a simple ping) as its host 
check command on the parent server. It is also configured with Service 
A, say an SNMP check. Active host checks are not disabled on the parent 
server, but active service checks are.

Host A, obviously, is also configured on the child server. When the 
child server sends a passive check result up to the parent saying that 
the SNMP check has failed, will the parent server then run the on-demand 
host check command to verify that Host A is still up? If not, how do I 
get that information up to the parent? Are passive host checks my only 
option?

So I guess the question is this: In a distributed monitoring setup, will 
a parent server run an on-demand host check for a host that gets a 
report (via a passive service check sent from a child server) of a 
service being critical?

Thanks,

Tom

-- 
-
Tom Ammon
Network Engineer

Business Card at http://tomsbox.net/bizcard_TomAmmon.jpg

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu



[Nagios-users] Distributed monitoring with distributed server running Windows

2008-07-25 Thread Moby

Hello all, I have Nagios up and running and have made myself familiar 
with nrpe and nsca.  I have read and think I understand the Nagios 
distributed monitoring setup quite clearly.
The issue I have now is that I need to set up a distributed monitoring 
configuration in which the distributed server runs Windows.  In short, 
I have a Linux server as the central server and a distributed 
monitoring server running Windows.  I can monitor the distributed server 
itself successfully using nsca with nsclient++.  I have some more Windows 
machines that I need to monitor, but they cannot contact the central 
Nagios server directly; they have to come via the distributed server.  
Does anyone know of any tools that can enable my distributed server 
running Windows to act either as an nsca server to receive data from 
leaf nodes via nsca or as an nrpe server to get data from leaf nodes 
via nrpe and then send those alerts onto my central Linux box via nsca?

Thanks in advance for any help,

-- 
--Moby

They that can give up essential liberty to obtain a little temporary safety 
deserve neither liberty nor safety.  -- Benjamin Franklin




Re: [Nagios-users] distributed monitoring - slave server not that intelligent

2008-02-15 Thread Andreas Ericsson


mark redding wrote:

Hi all,

I currently have Nagios 2.10 installed on a couple of machines, one of
which is configured as a master and the other as a slave.

I have a script running on the slave which rsync's up the configs from
the master and performs health checks of the master to see that it is
running (and if it is not then it enables service checks/notifications
on the slave until such time as it detects that the master is back up
and running). I also use nsca to pass passive checks to the slave to
ensure that it has up to date information about services. The slave
does not perform any active service checks, nor are notifications
enabled unless the master is down.

I do however still have one problem and that is that the slave has no
way of knowing when we've ack'ed a critical, scheduled downtime,
disabled/enabled notifications/event handlers/checks for a service/host
on the master. What this means is that if we schedule downtime on a
host, then the master goes down, the slave starts bitching about the
host that is down (because it does not know that it's in downtime). A
similar problem occurs if we disable an event handler on the master,
because unless the slave also knows to disable the event handler it
will fire it (regardless of whether or not it is active) as soon as
the passive check result returns a critical.

At present I am getting round this by tailing the nagios log file
through a perl script that looks for specific 'EXTERNAL COMMAND'
entries and then flushes those through to the slave by ssh'ing to the
slave and writing the command string to the nagios pipe file on the
slave.

Is there a better way of doing this ?



You might get lucky using the attached NEB-module. It's not well
documented, and it's not very well tested. It will do what you're
after though. Contact me off-list if you run into problems. I've
been looking for someone to test this for quite some time now, so
I'll be happy to help.

It's written to make the two servers loadbalanced, so the slave
and the master will help each other out doing checks and then
transmit them to one another. External commands are also copied
from one to the other, so scheduled/cancelled downtime etc will
instantly show up on both servers as soon as it's parsed in one.

If you don't want the host/service check syncing you'll have to
either get clever with the config or manually hack that out of
the module.

Like I said; Feel free to contact me off-list if you're having
any problems with it.

--
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231


mrm-0.1.tar.gz
Description: GNU Zip compressed data

[Nagios-users] distributed monitoring - slave server not that intelligent

2008-02-14 Thread mark redding

Hi all,

I currently have Nagios 2.10 installed on a couple of machines, one of
which is configured as a master and the other as a slave.

I have a script running on the slave which rsync's up the configs from
the master and performs health checks of the master to see that it is
running (and if it is not then it enables service checks/notifications
on the slave until such time as it detects that the master is back up
and running). I also use nsca to pass passive checks to the slave to
ensure that it has up to date information about services. The slave
does not perform any active service checks, nor are notifications
enabled unless the master is down.

I do however still have one problem and that is that the slave has no
way of knowing when we've ack'ed a critical, scheduled downtime,
disabled/enabled notifications/event handlers/checks for a service/host
on the master. What this means is that if we schedule downtime on a
host, then the master goes down, the slave starts bitching about the
host that is down (because it does not know that it's in downtime). A
similar problem occurs if we disable an event handler on the master,
because unless the slave also knows to disable the event handler it
will fire it (regardless of whether or not it is active) as soon as
the passive check result returns a critical.

At present I am getting round this by tailing the nagios log file
through a perl script that looks for specific 'EXTERNAL COMMAND'
entries and then flushes those through to the slave by ssh'ing to the
slave and writing the command string to the nagios pipe file on the
slave.
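
The log-tailing workaround described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script mentioned in the message: it assumes log lines of the form `[timestamp] EXTERNAL COMMAND: NAME;args` and rewrites them into the `[timestamp] NAME;args` form that the Nagios command pipe accepts. The `FORWARDED` allow-list and the function name `to_pipe_command` are hypothetical; passive check results are deliberately left out because they already reach the slave via nsca.

```python
import re

# Assumed log line shape:
#   [1203000000] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;web01;...
LOG_PATTERN = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z_]+);(.*)$")

# Hypothetical allow-list: only state-changing commands worth mirroring.
FORWARDED = {
    "ACKNOWLEDGE_HOST_PROBLEM",
    "ACKNOWLEDGE_SVC_PROBLEM",
    "SCHEDULE_HOST_DOWNTIME",
    "SCHEDULE_SVC_DOWNTIME",
    "ENABLE_SVC_NOTIFICATIONS",
    "DISABLE_SVC_NOTIFICATIONS",
    "ENABLE_HOST_NOTIFICATIONS",
    "DISABLE_HOST_NOTIFICATIONS",
    "ENABLE_SVC_EVENT_HANDLER",
    "DISABLE_SVC_EVENT_HANDLER",
    "ENABLE_SVC_CHECK",
    "DISABLE_SVC_CHECK",
}


def to_pipe_command(log_line):
    """Turn one master log line into a command-pipe line, or None to skip it."""
    match = LOG_PATTERN.match(log_line.strip())
    if match is None:
        return None  # not an external command entry
    timestamp, name, args = match.groups()
    if name not in FORWARDED:
        return None  # e.g. PROCESS_SERVICE_CHECK_RESULT, already sent via nsca
    return f"[{timestamp}] {name};{args}"
```

Each non-None result would then be written to the command pipe on the slave, for example over ssh as in the approach described above.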

Is there a better way of doing this ?

-- 
bright blessings,
Mark


[Nagios-users] Distributed monitoring with nrpe_nt and freshness

2007-11-08 Thread Jeff Shumard - DefenseWeb Technologies

We have our monitoring configured and everything is working great
checking all our Windows servers through a single Windows server running
nrpe_nt.  The problem we are having is when one of our Linux Nagios
servers goes down and doesn't send any results to the master Nagios
server.  When this happens and our 5-minute freshness threshold is
reached, we start running active checks because we didn't receive any
passive updates from the server that went down.  This sends a bunch of
checks to the Windows server to run tests, and we start getting UNKNOWN
status reports back to the master server with the result "No output
available from command".  Does anyone know if there is a max connection
limit on nrpe_nt or something else that may be causing this?

Thank you,
Jeff


Re: [Nagios-users] Distributed monitoring with nrpe_nt and freshness

2007-11-05 Thread Jeff Shumard - DefenseWeb Technologies

Everything works fine checking the hosts if I force an active check for
all services on a host.  We are not doing host checks at all on our
servers, just service checks.  The only time I have a problem is when the
freshness threshold is reached and it tries to force a check on a lot of
services at once.  It is almost as if nrpe_nt is only able to process a
set number of checks at one time.  There is no resource issue at the
time this is happening on the Nagios server and on the Windows server
running the checks.
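
The burst described here follows from how freshness checking behaves when a poller stops reporting: every service it fed goes stale at about the same moment. A minimal Python sketch of that effect is below. It is not the Nagios scheduler (which considers more than a single timestamp); the function name, the service names, and the 300-second threshold (taken from the 5-minute freshness setting mentioned in this thread) are illustrative assumptions.

```python
def stale_services(last_results, now, freshness_threshold=300):
    """Return the services whose last passive result is older than the threshold.

    last_results maps a service name to the Unix time of its last result.
    """
    return sorted(
        name
        for name, last_seen in last_results.items()
        if now - last_seen > freshness_threshold
    )


# Hypothetical data: one poller is still reporting, another died ~400s ago.
now = 10_000
last_results = {
    "win01/CPU": now - 30,
    "win02/CPU": now - 400,
    "win02/Disk": now - 410,
    "win03/CPU": now - 405,
}
print(stale_services(last_results, now))
```

Every name in the returned list would trigger a forced active check in the same freshness pass, which is why the Windows server sees them arrive together.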

Has anyone else had this problem?

Thank you,
Jeff


-Original Message-
From: Thomas Guyot-Sionnest [mailto:[EMAIL PROTECTED] 
Sent: Sunday, November 04, 2007 1:36 PM
To: Jeff Shumard - DefenseWeb Technologies
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Distributed monitoring with nrpe_nt and
freshness


On 02/11/07 01:11 PM, Jeff Shumard - DefenseWeb Technologies wrote:
 We have our monitoring configured and everything is working great
 checking all our Windows servers through a single Windows server running
 nrpe_nt.  The problem we are having is when one of our Linux Nagios
 servers goes down and doesn't send any results to the master Nagios
 server.  When this happens and our 5-minute freshness threshold is
 reached, we start running active checks because we didn't receive any
 passive updates from the server that went down.  This sends a bunch of
 checks to the Windows server to run tests, and we start getting UNKNOWN
 status reports back to the master server with the result "No output
 available from command".  Does anyone know if there is a max connection
 limit on nrpe_nt or something else that may be causing this?

While I can't answer your question, I can suggest using check_dummy to
set an UNKNOWN status on hosts not monitored. It especially makes sense
if some of the hosts can't be monitored directly from the central
server.

Also are you sure the central server is allowed to talk to your nrpe_nt
(IP access list)?

Thomas


Re: [Nagios-users] Distributed monitoring with nrpe_nt and freshness

2007-11-04 Thread Thomas Guyot-Sionnest


On 02/11/07 01:11 PM, Jeff Shumard - DefenseWeb Technologies wrote:
 We have our monitoring configured and everything is working great
 checking all our Windows servers through a single Windows server running
 nrpe_nt.  The problem we are having is when one of our Linux Nagios
 servers goes down and doesn't send any results to the master Nagios
 server.  When this happens and our 5-minute freshness threshold is
 reached, we start running active checks because we didn't receive any
 passive updates from the server that went down.  This sends a bunch of
 checks to the Windows server to run tests, and we start getting UNKNOWN
 status reports back to the master server with the result "No output
 available from command".  Does anyone know if there is a max connection
 limit on nrpe_nt or something else that may be causing this?

While I can't answer your question, I can suggest using check_dummy to
set an UNKNOWN status on hosts not monitored. It especially makes sense
if some of the hosts can't be monitored directly from the central server.
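
A plugin in the spirit of check_dummy can be sketched as follows, assuming it would be configured as the active check command that freshness checking falls back to. This is a hypothetical Python stand-in, not the check_dummy source; `dummy_result` is a made-up name, and only the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are taken as given.

```python
# Standard plugin return codes and their labels.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def dummy_result(state, text=""):
    """Return (exit_code, output_line) for a fixed, pre-decided check result.

    No host is contacted; the state is simply echoed back, so a stale
    passive service can be marked UNKNOWN without any network traffic.
    """
    if state not in STATES:
        # Anything outside 0..3 is reported as UNKNOWN.
        return 3, f"UNKNOWN: status {state} is not a supported state"
    label = STATES[state]
    if text:
        return state, f"{label}: {text}"
    return state, label


code, line = dummy_result(3, "no passive result received")
print(code, line)
```

Because the result is produced locally, a burst of stale services costs the central server almost nothing and never reaches nrpe_nt.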

Also are you sure the central server is allowed to talk to your nrpe_nt
(IP access list)?

Thomas


[Nagios-users] Distributed monitoring with nrpe_nt and freshness

2007-11-02 Thread Jeff Shumard - DefenseWeb Technologies

We have our monitoring configured and everything is working great
checking all our Windows servers through a single Windows server running
nrpe_nt.  The problem we are having is when one of our Linux Nagios
servers goes down and doesn't send any results to the master Nagios
server.  When this happens and our 5-minute freshness threshold is
reached, we start running active checks because we didn't receive any
passive updates from the server that went down.  This sends a bunch of
checks to the Windows server to run tests, and we start getting UNKNOWN
status reports back to the master server with the result "No output
available from command".  Does anyone know if there is a max connection
limit on nrpe_nt or something else that may be causing this?

Thank you,
Jeff


Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-16 Thread Ivan Fetch

Hi Sean,

On Mon, 15 Oct 2007, Sean McAvoy wrote:

 On further investigations it looks as though the problem is with the
 time taken to submit the results back to nagios via send_nsca.
 I have read about a couple different options for getting results back
 quickly. One being a bulk system of transfer, a file containing the
 results is sent via a send_nsca bulk transfer executed via cron. The
 other being a system that makes use of the performance data output
 option on the remote nagios systems and submits the results using a
 custom daemon on both ends.
 Does anybody know of any other options? Also, are there any guides to
 setting up either of these options? Most of what I have read is email
 threads.
 Thanks.

 On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:

 Hello,
 I have 1 central nagios system with 5 distributed servers. I have
 enabled freshness checking on both central and remote systems. I am
 constantly seeing services go to unknown status for 1-3 minutes and
 then recover.
 on the remotes I have:
 check_service_freshness=1
 service_freshness_check_interval=10
 check_host_freshness=1
 host_freshness_check_interval=60
 service_inter_check_delay_method=s
 max_service_check_spread=10
 service_interleave_factor=1
 host_inter_check_delay_method=s
 max_host_check_spread=30
 max_concurrent_checks=0

 It does appear as though checks are being run in parallel. I'm wondering
 how I can best determine where the problem is: with the execution of
 checks, submittal to the central system, or elsewhere.
 Thanks.


 _sean


 Sean McAvoy
 NOC Acting Team Lead
 Afilias Canada

 P. 416.673.4194







This may be the caching possibility you have already mentioned, but 
here is a blog posting about caching send_nsca:

http://altinity.blogs.com/dotorg/2006/11/caching_nsca_da.html


This is in the back of my mind for us down the road, but I have not 
looked into it personally, just seen the post.  I have just started 
looking at what Opsview has to offer.
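
The bulk-transfer idea quoted above (collect results in a file, then hand them to one send_nsca run) can be sketched as follows. This is a minimal Python illustration under stated assumptions: it only builds the text that send_nsca reads on standard input, using its default tab delimiter, and `format_nsca_batch` is a hypothetical helper name, not part of NSCA.

```python
def format_nsca_batch(results):
    """Build send_nsca input: one tab-separated line per check result.

    Each item is (host, service, return_code, output). A service of None
    marks a host check, which uses the shorter host/return_code/output form.
    """
    lines = []
    for host, service, code, output in results:
        # Tabs or newlines inside plugin output would corrupt the record.
        clean = output.replace("\t", " ").replace("\n", " ")
        if service is None:
            fields = [host, str(code), clean]
        else:
            fields = [host, service, str(code), clean]
        lines.append("\t".join(fields))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

The returned text would be piped into a single send_nsca invocation from cron, so one connection carries the whole batch instead of one connection per result.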


Thanks,

Ivan.



Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-16 Thread Jonathan Call

Sean;

I have a very large deployment so I use this tool:

http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal nsca
daemon listens on the central server.
 
Jonathan

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Sean McAvoy
 Sent: Monday, October 15, 2007 12:09 PM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Distributed monitoring Freshness
 checkingfailing then recovering
 
 On further investigations it looks as though the problem is with the
 time taken to submit the results back to nagios via send_nsca.
 I have read about a couple different options for getting results back
 quickly. One being a bulk system of transfer, a file containing the
 results is sent via a send_nsca bulk transfer executed via cron. The
 other being a system that makes use of the performance data output
 option on the remote nagios systems and submits the results using a
 custom daemon on both ends.
 Does anybody know of any other options? Also, are there any guides to
 setting up either of these options? Most of what I have read is email
 threads.
 Thanks.
 
 On 12-Oct-07, at 12:40 PM, Sean McAvoy wrote:
 
  Hello,
  I have 1 central nagios system with 5 distributed servers. I have
  enabled freshness checking on both central and remote systems. I am
  constantly seeing services go to unknown status for 1-3 minutes and
  then recover.
  on the remotes I have:
  check_service_freshness=1
  service_freshness_check_interval=10
  check_host_freshness=1
  host_freshness_check_interval=60
  service_inter_check_delay_method=s
  max_service_check_spread=10
  service_interleave_factor=1
  host_inter_check_delay_method=s
  max_host_check_spread=30
  max_concurrent_checks=0
 
  It does appear as though checks are being run in parallel. I'm wondering
  how I can best determine where the problem is: with the execution of
  checks, submittal to the central system, or elsewhere.
  Thanks.
 
 
  _sean
 
 
 
 Sean McAvoy
 NOC Acting Team Lead
 Afilias Canada
 
 P. 416.673.4194
 
 
 
 



Re: [Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-16 Thread Live Great

Hi Jonathan,

Why not use check_by_ssh instead? 
Is there any pitfall (weakness) in using check_by_ssh compared with an agent like OCP?

Thanks
Sam

- Original Message 
From: Jonathan Call [EMAIL PROTECTED]
To: Sean McAvoy [EMAIL PROTECTED]; nagios-users@lists.sourceforge.net
Sent: Wednesday, October 17, 2007 7:19:46 AM
Subject: Re: [Nagios-users] Distributed monitoring Freshness checkingfailing 
then recovering

Sean;

I have a very large deployment so I use this tool:

http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

This daemon runs on each of the distributed servers while a normal nsca
daemon listens on the central server.
 
Jonathan


[Nagios-users] Distributed monitoring Freshness checking failing then recovering

2007-10-12 Thread Sean McAvoy

Hello,
I have 1 central nagios system with 5 distributed servers. I have  
enabled freshness checking on both central and remote systems. I am  
constantly seeing services go to unknown status for 1-3 minutes and  
then recover.
on the remotes I have:
check_service_freshness=1
service_freshness_check_interval=10
check_host_freshness=1
host_freshness_check_interval=60
service_inter_check_delay_method=s
max_service_check_spread=10
service_interleave_factor=1
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0

It does appear as though checks are being run in parallel. I'm wondering
how I can best determine where the problem is: with the execution of
checks, submittal to the central system, or elsewhere.
Thanks.


_sean


Re: [Nagios-users] Distributed Monitoring Web Interface Issue

2007-05-09 Thread Simon Marcil

Hi Marco,

I will set this up.

Thanks a lot!

Simon

From: Marco Supino [mailto:[EMAIL PROTECTED] 
Sent: May-09-07 1:43 AM
To: Simon Marcil; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] Distributed Monitoring Web Interface Issue

 

Hi,

I have the same scenario, and what I did was to enable active checks on
all services, but set check_period to none, so a check is never
executed, except if freshness checking runs it.

Marco.


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Simon
Marcil
Sent: Wednesday, May 09, 2007 02:59
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring Web Interface Issue

 

I have a distributed monitoring setup. I have several servers reporting
back to a central server. The central server also does a couple of checks,
but most of its hosts and services are disabled (because it receives
the info from other servers).

The problem I have is with the web interface. In the Tactical Overview
all the problems reported from distributed servers show up as
disabled. This means that we can't get a correct listing of Unhandled
Problems. For example, let's say I have 3 hosts down coming from a
distributed server, 1 of which has been acknowledged. I will see the
following:

3 Down
1 Acknowledged
3 Disabled

In this example, is there a way to list only the hosts which are down and
not acknowledged?

If this wasn't clear, let me know and I will clarify.

Simon

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Distributed Monitoring Web Interface Issue

2007-05-09 Thread Jeff Shumard - DefenseWeb Technologies

Marco,

I have the same issue: all my services show up as disabled
because I have active checks turned off on my centralized Nagios
interface.  I am running Nagios 2.9 and configured what you described, but
that didn't fix the problem; it just caused a couple of others.  Here is
what I did:

1)   I configured the service with active checks on and with
no check_period configured.  This did resolve the issue of the service
showing as disabled, because the active check was turned on.  But it caused
another problem: active checks were being run after 30 seconds, well
before my freshness_threshold of 600 seconds and my
normal_check_interval of 3 minutes.  It shouldn't have checked at
all.

2)   I tried another way: I created a check_period called none,
which had no times configured, and made the service use it as
its check_period.  With this, it never ran an active check,
even though I had a freshness_threshold configured.

Is there something I did wrong, or are you running an older version of
Nagios than 2.9?  If anyone else has found a resolution to this problem,
I would appreciate your comments.

Thank you,

Jeff




Re: [Nagios-users] Distributed Monitoring Web Interface Issue

2007-05-09 Thread Marco Supino

Hi,

You are right. I also modified a source file to allow freshness checking
to run even with check_period=none. This is the patch; with it, if a
service has check_freshness enabled, the freshness check will run.

Marco.

 

 

[EMAIL PROTECTED]:~$ diff -Naur /tmp/new/nagios-2.8/base/checks.c /tmp/nagios-2.8/base/checks.c
--- /tmp/new/nagios-2.8/base/checks.c   2007-03-01 14:15:10.0 -0500
+++ /tmp/nagios-2.8/base/checks.c   2007-03-13 04:10:46.0 -0400
@@ -1732,8 +1732,8 @@
                if(temp_service->is_being_freshened==TRUE)
                        continue;
 
-               /* see if the time is right... */
-               if(check_time_against_period(current_time,temp_service->check_period)==ERROR)
+               /* see if the time is right... but we're using auto-freshness threshold */
+               if(check_time_against_period(current_time,temp_service->check_period)==ERROR && temp_service->check_freshness==FALSE)
                        continue;
 
                /* EXCEPTION */
@@ -1741,6 +1741,7 @@
                if(temp_service->check_interval==0 && temp_service->freshness_threshold==0)
                        continue;
 
+
 #ifdef TEST_FRESHNESS
                printf("CHECKFRESHNESS 3\n");
 #endif

 




[Nagios-users] Distributed Monitoring Web Interface Issue

2007-05-08 Thread Simon Marcil

I have a distributed monitoring setup, with several servers reporting
back to a central server. The central server also runs a couple of checks
itself, but most of its hosts and services have checks disabled (because it
receives the info from the other servers).

The problem I have is with the web interface. In the Tactical Overview,
all the problems reported from distributed servers show up as
disabled. This means that we can't get a correct listing of Unhandled
Problems. For example, let's say I have 3 hosts down coming from a
distributed server, 1 of which has been acknowledged. I will have the
following:

3 Down
1 Acknowledged
3 Disabled

In this example, is there a way to list only the hosts which are down and
not acknowledged?

If this wasn't clear, let me know and I will clarify.

Simon
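
A possible workaround for the question above, sketched under assumptions.
The Tactical Overview's "Unhandled" links appear to call status.cgi with a
property bitmask that requires checks to be enabled, which would explain
why passively fed hosts drop out of that listing. The bit values below are
believed to match cgiutils.h in Nagios 2.x/3.x but should be verified
against the installed version; host_filter_query is a hypothetical helper
name, not part of Nagios.

```shell
#!/bin/sh
# Bit values assumed from Nagios' cgiutils.h; verify for your version.
HOST_DOWN=4                     # hoststatustypes bit
HOST_NO_SCHEDULED_DOWNTIME=2    # hostprops bits
HOST_STATE_UNACKNOWLEDGED=8
HOST_CHECKS_ENABLED=32

# Build a status.cgi query string from a status mask and a property mask.
host_filter_query() {
    printf 'hostgroup=all&style=hostdetail&hoststatustypes=%d&hostprops=%d\n' "$1" "$2"
}

# Filter that also requires checks to be enabled (the "Unhandled" style):
unhandled=$((HOST_NO_SCHEDULED_DOWNTIME + HOST_STATE_UNACKNOWLEDGED + HOST_CHECKS_ENABLED))
# Same filter without the "checks enabled" requirement:
relaxed=$((HOST_NO_SCHEDULED_DOWNTIME + HOST_STATE_UNACKNOWLEDGED))

host_filter_query "$HOST_DOWN" "$unhandled"
host_filter_query "$HOST_DOWN" "$relaxed"
```

Appending the second query string to the status.cgi URL of the central
server should, if the assumptions hold, list hosts that are down,
unacknowledged and not in downtime, whether or not their checks are enabled.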

 


Re: [Nagios-users] Distributed Monitoring Web Interface Issue

2007-05-08 Thread Marco Supino

Hi,

I have the same scenario. What I did was enable active checks on
all services but set check_period to none, so a check is never
executed unless freshness checking triggers it.

Marco.

 

 




[Nagios-users] Distributed Monitoring : Monitoring server sending alert .

2007-01-25 Thread Saulo Silva


Hi folks,

I have a distributed Nagios configuration running well, except that the
monitoring server has started to send notifications even though notifications
are disabled in its configuration (enable_notifications=0).

I am using Nagios 1.2 on SUSE 9.
Another question: is there any way to do passive host monitoring in Nagios 1.x?
I would prefer not to use active monitoring on my central server.

Best regards,

Saulo Silva

[Nagios-users] Distributed Monitoring

2007-01-22 Thread Moayad Mohammad

Dears,

I have one Nagios server working in my company, and I
need to add another Nagios server to monitor other servers in other
subnets.

I don't know if there is any solution for having 2 Nagios servers (1 central
Nagios) and 1 monitoring screen; that is, the second server would send
all check results to the central Nagios.

Thank you in advance,

Moayad Mohammad


Re: [Nagios-users] Distributed Monitoring

2007-01-22 Thread John Longland

Yes you can.

Do a Google search for distributed monitoring.

Basically it comes down to having one Nagios configured as a slave.
It then passes all its info to the main Nagios. The services that are
monitored on the slave Nagios are configured as passive on the main Nagios,
but the data is still displayed with all the other active check data on the
same screen.

One thing that I did:
Because the data from the slave is pushed passively rather than requested,
the main Nagios will not know if the slave Nagios dies. So my
slave Nagios is actively checked with a ping by the main Nagios.

John


Re: [Nagios-users] Distributed Monitoring - Redundancy

2006-06-25 Thread Steve Shipway

> > I'm running Nagios in a distributed environment which is working very
> > well. I would like to add a little redundancy to the
> > picture now that I have everything working. ;-)
...
> > It seems that a secondary cold spare might be the best solution.
> > Then there are maintenance issues with keeping software up to date,
> > etc.
> No - look at Linux-HA (heartbeat) and drbd.
> > So many problems, so little beer.
> The Linux-HA/drbd setup is well understood and quite easy.

We use Linux-HA here to have a redundant setup of two servers.  In fact, we
are running our Nagios on one and our MRTG on the other, and they both
provide failover for each other.  They pass a set of virtual IPs, services,
disks and filesystems between each other.  It works very well and is very
reliable.  I use the v1.x Linux-HA (rather than the newer feature-rich
v2.x) as we only have a 2-machine failover cluster and simplicity makes
things easier.

We have an external SCSI disk pack connected to two adaptec serveRAID cards
(these helpfully have locking capabilities for just this setup).  There are
two LUNs on the external pack passed between the servers.

Heartbeat goes via serial cable, crossover network cable, and the main
network.  

For people who are really paranoid, I also have a little linux-ha plugin
which uses a tiny raw partition on the disk to effect an additional lock
before mounting the filesystem.

In a failover situation, we lose only about 30 seconds and everything is
fine.  Nagios (since it uses text files) is very stable - however, I also
run mysql on the Nagios server to hold archives and summarised logs, and
this passes back and forth with no difficulty as well.

If anyone would like detailed instructions, please contact me directly.

Steve




Re: [Nagios-users] Distributed Monitoring - Redundancy

2006-06-24 Thread Greg Cope

On Fri, 2006-06-23 at 09:51 -0700, Mike Koponick wrote:
> Hello Everyone,
>
> I’m running Nagios in a distributed environment which is working very
> well. I would like to add a little redundancy to the picture now that
> I have everything working. ;-)
>
> Since I’m running a distributed environment, how can I add a secondary
> “Central-Server” to the picture? I’m not worried about the sensors or
> remote Nagios servers, just the central portion of the network. The
> problems that I see are as follows:
>
> The remote servers send data via NSCA to the central server. Would
> they also have to send a second connection to the secondary server?

That is one way of doing it.

> NDO now sends data to my MySQL server, will the secondary server also
> need to send data? This opens a can of worms in terms of duplicate
> data, etc.

You'd need to replicate this as well.

> It seems that a secondary “cold spare” might be the best solution.
> Then there are maintenance issues with keeping software up to date,
> etc.

No - look at Linux-HA (heartbeat) and drbd.

> So many problems, so little beer.

The Linux-HA/drbd setup is well understood and quite easy.

Greg

> Thanks in advance,
>
> Mike
 

[Nagios-users] Distributed Monitoring - Redundancy

2006-06-23 Thread Mike Koponick

Hello Everyone,

I'm running Nagios in a distributed environment which is working very well. I would like to add a little redundancy to the picture now that I have everything working. ;-)

Since I'm running a distributed environment, how can I add a secondary "Central-Server" to the picture? I'm not worried about the sensors or remote Nagios servers, just the central portion of the network. The problems that I see are as follows:

The remote servers send data via NSCA to the central server. Would they also have to send a second connection to the secondary server?

NDO now sends data to my MySQL server, will the secondary server also need to send data? This opens a can of worms in terms of duplicate data, etc.

It seems that a secondary "cold spare" might be the best solution. Then there are maintenance issues with keeping software up to date, etc.

So many problems, so little beer.

Thanks in advance,

Mike


Re: [Nagios-users] Distributed Monitoring - Redundancy

2006-06-23 Thread Marc Powell

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:nagios-users-
> [EMAIL PROTECTED] On Behalf Of Mike Koponick
> Sent: Friday, June 23, 2006 11:52 AM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Distributed Monitoring - Redundancy
>
> Hello Everyone,
>
> I'm running Nagios in a distributed environment which is working very
> well. I would like to add a little redundancy to the picture now that I
> have everything working. ;-)
>
> The remote servers send data via NSCA to the central server. Would they
> also have to send a second connection to the secondary server?

Yup. It's easy enough to add additional calls to send_nsca in
submit_check_result, a la:

/bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host1 -p 5668 -c /usr/local/nagios/etc/send_nsca.cfg
/bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host2 -p 5668 -c /usr/local/nagios/etc/send_nsca.cfg
/bin/echo -e "$1\t$2\t$return_code\t$4\n" | /usr/local/nagios/bin/send_nsca host2 -p 5669 -c /usr/local/nagios/etc/send_nsca.cfg

(yes, I send results to 3 different Nagios installations)
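
The three near-identical calls above could be folded into a loop. This is
only a sketch: the DESTINATIONS list and the function names are
hypothetical, the send_nsca invocation mirrors the positional-host form
used in the message (newer NSCA releases are understood to take the host
via -H, so adjust as needed), and the numeric return code is passed in
directly rather than derived from a state string.

```shell
#!/bin/sh
# Hypothetical destination list: space-separated "host:port" pairs.
DESTINATIONS="host1:5668 host2:5668 host2:5669"
SEND_NSCA=/usr/local/nagios/bin/send_nsca
SEND_NSCA_CFG=/usr/local/nagios/etc/send_nsca.cfg

# Format one passive service result: host<TAB>service<TAB>code<TAB>output
format_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Send the same result to every destination; report failures and keep going.
submit_to_all() {
    result=$(format_result "$@")
    for dest in $DESTINATIONS; do
        host=${dest%%:*}
        port=${dest##*:}
        printf '%s\n' "$result" |
            "$SEND_NSCA" "$host" -p "$port" -c "$SEND_NSCA_CFG" ||
            echo "send_nsca to $host:$port failed" >&2
    done
}

# Example call (commented out so that sourcing this file sends nothing):
# submit_to_all "web01" "PING" 0 "PING OK"
```

Reporting each failure separately means one unreachable central server
does not silently stop results from reaching the others.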

> NDO now sends data to my MySQL server, will the secondary server also need
> to send data? This opens a can of worms in terms of duplicate data, etc.

I don't use NDO yet, but I can imagine that you would experience
duplication of data unless you had a different DB for your secondary
host and reconciled them some other way.

--
Marc


[Nagios-users] Distributed Monitoring

2006-04-12 Thread InnovationsTech, Matthew Thomas









I've set up a distributed monitoring server. One
issue I'm seeing is that the distributed server only updates the central
server every 4-6 minutes.

I have service checks running every 90 seconds on the
distributed server. I have it set to obsess over services.

Is there any way to adjust how often the send_nsca utility
is actually run, or to adjust how often the distributed server updates the
central server?

I have freshness checking turned on, and the central server always wants to
go out and get the results itself, because it thinks they are stale after
2-3 min. (threshold set to 450 sec). But this creates double traffic, and
kind of defeats the purpose of distributed monitoring.

Thank You,

Matt

RE: [Nagios-users] Distributed Monitoring

2006-04-12 Thread Morris, Patrick




Results will be getting sent to your ocsp command every time a check
result comes back on the distributed server if it's obsessing. Are you
sure the checks are running every 90 seconds? Or, have you set a long
command_check_interval in nagios.cfg?





Re: [Nagios-users] Distributed Monitoring

2006-04-12 Thread Jason Martin

On Wed, Apr 12, 2006 at 08:06:08PM -0400, InnovationsTech, Matthew Thomas wrote:
> ran, or adjust how often the distributed server updates the central
> server?

If you are obsessing over services, then send_nsca is called for
each and every service check.

> I have freshness turned on, and it always wants to go out and get the
> results, because it thinks they are stale after 2-3 min. (threshold set
> to 450sec). But this creates double traffic, and kind of defeats the
> reason for distributed monitoring.

It sounds like send_nsca is not actually succeeding in getting
the data to the central server.

-Jason Martin

> Thank You,
>
> Matt

-- 
All stressed out, and no one to choke...
This message is PGP/MIME signed.



RE: [Nagios-users] Distributed Monitoring

2006-04-12 Thread InnovationsTech, Matthew Thomas









Below are snippets from my configuration. Is there a way to debug
send_nsca? I tried snoop and tcpdump on the port, and tail on the
nagios.log file, but I don't see when it's submitting the results.
According to the web interface, though, they are
updating every 4-6 minutes.
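
One way to see when results actually arrive is to measure the gaps between
passive results in the central server's nagios.log. The sketch below
assumes log_external_commands=1 on the central server and log lines of the
form "[epoch] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;...";
the function names and the example log path are hypothetical.

```shell
#!/bin/sh
# Print the epoch timestamp of every passive result logged for one service.
# $1 = log file, $2 = host name, $3 = service description
passive_arrivals() {
    grep -F "EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;$2;$3;" "$1" |
        sed 's/^\[\([0-9]*\)\].*/\1/'
}

# Print the number of seconds between consecutive arrivals, one per line.
arrival_gaps() {
    passive_arrivals "$@" | awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Example (path, host and service are placeholders):
# arrival_gaps /usr/local/nagios/var/nagios.log myhost PING
```

Gaps near 90 seconds would suggest results arrive as scheduled; gaps of
several minutes would point at the sending side or the transport.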



nagios.cfg:

command_check_interval=-1
interval_length=30
log_external_commands=1
log_passive_checks=1

services.cfg:

check_period            24x7
max_check_attempts      2
normal_check_interval   3   ; 90 seconds per check
retry_check_interval    1   ; 30 seconds until retry on soft fail
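
The annotations in the snippet follow from interval_length: check intervals
are counted in units of interval_length seconds. A small sketch of that
arithmetic, using the values quoted in this thread; interval_seconds is a
hypothetical helper name.

```shell
#!/bin/sh
# Values copied from the configuration snippet and the earlier message.
interval_length=30
normal_check_interval=3
retry_check_interval=1
freshness_threshold=450   # seconds, as stated earlier in the thread

# Convert a check interval (in time units) to seconds.
interval_seconds() {
    echo $(( $1 * interval_length ))
}

normal=$(interval_seconds "$normal_check_interval")
retry=$(interval_seconds "$retry_check_interval")
echo "normal check every ${normal}s, retry every ${retry}s"
# How many normal check cycles fit inside the freshness threshold:
echo "threshold covers $(( freshness_threshold / normal )) full check cycles"
```

With several check cycles of slack inside the threshold, results that still
go stale suggest they are not reaching the central server at the scheduled
rate.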







Thanks for the assistance.










[Nagios-users] Distributed monitoring problem

2005-12-21 Thread Rob Hassing

Hello all,

I'm trying to set up a distributed monitoring system.
At the start all looked fine to me, but now I'm having problems with
not receiving all passive checks from other hosts.

The machine is an Intel(R) Xeon(TM) CPU 2.40GHz system with 512 MB RAM.

The load is minimal. The only strange thing I can see is the memory usage:
nagios:/etc/nagios # cat /proc/meminfo
MemTotal:   514264 kB
MemFree: 30192 kB
Buffers: 44568 kB
Cached: 328004 kB
SwapCached:  8 kB
Active: 264908 kB
Inactive:   137824 kB
HighTotal:   0 kB
HighFree:0 kB
LowTotal:   514264 kB
LowFree: 30192 kB
SwapTotal: 1028120 kB
SwapFree:  1028020 kB
Dirty: 780 kB
Writeback:   0 kB
Mapped:  46188 kB
Slab:75556 kB
Committed_AS:   100992 kB
PageTables:   1104 kB
VmallocTotal:   507896 kB
VmallocUsed:  7264 kB
VmallocChunk:   499760 kB
HugePages_Total: 0
HugePages_Free:  0
Hugepagesize: 4096 kB

The process info tells me this:
Time Frame             Checks Completed
<= 1 minute:           51 (16.6%)
<= 5 minutes:          221 (71.8%)
<= 15 minutes:         255 (82.8%)
<= 1 hour:             260 (84.4%)
Since program start:   261 (84.7%)


So it's receiving less than 85% of all checks :(
More passive checks will be sent to this Nagios server.
Do we need other hardware?
Where do I need to look to solve this problem?

The machines sending the passive check info are not too busy doing this;
the checks are separated over three different servers.

One example...
This is /var/log/nagios/nagios.log:
[1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE ALERT: cat29-w11-backup;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-epager;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-email;PING OK - Packet loss = 0%, RTA = 0.89 ms
[1135162941] Warning: The results of service 'PING' on host 'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds).  I'm forcing an immediate check of the service.
[1135162951] SERVICE ALERT: cat29-w11-backup;PING;CRITICAL;SOFT;1;CRITICAL: Service results are stale!

It looks like it's going stale again too fast?
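
The "stale by 32 seconds" figure can be reproduced from the timestamps in
the excerpt. The sketch below only restates that arithmetic; stale_by is a
hypothetical helper name.

```shell
#!/bin/sh
# Seconds by which a result is stale: its age minus the threshold.
# A value <= 0 means the result is still fresh.
# $1 = time of last result, $2 = current time, $3 = threshold in seconds
stale_by() {
    echo $(( $2 - $1 - $3 ))
}

# Timestamps taken from the log excerpt above.
last=1135162484     # passive PING result received
warn=1135162941     # freshness warning logged
echo "age: $(( warn - last ))s, stale by: $(stale_by "$last" "$warn" 425)s"
```

On these numbers the threshold was not applied early: as far as the excerpt
shows, no new passive result for this service was logged for 457 seconds,
which points at results not arriving rather than at the freshness logic.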

Can somebody please help me :)


Best regards,
Rob Hassing



Re: [Nagios-users] Distributed monitoring problem

2005-12-21 Thread Marcel Mitsuto Fucatu Sugano

On Wed, 2005-12-21 at 12:08 +0100, Rob Hassing wrote:
 Hello all,

Hi Rob,

 I'm trying to set up a distributed monitoring system.
 At the start all looked fine to me, but now I'm having some problems
 not receiving all passive checks from other hosts.

Distributed monitoring is waaay cool. :) The only thing that could lead
to an issue is that the CGIs that come with the web interface don't scale
very well. Here we ended up storing status in MySQL through a NEB module.
We are now testing GroundWork's framework, which appears to fit our needs.
The only piece we developed in-house is the config file generator, which
sets up all the distributed agents properly and stores all config in a
database.

 The machine is an Intel(R) Xeon(TM) CPU 2.40GHz system with 512 MB RAM.

 The process info tells me this:
 Time Frame             Checks Completed
 <= 1 minute:           51 (16.6%)
 <= 5 minutes:          221 (71.8%)
 <= 15 minutes:         255 (82.8%)
 <= 1 hour:             260 (84.4%)
 Since program start:   261 (84.7%)

Here is what we have:
<= 1 minute:           2383 (21.3%)
<= 5 minutes:          6138 (54.7%)
<= 15 minutes:         8321 (74.2%)
<= 1 hour:             10138 (90.4%)
Since program start:   10711 (95.5%)

 So it's receiving less than 85% of all checks :(
 More passive checks will be sent to this Nagios server in the future.
 Do we need other hardware?
 Where do I need to look to solve this problem?

To avoid stale services, you need to set up freshness_threshold properly
for your services. Here is your hint: setting freshness_threshold is a
little strange, because we need to wait for the packet carrying the check
result to arrive, and the fewer services you configure it on explicitly
(letting Nagios calculate it instead), the better. But it is the only
thing to configure to avoid stale service results. We decided to make
stale results appear with an Unknown status, because the cause could be
just a traffic issue along the packet's path: backup/restore routines,
high traffic load, and other things that can delay results.
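
As a rough illustration of the Unknown status mentioned above, here is a
minimal sketch of a command Nagios could run when a passive result goes
stale. The function name and the message text are hypothetical, not taken
from the setup described here; the only fixed fact is that 3 is the
standard Nagios plugin return code for UNKNOWN.

```python
#!/usr/bin/env python3
"""Hypothetical freshness-check command: report a stale result as UNKNOWN."""
import sys

UNKNOWN = 3  # standard Nagios plugin return code for UNKNOWN


def stale_result(service="service"):
    """Return the (exit_code, message) pair Nagios would see for a stale result."""
    message = "UNKNOWN: no passive result for %s within the freshness threshold" % service
    return UNKNOWN, message


if __name__ == "__main__":
    # The service name is passed as the first argument (an assumption of this sketch).
    name = sys.argv[1] if len(sys.argv) > 1 else "service"
    code, text = stale_result(name)
    print(text)      # first line of output becomes the plugin output in Nagios
    sys.exit(code)   # the exit code sets the resulting service state
```

Returning UNKNOWN rather than CRITICAL keeps a delayed packet from looking
like a real outage, at the cost of not paging anyone if the sender itself
has died.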

 The machines sending the passive check info are not too busy doing this;
 the checks are spread over three different servers.

Here we have 11 distributed servers sending check results via send_nsca,
with around 2k services configured on each one. They are all SPARC
servers sending to a SuSE 9.3 box on commodity hardware. This Linux
machine has 2 GB of RAM and some SATA disks. It is a P4-HT.

 One example...
 This is /var/log/nagios/nagios.log:
 [1135162484] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;cat29-w11-backup;PING;0;PING OK - Packet loss = 0%, RTA = 0.89 ms
 [1135162491] SERVICE ALERT: cat29-w11-backup;PING;OK;HARD;3;PING OK - Packet loss = 0%, RTA = 0.89 ms
 [1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-epager;PING OK - Packet loss = 0%, RTA = 0.89 ms
 [1135162491] SERVICE NOTIFICATION: nagios;cat29-w11-backup;PING;OK;notify-by-email;PING OK - Packet loss = 0%, RTA = 0.89 ms
 [1135162941] Warning: The results of service 'PING' on host 'cat29-w11-backup' are stale by 32 seconds (threshold=425 seconds).  I'm forcing an immediate check of the service.
 [1135162951] SERVICE ALERT: cat29-w11-backup;PING;CRITICAL;SOFT;1;CRITICAL: Service results are stale!
 
 It looks like it goes stale again too fast?

Well, those last two lines don't indicate two stale services. The first
line, which shows the freshness threshold, indicates that Central Nagios
waited the full 425 seconds for a passive result and only noticed that
the result was stale 32 seconds after that. The last line shows the
active check being processed by Central Nagios, which then appears as a
critical alert in the web interface. That active check is
stale_service.sh, or whatever command you configured for the service. (It
can be the real check, in which case Central Nagios will actively check
stale results itself, but this will cause some load trouble :)
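
The numbers in the quoted log can be checked with a little arithmetic.
This is only a sketch: it assumes the age of the result is counted from
the timestamp on the EXTERNAL COMMAND line, which is an assumption that
happens to match the logged values, and the function name is made up for
the example.

```python
def staleness(last_result, now, threshold):
    """Return (age, overdue) in seconds for a passive check result."""
    age = now - last_result      # seconds since the last passive result
    overdue = age - threshold    # seconds past the freshness threshold
    return age, overdue


# Timestamps and threshold taken from the quoted nagios.log lines
age, overdue = staleness(1135162484, 1135162941, 425)
print(age, overdue)  # prints "457 32"
```

So the last passive result was 457 seconds old when Nagios complained,
which is 32 seconds over the 425-second threshold, and the CRITICAL alert
ten seconds later is the forced check reporting back.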


HTH & Regards,
-- 
Marcel Mitsuto Fucatu Sugano [EMAIL PROTECTED]
Universo Online S.A. -- http://www.uol.com.br


