Re: [Nagios-users] Passive monitoring is running slow?

2007-05-02 Thread Jonathan Call


 -----Original Message-----
 From: Thomas Guyot-Sionnest [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 01, 2007 4:29 PM
 To: Jonathan Call
 Cc: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Passive monitoring is running slow?
 
 On 01/05/07 05:15 PM, Jonathan Call wrote:
  I have set up a distributed monitoring system per the Nagios
  documentation.
 
  I initially tested it out by having the distributed server monitor only
  24 or so services on about 8 hosts. There didn't seem to be any problems.
 
  I then cranked it up to 427 services on 81 hosts. I'm watching the
  distributed server right now and there is hardly any system load but the
  Service Check Latency seems extremely high:
 
  Metric                 Min.       Max.        Average
  Check Execution Time:  0.05 sec   1.67 sec    0.701 sec
  Check Latency:         60.40 sec  287.36 sec  184.514 sec
  Percent State Change:  0.00%      0.00%       0.00%
 
  This is resulting in 50% or less of the service checks completing in the
  5 minutes or less timeframe.
 
  The Central server has had no significant change in performance at all
  and seems to be receiving and processing everything without difficulty.
 
  The nsca server on the central server is running with the following
  arguments:
  /usr/local/sbin/nsca --daemon -c /usr/local/etc/nsca.cfg
 
  The submit_check_result script on the distributed server is right out of
  the documentation.
 
 There are many ways to do that; my favorite (obviously, since I wrote it
 :) ) is using the host and service performance data files as named
 pipes, and having a daemon reaping them and batch-sending data to
 send_nsca.
 
 The howto is here (and I'll be more than happy to answer your questions
 or get your feedback):
 
 http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon
 
 It will require Libevent and the Perl module Event::Lib.
 
 Thomas

So this is a known design failure in Nagios then? I'm fairly new to
Nagios and I am completely dumbfounded at this. If you can't service
even a quarter (and probably even a tenth) of the number of hosts and
services on a distributed server that you can on a regular active server,
then what is the point of having a distributed model at all?

I will take a look at your batch sending method.

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Passive monitoring is running slow?

2007-05-02 Thread Marc Powell


 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Jonathan Call
 Sent: Wednesday, May 02, 2007 10:07 AM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] Passive monitoring is running slow?
 
 
 
  


 So this is a known design failure in Nagios then? I'm fairly new to

Absolutely not.

 Nagios and I am completely dumbfounded at this. If you can't service
 even a quarter (and probably even a tenth) of the number of hosts and
 services on a distributed server that you can on a regular active server,
 then what is the point of having a distributed model at all?

I have 5 data collector machines running nagios -and- cricket for
thousands of services each, with nagios reporting all results back to two
central hosts as documented. Average latency is 0.689 seconds and Max is
3.65 seconds right now. The distributed server should be performing
exactly like a regular active server as far as latency stats are
concerned. You're either starving nagios of the resources it needs to run
its active checks (run ~nagios/bin/nagios -s ~nagios/etc/nagios.cfg to
see recommended settings) or, less likely, something is wrong with your
submit_check_result script. If you submit a result from the command line,
does it complete in a timely manner? If you disable OCSP, does the latency
go away? Basic troubleshooting dictates that you methodically enable
features on your distributed machine, turning it step by step from an
active-only server into an active server that submits check results via
OCSP:

Disable OCSP program-wide (nagios.cfg)
Test
Enable OCSP but have your OCSP script do everything except call
send_nsca
Test
Enable send_nsca in your OCSP script.
Test
...
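
One way to answer the command-line question above is to time a single
manual submission. What follows is only a minimal sketch and not part of
Nagios or NSCA: time_command is a hypothetical helper name, and the
send_nsca path, config path and host name "central" in the usage note are
placeholders for whatever your install actually uses.

```python
import subprocess
import time
from typing import Sequence, Tuple


def time_command(command: Sequence[str], stdin_text: str = "") -> Tuple[float, int]:
    """Run `command`, feed it `stdin_text`, and return (elapsed seconds, exit code).

    Hypothetical helper for illustration; it knows nothing about NSCA itself.
    """
    start = time.perf_counter()
    proc = subprocess.run(
        list(command),
        input=stdin_text,
        text=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return elapsed, proc.returncode
```

For example, time_command(["/usr/local/bin/send_nsca", "-H", "central",
"-c", "/usr/local/etc/send_nsca.cfg"], "host1\tPING\t0\tOK\n") would time
one service result. If a single submission takes a large fraction of a
second, that alone may be worth chasing before looking elsewhere.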


Do you have regular host checks enabled? Post the output of nagios -v
and nagios -s.
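
A rough way to reason about whether submission time alone could explain
the latency: assume (this is an assumption, not something established in
this thread) that each result submission holds up the scheduler until it
finishes, so submissions run one after another. Then the time available
per result is simply the check interval divided by the number of
services. The function names below are made up for illustration.

```python
def per_result_budget(services: int, interval_s: float) -> float:
    """Seconds available per submission if all results must fit in one interval.

    Assumes submissions are strictly serial (an assumption, see the text above).
    """
    return interval_s / services


def cycle_time(services: int, seconds_per_result: float) -> float:
    """Seconds needed to submit every result once, one after another."""
    return services * seconds_per_result
```

With the 427 services and 5-minute timeframe quoted earlier,
per_result_budget(427, 300) comes to roughly 0.70 seconds, so under that
assumption any submission slower than that would make the schedule fall
behind. If the latency goes away with OCSP disabled, this is the kind of
arithmetic that would explain it.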

--
Marc



Re: [Nagios-users] Passive monitoring is running slow?

2007-05-01 Thread Thomas Guyot-Sionnest
On 01/05/07 05:15 PM, Jonathan Call wrote:
 I have set up a distributed monitoring system per the Nagios documentation.
 
 I initially tested it out by having the distributed server monitor only 24 or 
 so services on about 8 hosts. There didn't seem to be any problems.
 
 I then cranked it up to 427 services on 81 hosts. I'm watching the 
 distributed server right now and there is hardly any system load but the 
 Service Check Latency seems extremely high:
 
 Metric                 Min.       Max.        Average
 Check Execution Time:  0.05 sec   1.67 sec    0.701 sec
 Check Latency:         60.40 sec  287.36 sec  184.514 sec
 Percent State Change:  0.00%      0.00%       0.00%
 
 This is resulting in 50% or less of the service checks completing in the 5 
 minutes or less timeframe.
 
 The Central server has had no significant change in performance at all and 
 seems to be receiving and processing everything without difficulty.
 
 The nsca server on the central server is running with the following arguments:
 /usr/local/sbin/nsca --daemon -c /usr/local/etc/nsca.cfg
 
 The submit_check_result script on the distributed server is right out of the 
 documentation.

There are many ways to do that; my favorite (obviously, since I wrote it
:) ) is using the host and service performance data files as named
pipes, and having a daemon reaping them and batch-sending data to
send_nsca.

The howto is here (and I'll be more than happy to answer your questions
or get your feedback):

http://www.nagioscommunity.org/wiki/index.php/OCP_Daemon

It will require Libevent and the Perl module Event::Lib.
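
To give a feel for the batching idea (this is not the OCP_Daemon code,
which is Perl and is described in the howto), here is a minimal Python
sketch. It groups result lines into batches and hands each batch to one
child process on stdin, so many results can share a single send_nsca
invocation. All function names are hypothetical. The line layout
(host, service, return code and output separated by tabs) follows the
send_nsca input format for service checks.

```python
import subprocess
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def format_service_result(host: str, service: str, return_code: int, output: str) -> str:
    """Build one input line: host<TAB>service<TAB>return code<TAB>plugin output."""
    return f"{host}\t{service}\t{return_code}\t{output}\n"


def batches(lines: Iterable[str], max_batch: int) -> Iterator[List[str]]:
    """Yield lists of at most `max_batch` lines until `lines` is exhausted."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, max_batch))
        if not chunk:
            return
        yield chunk


def send_batch(chunk: Sequence[str], command: Sequence[str]) -> int:
    """Pipe a whole batch into a single run of `command`; return its exit code.

    `command` would be the send_nsca command line on a real system.
    """
    proc = subprocess.run(list(command), input="".join(chunk), text=True)
    return proc.returncode
```

A caller would loop over batches(source, 50) and pass each chunk to
send_batch together with the local send_nsca command line. Note that this
sketch only flushes when a batch is full or the input ends; a real daemon
reading from a named pipe would also need to flush on a timer so that
results do not sit waiting for the batch to fill.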

Thomas
