[Nagios-users] Timeouts often false alarms and how to fix

2012-04-15 Thread Alex
Hi,

I have a few fc16 boxes and often times alerts are generated for
timeouts while connecting to a remote service on a client such as
clamd or smtpd:

PROCS CRITICAL: 0 processes with command name 'smtpd', UID = 89 (postfix)

Why is it that this happens? It doesn't happen all the time. I know
there are processes running with this command name, so not sure why
this alert would be generated.

It most often happens for me with smtpd and clamd. Is there a way to
extend the timeout, or enable some further debugging to troubleshoot
this?

Thanks,
Alex

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Timeouts for send_nsca program

2007-10-09 Thread Wheeler, JF (Jonathan)
In /var/log/nagios/nagios.log on (at least) one of my slave servers, I
am seeing messages like:

[1191905959] Warning: OCSP command
'/usr/lib/nagios/plugins/tier1/submit_check_result.sh HOST
SERVICE_CHECK OK MESSAGE for service SERVICE NAME on host HOST
timed out after 5 seconds

There have been 712 occurrences today (so far).  Can anyone offer an
explanation ?  As far as I can tell there is no configuration to
increase the timeout limit (can it be increased by installing from
source ?), but perhaps the message indicates another problem (network ?)

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory



Re: [Nagios-users] Timeouts for send_nsca program

2007-10-09 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Wheeler, JF (Jonathan)
 Sent: Tuesday, October 09, 2007 10:24 AM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] Timeouts for send_nsca program
 
 In /var/log/nagios/nagios.log on (at least) one of my slave servers, I
 am seeing messages like:
 
 [1191905959] Warning: OCSP command
 '/usr/lib/nagios/plugins/tier1/submit_check_result.sh HOST
 SERVICE_CHECK OK MESSAGE for service SERVICE NAME on host HOST
 timed out after 5 seconds
 
 There have been 712 occurrences today (so far).  Can anyone offer an
 explanation ?  

Something is causing a delay when your submit_check_result.sh script
presumably attempts to use send_nsca to send a small bit of data to
another host. Can you replicate it from the command line? You should be
able to.

 As far as I can tell there is no configuration to
 increase the timeout limit (can it be increased by installing from

There is one: ocsp_timeout in nagios.cfg.

 source ?), but perhaps the message indicates another problem (network ?)

Probably. If it's close to the documented submit_check_result, it's
sending only a small bit of data and should only take a fraction of a
second. I can use it to send data to two hosts in 0.011s. Maybe network
problems or load on either end are causing a delay in the data
transmission.
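To replicate the delay from the command line as suggested above, the invocation can be timed against the 5-second limit from the log message. This is only a minimal sketch: the helper name time_command is hypothetical, and the command run by default is a harmless placeholder, not the real submit_check_result.sh call, which would have to be passed in as arguments.

```python
import subprocess
import sys
import time

OCSP_TIMEOUT_S = 5.0  # assumption: mirrors "timed out after 5 seconds" in the log


def time_command(argv, limit_s=OCSP_TIMEOUT_S):
    """Run argv once and return (elapsed_seconds, finished_within_limit)."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises if limit_s is exceeded.
        subprocess.run(argv, timeout=limit_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return time.monotonic() - start, finished


if __name__ == "__main__":
    # Placeholder command; pass the real script and its arguments instead.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    elapsed, ok = time_command(argv)
    print(f"{elapsed:.3f}s", "ok" if ok else f"exceeded {OCSP_TIMEOUT_S}s")
```

Running it repeatedly while the warnings are being logged should show whether the command itself is slow or whether the delay only appears under load.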

--
Marc



Re: [Nagios-users] Timeouts for send_nsca program

2007-10-09 Thread Anthony Montibello
Make sure NSCA is still accepting the results from send_nsca.
To do this, send over a result from the command line.
Reviewing your submit_check_result.sh should give you the connection details
of the destination NSCA,
and send_nsca --help should assist with the command line.

This assumes that it used to work, and it would only detect that it is
currently broken,
not why it broke or how to fix it.
If it is intermittent, you can use trial and error to find a good timeout value
using the command line.

Other people would be better at diagnosing the cause of the issue.

Tony (author of NC_Net)

On 10/9/07, Wheeler, JF (Jonathan) [EMAIL PROTECTED] wrote:

 In /var/log/nagios/nagios.log on (at least) one of my slave servers, I
 am seeing messages like:

 [1191905959] Warning: OCSP command
 '/usr/lib/nagios/plugins/tier1/submit_check_result.sh HOST
 SERVICE_CHECK OK MESSAGE for service SERVICE NAME on host HOST
 timed out after 5 seconds

 There have been 712 occurrences today (so far).  Can anyone offer an
 explanation ?  As far as I can tell there is no configuration to
 increase the timeout limit (can it be increased by installing from
 source ?), but perhaps the message indicates another problem (network ?)

 Jonathan Wheeler
 e-Science Centre
 Rutherford Appleton Laboratory


Re: [Nagios-users] timeouts when using secondary dns

2006-11-10 Thread Gerd Mueller
Hi,

I normally configure host addresses by their IP addresses. If that's not
possible, I always use nscd as a very simple name server cache.

Cheers,

Gerd

 
On Friday, 10.11.2006, at 11:25 +1300, Steve Shipway wrote:
 We dealt with this by installing a local caching-only nameserver on
 the Nagios host itself.  This also took a lot of the load off of the
 main nameservers.   So, resolv.conf was set to use 127.0.0.1 by
 default and have our normal name servers as secondaries.  A nice
 sideeffect was that it vastly sped up the name resolution.
  
 Steve
  
 --
 Steve Shipway
 ITSS, University of Auckland
 (09) 3737 599 x 86487
 [EMAIL PROTECTED]
 
 
 
  
 
 
 __
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of stucky
 Sent: Friday, 10 November 2006 6:57 a.m.
 To: Az
 Cc: nagios
 Subject: Re: [Nagios-users] timeouts when using secondary dns

  Yey !! That totally did it. Thx AZ I hadn't even considered
  messing with the resolver cuz I was sure it was a nagios issue
  so I had to fix nagios.
  If that wasn't a text book example of how well mailinglists
  can work then I don't know what is...

  thx

  On 11/7/06, Az [EMAIL PROTECTED] wrote:
   stucky wrote:
    I use the check_by_ssh plugin for most of my stuff and I noticed that
    if the primary nameserver is unavailable nagios starts freaking out.
    All of a sudden all plugins time out. I tested it using the 'host'
    command and it only takes about 1 second longer to lookup hosts using
    the secondary nameserver.
    The default timeout for check_by_ssh is 10 seconds. I cranked it up to
    30 and still I get timeouts. I'm not sure I understand that one.
    Has anyone else seen this.
   We had a similar issue in that our primary DNS was doing strange things,
   and it quite often took 5 or even 10 seconds to perform a DNS lookup.
   What we were seeing was 70% of service checks (and subsequently host
   checks) failing by timing out. The key was the multiple of 5 seconds.
   The resolver timeout on, say, RHEL3 is based on RES_TIMEOUT in
   resolv.h... which was 5 seconds.

   We added the following to our resolv.conf, and found the problems went away:

   options timeout:2 rotate

   This sets the timeout for waiting for a reply to 2 seconds, and tells
   the resolve to rotate through your 'nameserver' entries rather than
   always hitting #1, then #2, etc.

   Cheers.

  -- 
  stucky


Re: [Nagios-users] timeouts when using secondary dns

2006-11-09 Thread Steve Shipway



We dealt with this by installing a local caching-only nameserver on
the Nagios host itself. This also took a lot of the load off of the
main nameservers. So, resolv.conf was set to use 127.0.0.1 by
default and have our normal name servers as secondaries. A nice
sideeffect was that it vastly sped up the name resolution.

Steve

--
Steve Shipway
ITSS, University of Auckland
(09) 3737 599 x 86487
[EMAIL PROTECTED]

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of stucky
Sent: Friday, 10 November 2006 6:57 a.m.
To: Az
Cc: nagios
Subject: Re: [Nagios-users] timeouts when using secondary dns

 Yey !! That totally did it. Thx AZ I hadn't even considered messing
 with the resolver cuz I was sure it was a nagios issue so I had to fix
 nagios.
 If that wasn't a text book example of how well mailinglists can
 work then I don't know what is...

 thx

 On 11/7/06, Az [EMAIL PROTECTED] wrote:
  stucky wrote:
   I use the check_by_ssh plugin for most of my stuff and I noticed that
   if the primary nameserver is unavailable nagios starts freaking out.
   All of a sudden all plugins time out. I tested it using the 'host'
   command and it only takes about 1 second longer to lookup hosts using
   the secondary nameserver.
   The default timeout for check_by_ssh is 10 seconds. I cranked it up to
   30 and still I get timeouts. I'm not sure I understand that one.
   Has anyone else seen this.
  We had a similar issue in that our primary DNS was doing strange things,
  and it quite often took 5 or even 10 seconds to perform a DNS lookup.
  What we were seeing was 70% of service checks (and subsequently host
  checks) failing by timing out. The key was the multiple of 5 seconds.
  The resolver timeout on, say, RHEL3 is based on RES_TIMEOUT in
  resolv.h... which was 5 seconds.

  We added the following to our resolv.conf, and found the problems went away:

  options timeout:2 rotate

  This sets the timeout for waiting for a reply to 2 seconds, and tells
  the resolve to rotate through your 'nameserver' entries rather than
  always hitting #1, then #2, etc.

  Cheers.

 -- 
 stucky

Re: [Nagios-users] timeouts when using secondary dns

2006-11-09 Thread Ton Voon
Hi!

Just to let you know that I've made a change to CVS today, reported by
Pawel Malachowski, where it looked like the plugins were making too many
calls to resolver/DNS when the plugins were compiled with IPv6 options
enabled.

This should reduce the occasions of timeouts. However, I do like the idea
of making the Nagios server a caching name server too...

Ton

On 9 Nov 2006, at 22:25, Steve Shipway wrote:

 We dealt with this by installing a local caching-only nameserver on
 the Nagios host itself.  This also took a lot of the load off of the
 main nameservers.   So, resolv.conf was set to use 127.0.0.1 by
 default and have our normal name servers as secondaries.  A nice
 sideeffect was that it vastly sped up the name resolution.

 Steve

 --
 Steve Shipway
 ITSS, University of Auckland
 (09) 3737 599 x 86487
 [EMAIL PROTECTED]

 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]] On Behalf Of stucky
 Sent: Friday, 10 November 2006 6:57 a.m.
 To: Az
 Cc: nagios
 Subject: Re: [Nagios-users] timeouts when using secondary dns

  Yey !! That totally did it. Thx AZ I hadn't even considered messing
  with the resolver cuz I was sure it was a nagios issue so I had to fix
  nagios.
  If that wasn't a text book example of how well mailinglists can
  work then I don't know what is...

  thx

  On 11/7/06, Az [EMAIL PROTECTED] wrote:
   stucky wrote:
    I use the check_by_ssh plugin for most of my stuff and I noticed that
    if the primary nameserver is unavailable nagios starts freaking out.
    All of a sudden all plugins time out. I tested it using the 'host'
    command and it only takes about 1 second longer to lookup hosts using
    the secondary nameserver.
    The default timeout for check_by_ssh is 10 seconds. I cranked it up to
    30 and still I get timeouts. I'm not sure I understand that one.
    Has anyone else seen this.
   We had a similar issue in that our primary DNS was doing strange things,
   and it quite often took 5 or even 10 seconds to perform a DNS lookup.
   What we were seeing was 70% of service checks (and subsequently host
   checks) failing by timing out. The key was the multiple of 5 seconds.
   The resolver timeout on, say, RHEL3 is based on RES_TIMEOUT in
   resolv.h... which was 5 seconds.

   We added the following to our resolv.conf, and found the problems went away:

   options timeout:2 rotate

   This sets the timeout for waiting for a reply to 2 seconds, and tells
   the resolve to rotate through your 'nameserver' entries rather than
   always hitting #1, then #2, etc.

   Cheers.

  -- 
  stucky

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon

Re: [Nagios-users] timeouts when using secondary dns

2006-11-07 Thread Az
stucky wrote:
 I use the check_by_ssh plugin for most of my stuff and I noticed that 
 if the primary nameserver is unavailable nagios starts freaking out.
 All of a sudden all plugins time out. I tested it using the 'host' 
 command and it only takes about 1 second longer to lookup hosts using 
 the secondary nameserver.
 The default timeout for check_by_ssh is 10 seconds. I cranked it up to 
 30 and still I get timeouts. I'm not sure I understand that one.
 Has anyone else seen this.
We had a similar issue in that our primary DNS was doing strange things, 
and it quite often took 5 or even 10 seconds to perform a DNS lookup. 
What we were seeing was 70% of service checks (and subsequently host 
checks) failing by timing out. The key was the multiple of 5 seconds. 
The resolver timeout on, say, RHEL3 is based on RES_TIMEOUT in 
resolv.h... which was 5 seconds.

We added the following to our resolv.conf, and found the problems went away:

options timeout:2 rotate

This sets the timeout for waiting for a reply to 2 seconds, and tells 
the resolver to rotate through your 'nameserver' entries rather than 
always hitting #1, then #2, etc.
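The "multiple of 5 seconds" observation can be checked with a little arithmetic. The sketch below parses the options line and applies a deliberately simplified model: each lookup that first hits a dead nameserver waits one full timeout before the next server is asked. Retries and any resolver back-off are ignored, and the function names are hypothetical, so treat the numbers as rough illustrations rather than exact resolver behaviour.

```python
def parse_options(line):
    """Parse a resolv.conf 'options ...' line into a dict."""
    words = line.split()
    if not words or words[0] != "options":
        raise ValueError("not an options line")
    opts = {}
    for word in words[1:]:
        name, sep, value = word.partition(":")
        # 'timeout:2' becomes {'timeout': 2}; bare flags like 'rotate' become True.
        opts[name] = int(value) if sep else True
    return opts


def wait_before_fallback(timeout_s, dead_servers):
    """Simplified model: seconds lost on dead servers before a live one is asked."""
    return timeout_s * dead_servers


RES_TIMEOUT = 5  # default quoted above from resolv.h

opts = parse_options("options timeout:2 rotate")
for label, timeout in (("default", RES_TIMEOUT), ("tuned", opts["timeout"])):
    for lookups in (1, 2):
        lost = lookups * wait_before_fallback(timeout, dead_servers=1)
        print(f"{label}: {lookups} lookup(s) cost about {lost}s")
```

Under this model two lookups against a dead primary already use up 10 seconds with the default, which is the whole check_by_ssh default timeout mentioned in the quoted message, while the tuned value leaves about 4 seconds lost. With rotate, the dead server is additionally not always the first one asked.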

Cheers.







[Nagios-users] timeouts when using secondary dns

2006-11-06 Thread stucky
Guys

I use the check_by_ssh plugin for most of my stuff and I noticed that if the
primary nameserver is unavailable nagios starts freaking out.
All of a sudden all plugins time out. I tested it using the 'host' command
and it only takes about 1 second longer to lookup hosts using the secondary
nameserver.
The default timeout for check_by_ssh is 10 seconds. I cranked it up to 30
and still I get timeouts. I'm not sure I understand that one.
Has anyone else seen this.

-- 
stucky

Re: [Nagios-users] timeouts when using secondary dns

2006-11-06 Thread Hugo van der Kooij
On Mon, 6 Nov 2006, stucky wrote:

 Guys
 
 I use the check_by_ssh plugin for most of my stuff and I noticed that if the
 primary nameserver is unavailable nagios starts freaking out.
 All of a sudden all plugins time out. I tested it using the 'host' command
 and it only takes about 1 second longer to lookup hosts using the secondary
 nameserver.
 The default timeout for check_by_ssh is 10 seconds. I cranked it up to 30
 and still I get timeouts. I'm not sure I understand that one.

You can try to add the nagios host to the hosts file of one of the servers 
you are monitoring to see which end suffers the most from a freaked out 
DNS.
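To compare the two ends, the same lookup can be timed on the Nagios host and on a monitored server. The following is a small sketch (the function name time_lookup is hypothetical) that goes through the system resolver via socket.getaddrinfo, so hosts-file entries and any local cache are taken into account, which is not the case for the 'host' command since that queries DNS directly.

```python
import socket
import sys
import time


def time_lookup(hostname):
    """Return (seconds, error) for one lookup through the system resolver."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
        error = None
    except socket.gaierror as exc:
        error = str(exc)
    return time.monotonic() - start, error


if __name__ == "__main__":
    # Pass the names to test as arguments; 'localhost' is only a placeholder.
    for name in sys.argv[1:] or ["localhost"]:
        seconds, error = time_lookup(name)
        print(f"{name}: {seconds:.3f}s", error or "resolved")
```

Run once on each side while the primary nameserver is down; the side that shows multi-second lookups is the one that needs the hosts entry or resolver tuning.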

Hugo.

-- 
[EMAIL PROTECTED]   http://hvdkooij.xs4all.nl/
This message is using 100% recycled electrons.



[Nagios-users] timeouts and performance info

2006-08-30 Thread Tobias Klausmann
Hi!

I have the following values in my nagios.cfg:

service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

As far as I know, those values are in seconds. What I wonder is
why I still have Service and Host Checks that take longer than
fifteen minutes to complete. This shouldn't be the case the way I
understand it. Here's my current perf info:

Active Service Checks:
<= 1 minute:           81 (4.6%)
<= 5 minutes:          1719 (97.4%)
<= 15 minutes:         1727 (97.9%)
<= 1 hour:             1727 (97.9%)
Since program start:   1727 (97.9%)

and 

Check Execution Time:   0.00 sec   12.92 sec    0.275 sec
Check Latency:          0.00 sec   204.30 sec   3.043 sec
Percent State Change:   0.00%      15.46%       0.02%

Active Hosts Checks:
<= 1 minute:           0 (0.0%)
<= 5 minutes:          3 (1.2%)
<= 15 minutes:         3 (1.2%)
<= 1 hour:             4 (1.6%)
Since program start:   27 (10.8%)

and

Check Execution Time:   0.02 sec   10.05 sec   0.208 sec
Check Latency:          0.00 sec   17.48 sec   0.204 sec
Percent State Change:   0.00%      0.00%       0.00%

Am I the only one seeing a discrepancy here?

The only way I can make sense of this is that the <= 15 minutes
means time from being scheduled to actually starting the
plugin. In that case I wonder what makes it take so long; the
machine should be beefy enough (dual PIV Xeon 2.8 GHz, 2G of RAM).

Any hints/thoughts are appreciated.

Regards, 
Tobias



Re: [Nagios-users] timeouts and performance info

2006-08-30 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Tobias Klausmann
 Sent: Wednesday, August 30, 2006 2:55 AM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] timeouts and performance info
 
 Hi!
 
 I have the following values in my nagios.cfg:
 
 service_check_timeout=60
 host_check_timeout=30
 event_handler_timeout=30
 notification_timeout=30
 ocsp_timeout=5
 perfdata_timeout=5
 
 As far as I know, those values are in seconds. What I wonder is
 why I still have Service and Host Checks that take longer than
 fifteen minutes to complete. This shouldn't be the case the way I
 under stand it. Here's my curren perf info:

The timeouts above apply from when a particular plugin starts to when it
completes (check execution time). As noted below, this time is at most
12.92 seconds for you (0.275 seconds on average). They don't affect when
a plugin is scheduled to run.
 
 Active Service Checks:
 <= 1 minute:           81 (4.6%)
 <= 5 minutes:          1719 (97.4%)
 <= 15 minutes:         1727 (97.9%)
 <= 1 hour:             1727 (97.9%)
 Since program start:   1727 (97.9%)

This seems mostly normal for a 5 minute check_interval. The small
difference between the 5 and 15 minute counts is normal as checks may be
just starting to execute or still in progress at the 5 minute mark. It
does appear that you have some number of services that are not scheduled
for execution or are executing at really long intervals. Look at Service
Detail and sort by last check. Re-examine your configuration for those
services that do not appear to be scheduled properly.
 
 and
 
 Check Execution Time:   0.00 sec   12.92 sec    0.275 sec
 Check Latency:          0.00 sec   204.30 sec   3.043 sec
 Percent State Change:   0.00%      15.46%       0.02%

Looks pretty good to me. The high max check latency number may have been
a one-off event. If that number regularly changes and is always very
high then you might want to verify that you're not starving nagios for
checks by running /path/to/nagios/bin/nagios -s
/path/to/nagios/etc/nagios and make sure you meet or exceed its
recommended values.

 
 Active Hosts Checks:
 <= 1 minute:           0 (0.0%)
 <= 5 minutes:          3 (1.2%)
 <= 15 minutes:         3 (1.2%)
 <= 1 hour:             4 (1.6%)
 Since program start:   27 (10.8%)
 
 and
 
 Check Execution Time:   0.02 sec   10.05 sec   0.208 sec
 Check Latency:          0.00 sec   17.48 sec   0.204 sec
 Percent State Change:   0.00%      0.00%       0.00%

These look normal and expected. You've had 27 service failures since
program start necessitating host checks.
 
 Am I the only one seeing a discrepancy here?

The only discrepancy I see is likely due to configuration. You probably
have check intervals or timeperiods misconfigured for ~30 services.
 
 The only way I can make sense of this is that the <= 15 minutes
 means time from being scheduled to actually starting the
 plugin. In that case I wonder what makes it take so long, the

Check Latency is that number. On average nagios is able to run your
checks within 3.043 seconds of when they are scheduled to run. The
number you are referring to is just a simple count of the number of
plugins that have been run in that time interval.
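Since the figures are plain counts with a percentage of all services, the number of services that were never checked can be estimated from one "count (percent)" pair. A rough sketch, assuming the percentage is the count divided by the total number of services and rounded to one decimal (the helper name implied_total is hypothetical):

```python
def implied_total(count, percent):
    """Total number of services implied by a 'count (percent%)' pair."""
    return round(count * 100 / percent)


checked_since_start = 1727  # "Since program start: 1727 (97.9%)"
total = implied_total(checked_since_start, 97.9)
never_checked = total - checked_since_start

print(total, never_checked)  # prints: 1764 37
```

That estimate of roughly 37 unchecked services is in the same range as the "~30 services" mentioned above.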

--
Marc




Re: [Nagios-users] timeouts and performance info

2006-08-30 Thread Tobias Klausmann
Hi! 

On Wed, 30 Aug 2006, Marc Powell wrote:
  Active Service Checks:
  <= 1 minute:           81 (4.6%)
  <= 5 minutes:          1719 (97.4%)
  <= 15 minutes:         1727 (97.9%)
  <= 1 hour:             1727 (97.9%)
  Since program start:   1727 (97.9%)
 
 This seems mostly normal for a 5 minute check_interval. The small
 difference between the 5 and 15 minute counts is normal as checks may be
 just starting to execute or still in progress at the 5 minute mark. It
 does appear that you have some number of services that are not scheduled
 for execution or are executing at really long intervals. Look at Service
 Detail and sort by last check. Re-examine your configuration for those
 services that do not appear to be scheduled properly.

I have a few services that are disabled entirely (don't check
actively, don't accept passive checks). Would they count in the
above statistic? They seem to fit in with the missing 2.1%
(100-97.9). Also, I saw a few checks that were last run about 20
minutes ago. Those are log checks via NRPE that complete within
1s (no noticeable delay) if run directly on the machine (as user
nagios of course). It seems acceptable (and I neither know why it
would take 20 minutes nor how to find out why), so I'm willing to
let it slide ;).

 Looks pretty good to me. The high max check latency number may have been
 a one-off event. If that number regularly changes and is always very
 high then you might want to verify that you're not starving nagios for
 check by running /path/to/nagios/bin/nagios -s
 /path/to/nagios/etc/nagios and make sure you meet or exceed it's
 recommended values.

I guessed as much for the one-off event. It doesn't change, so I
feel somewhat safe. As for the recommended values (-s), Nagios
says it's okay the way it is.

  Active Hosts Checks:
  <= 1 minute:           0 (0.0%)
  <= 5 minutes:          3 (1.2%)
  <= 15 minutes:         3 (1.2%)
  <= 1 hour:             4 (1.6%)
  Since program start:   27 (10.8%)
  
  and
  
  Check Execution Time:   0.02 sec   10.05 sec   0.208 sec
  Check Latency:          0.00 sec   17.48 sec   0.204 sec
  Percent State Change:   0.00%      0.00%       0.00%
 
 These look normal and expected. You've had 27 service failures since
 program start necessitating host checks.

That is in line with what I'd expect.

  Am I the only one seeing a discrepancy here?
 
 The only discrepancy I see is likely due to configuration. You probably
 have check intervals or timeperiods misconfigured for ~30 services.

About that number of services are disabled entirely right now, so
if they count into the statistic, it explains the figures.

  The only way I can make sense of this is that the <= 15 minutes
  means time from being scheduled to actually starting the
  plugin. In that case I wonder what makes it take so long, the
 
 Check Latency is that number. On average nagios is able to run your
 checks within 3.043 seconds of when they are scheduled to run. The
 number you are referring to is just a simple count of the number of
 plugins that have been run in that time interval.

So it means in the last N minutes, this many services completed
and *not* this many services needed N minutes to complete (from
being started to delivering the retval)? That would be an eye
opener for me :)

Regards & Thanks,
Tobias
-- 
You don't need eyes to see, you need vision.



Re: [Nagios-users] timeouts and performance info

2006-08-30 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Tobias Klausmann
 Sent: Wednesday, August 30, 2006 8:44 AM
 To: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] timeouts and performance info
 
 Hi!
 
 On Wed, 30 Aug 2006, Marc Powell wrote:
   Active Service Checks:
   <= 1 minute:           81 (4.6%)
   <= 5 minutes:          1719 (97.4%)
   <= 15 minutes:         1727 (97.9%)
   <= 1 hour:             1727 (97.9%)
   Since program start:   1727 (97.9%)
 


 I have a few services that are disabled entirely (don't check
 actively, don't accept passive checks). Would they count in the
 above statistic? They seem to fit in with the missing 2.1%
 (100-97.9). Also, I saw a few checks that were last run about ~20

Without reviewing the code, that is what I expect to be the case.
 
 
  Check Latency is that number. On average nagios is able to run your
  checks within 3.043 seconds of when they are scheduled to run. The
  number you are referring to is just a simple count of the number of
  plugins that have been run in that time interval.
 
 So it means in the last N minutes, this many services completed
 and *not* this many services needed N minutes to complete (from
 being started to delivering the retval)? That would be an eye
 opener for me :)

That is a correct interpretation.
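
To make the counting concrete, here is a minimal Python sketch (not Nagios source code) of how the per-interval rows can be read: each row counts the services whose most recent check falls inside the window, as a share of all services. The function name `checks_within` and the use of `None` for a service that has never been checked (for example, one that is disabled) are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def checks_within(last_check_times, now, windows_minutes=(1, 5, 15, 60)):
    """Count services whose most recent check falls within each window.

    last_check_times: one entry per service; None means "never checked"
    (an assumption used here to model disabled services).
    Returns {window_minutes: (count, percent_of_all_services)}.
    """
    total = len(last_check_times)
    result = {}
    for minutes in windows_minutes:
        cutoff = now - timedelta(minutes=minutes)
        # A service counts if its last check is at or after the cutoff.
        count = sum(1 for t in last_check_times if t is not None and t >= cutoff)
        percent = round(100.0 * count / total, 1) if total else 0.0
        result[minutes] = (count, percent)
    return result


if __name__ == "__main__":
    now = datetime(2006, 8, 30, 9, 0, 0)
    times = [
        now - timedelta(seconds=30),   # checked within the last minute
        now - timedelta(minutes=3),    # checked within the last 5 minutes
        now - timedelta(minutes=10),   # checked within the last 15 minutes
        None,                          # never checked (e.g. disabled)
    ]
    for minutes, (count, percent) in checks_within(times, now).items():
        print(f"<= {minutes} min: {count} ({percent}%)")
```

The counts are cumulative, so a service checked 30 seconds ago appears in every row, and a service that is never checked keeps the final percentage below 100.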

--
Marc



[Nagios-users] Timeouts

2006-08-23 Thread Dirk H. Schulz
Hi folks,

I have a problem concerning timeouts.

First the basics: I run Nagios 2.3.1 on Debian Sarge stable.

I have configured service_check_timeout=60, but in certain 
circumstances (e.g. slow dns) I get the error "Plugin timed out after 
10 seconds" or "Socket timed out after 10 seconds".

Is there another timeout value I have to configure to get rid of this 10 
seconds threshold?

I know that I should work on my dns first, but I want to understand what 
decisions Nagios makes there.

Any hint or help is appreciated.

Dirk





Re: [Nagios-users] Timeouts

2006-08-23 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of Dirk H. Schulz
 Sent: Wednesday, August 23, 2006 4:43 AM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] Timeouts
 
 Hi folks,
 
 I have a problem concerning timeouts.
 
 First the basics: I run Nagios 2.3.1 on Debian Sarge stable.
 
 I have configured service_check_timeout=60, but in certain
 circumstances (e.g. slow dns) I get the error "Plugin timed out after
 10 seconds" or "Socket timed out after 10 seconds".
 
 Is there another timeout value I have to configure to get rid of this
 10 seconds threshold?

Yes. The service_check_timeout in nagios.cfg is a last-resort timeout. If
a plugin hasn't terminated itself in that period of time, then nagios
will kill it. All of the standard plugins (I believe) can be passed a
timeout value on their command line, usually via -t. If none is passed,
they'll use whatever value is hard coded (usually 10 seconds). You can
use '--help' for the plugins you use to see the timeout parameters.
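
As a rough illustration of the two layers described above, the following Python sketch runs a stand-in "plugin" under an outer timeout. It is not how Nagios is implemented; the `PLUGIN` script, the `run_check` helper, and the exit code 2 for a plugin-side timeout are all assumptions made for the example. The point it shows is that whichever timeout is shorter fires first: the plugin's own limit ends the check with its own message, and the outer limit only matters when the plugin has not given up by then.

```python
import subprocess
import sys

# Hypothetical stand-in for a plugin: it gives up on its own after
# `plugin_timeout` seconds and exits with code 2 (assumed CRITICAL).
PLUGIN = r"""
import sys, time
plugin_timeout = float(sys.argv[1])
work = float(sys.argv[2])
if work > plugin_timeout:
    time.sleep(plugin_timeout)
    print("CRITICAL - Plugin timed out after %g seconds" % plugin_timeout)
    sys.exit(2)
time.sleep(work)
print("OK - finished after %g seconds" % work)
sys.exit(0)
"""


def run_check(plugin_timeout, work, service_check_timeout):
    """Run the stand-in plugin under an outer, last-resort timeout.

    Returns (exit_code, first line of output).
    """
    cmd = [sys.executable, "-c", PLUGIN, str(plugin_timeout), str(work)]
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=service_check_timeout)
    except subprocess.TimeoutExpired:
        # The outer timeout fired first: the child is killed from outside.
        return 2, "CRITICAL - Service check timed out (killed from outside)"
    return done.returncode, done.stdout.strip()


if __name__ == "__main__":
    # Plugin limit shorter than the outer limit: the plugin reports the timeout.
    print(run_check(plugin_timeout=0.2, work=1.0, service_check_timeout=5.0))
    # Outer limit shorter than the plugin limit: the check is killed instead.
    print(run_check(plugin_timeout=5.0, work=1.0, service_check_timeout=0.3))
```

In the situation from the original question, this corresponds to the first case: the plugin's built-in 10 second limit is reached long before the 60 second service_check_timeout, so raising only the latter changes nothing.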
 
 I know that I should work on my dns first, but I want to understand
 what decisions Nagios makes there.

Or use IP addresses instead of names so you don't rely on an external
service that can possibly fail. ;)

--
marc



[Nagios-users] Timeouts

2006-07-12 Thread Sandeep Narasimha Murthy
Hi,

I have a question regarding the various timeout variables.

The service_check_timeout variable in nagios.cfg has a value of 99, the command 
that invokes the remote plugin has its -t flag set to 170, and the remote 
plugin has a timeout value of 130 s.

What's happening is that Nagios returns a "CRITICAL: Service Check Timed Out" error 
when the remote plugin sometimes exceeds 120 s. My question is: which of the 
timeout values is being enforced?

I assume it's the 99 s defined in nagios.cfg, but if that is so, how can the 
plugin log indicate that it executed fine until 120 s?

I hope the above was clear!

Thanks in adv,

sg

