Re: [Nagios-users] Sending SMS from Nagios

2007-05-11 Thread Tobias Scherbaum
Josh Kessler wrote:
 The easiest way to send SMS pages is to use the cell phone company's
 email-to-SMS address for each number. I know Verizon's is [EMAIL PROTECTED].
 That way you don't have to worry about yet another add-on. There are
 places to look up all the companies' listings; I just don't have one in
 front of me. Quick and simple SMSing.
 -Josh

Well, that's another possible point of failure. Not getting
notifications about an internet connection not being available _because_
the internet connection isn't available sucks :P

wkr,
  Tobias


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson
Hi,

I have recently set up Nagios 2.8 and am monitoring 1623 hosts and 1946 
services.  Performance under normal circumstances is fine.  Typical 
check and latency times are as follows:

Monitoring Performance (min / max / avg)
Service Check Execution Time:     0.03 / 11.04 / 3.418 sec
Service Check Latency:            0.00 /  1.87 / 0.479 sec
Host Check Execution Time:        0.03 / 10.04 / 0.843 sec
Host Check Latency:               0.00 /  0.00 / 0.000 sec
# Active Host / Service Checks:   1623 / 1946
# Passive Host / Service Checks:  0 / 0

The vast majority of these hosts are spread over 320 geographic 
locations throughout the UK.  These locations are connected to our data 
centre via a hardware VPN device with the majority (about 270) using a 
private ADSL circuit to facilitate the VPN connection.

Yesterday, we had a major outage caused by the failure of one of the 
ADSL central routers at our ISP.  This took out a third of our ADSL 
sites (roughly 90) for 16 minutes.  Each of these sites has about 4 
devices monitored by Nagios so in effect about 360 devices (hosts) went 
down in an instant.

As you can imagine, we were aware of the problem almost immediately due 
to the barrage of phone calls from our clients, but unfortunately Nagios 
didn't even remotely reflect the real situation.  I have used 
parent/child relationships to the full, so I was expecting a good 
portion of the VPN devices to show as down, with all other devices 
behind them showing as unreachable.  This was not the case.  It took 
half an hour for Nagios to find just 20 of the 90 VPN devices down, and 
another half an hour to notice that those 20 were back up again.  During 
the outage the service check latency was climbing steeply, and the 
performance stats half an hour after the start of the problem were as 
follows:

Monitoring Performance (min / max / avg)
Service Check Execution Time:     0.03   / 11.04   / 3.646 sec
Service Check Latency:            947.84 / 2080.05 / 1467.274 sec
Host Check Execution Time:        0.03   / 10.04   / 0.968 sec
Host Check Latency:               0.00   / 0.00    / 0.000 sec
# Active Host / Service Checks:   1623 / 1946
# Passive Host / Service Checks:  0 / 0

As you can see, the average service check latency time has jumped to 
1467 seconds (24 mins).  On all of these hosts there is only one service 
which is a ping (check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5).  
The host check is also a ping (check_ping -H $HOSTADDRESS$ -w 3000.0,80% 
-c 5000.0,100% -p 1) but much faster with only 1 ping being sent out.  
The normal_check_interval on services is 5 mins with 2 
max_check_attempts and a retry_interval of 1.   The host also has a 
max_check_attempts of 2.

A lot of people have mentioned using fping to speed things up but if my 
average service latency is only 0.479 seconds in normal circumstances, I 
can't see how tweaking this will help in a major outage situation.

I have also read through the section on tweaking performance, which seems 
to be geared toward protecting the machine Nagios is running on.  I want 
to do the opposite and give Nagios a lot more work to do.  The machine 
is dedicated to Nagios and is quite high spec: an IBM xSeries 336 with 
2 dual-core processors and 4GB of RAM, so it should be able to take a 
much bigger hit.  I have been monitoring CPU usage with MRTG and the CPU 
never goes lower than 90% idle.  Ironically, during the problem the 
machine's idle time jumped to 95% when I would have expected it to drop.

The only performance tweak I could see that would affect the performance 
in this situation is max_concurrent_checks but this is already set to 0.

I am fairly new to Nagios (2 months) so I apologise if I have missed 
something obvious, but any pointers to a solution to this problem would 
be greatly appreciated.  I have run a nagios -s (output below), which 
seems to indicate that everything is set up OK.  Let me know if you 
require any more information from my config that would help diagnose the 
problem.

regards,
Aidan




Nagios 2.8
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 03-08-2007
License: GPL

Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---
Total hosts:                     1624
Total scheduled hosts:           0
Host inter-check delay method:   SMART
Average host check interval:     0.00 sec
Host inter-check delay:          0.00 sec
Max host check spread:           30 min
First scheduled check:           N/A
Last scheduled check:            N/A


SERVICE SCHEDULING INFORMATION
---
Total services:                     1947
Total scheduled services:           1947
Service inter-check delay method:   SMART
Average 

Re: [Nagios-users] Failed to Send Notification

2007-05-11 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of adi yesaya
 Sent: Friday, May 11, 2007 6:02 AM
 To: Nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] Failed to Send Notification
 
 Dear Nagios-ers,
 
 I tried to send email notifications, some succeed, some not. The ones
 which failed, had this error message:
 
 ----- Transcript of session follows -----
 
 ... while talking to fallback.nl.uu.net.:
 
  550-Verification failed for [EMAIL PROTECTED]
 
  550-Domain localhost.localdomain not found in DNS

<chop>

Fallback.nl.uu.net is rejecting the e-mail because the MTA on your
nagios box, not nagios itself, is claiming to be called
localhost.localdomain, which is an unverifiable name. It's very
reasonable that they would do that. You need to configure the MTA on
your nagios box to use a real hostname that can be found in the DNS.
This may be as simple as adding the hostname to your DNS and editing
your /etc/hosts file on your nagios box, changing the 127.0.0.1 entry to
something like --

127.0.0.1 yourhost.yourdomain.foo localhost.localdomain

You'd need to restart your MTA if it's running as a daemon after making
that change. If that doesn't work, you'll need to specifically configure
the MTA that you're using to masquerade as that host or domain. You can
do this for Sendmail by setting the MASQUERADE_AS macro. Under Postfix I
believe it's the myorigin variable.
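
As a rough way to see what name the box reports for itself, here is a minimal Python sketch. The function name hostname_report is made up for illustration. Note the limits: the lookup uses the local resolver, which also reads /etc/hosts, so a True result here does not prove the name is visible in public DNS the way the remote mail server requires.

```python
import socket


def hostname_report(fqdn=None):
    """Return (name, resolves, looks_local) for the given or the local FQDN.

    hostname_report is a hypothetical helper, not part of Nagios or any MTA.
    """
    name = fqdn or socket.getfqdn()
    # A name such as localhost.localdomain is what the remote server rejected.
    looks_local = name.split(".")[0] == "localhost"
    try:
        # Local resolver lookup only; /etc/hosts entries count as a match.
        socket.gethostbyname(name)
        resolves = True
    except OSError:
        resolves = False
    return name, resolves, looks_local


if __name__ == "__main__":
    name, resolves, looks_local = hostname_report()
    print(f"reported name: {name}")
    print(f"resolves locally: {resolves}")
    print(f"looks like a localhost name: {looks_local}")
```

If the last line reports True, the MTA is probably announcing the same kind of unverifiable name that triggered the 550 rejection.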

--
Marc



Re: [Nagios-users] Failed to Send Notification

2007-05-11 Thread adi yesaya








How can I find out which MTA I am using?
In the Nagios command.cfg, the email notification is sent using /bin/mail,
but is /bin/mail an MTA or just a program for sending email?

[Nagios-users] NRPE: Unable to read output

2007-05-11 Thread Richard Solid

Hello,

I'm trying to use the NRPE package to monitor the resources of remote
computers.

I'm using Fedora Core 3 as one of my clients with the following packages:

nagios-plugins-nrpe-2.5.2-1.fc3.rf
nagios-nrpe-2.5.2-1.fc3.rf

When I test from this Fedora client machine I get the following error:

[EMAIL PROTECTED] ./check_nrpe -H mydomain.org  -c check_load

NRPE: Unable to read output

This is the content of the /etc/xinetd.d/nrpe file:

service nrpe
{
   flags           = REUSE
   type            = UNLISTED
   port            = 5666
   socket_type     = stream
   wait            = no
   user            = nagios
   group           = nagios
   server          = /usr/sbin/nrpe
   server_args     = -c /etc/nagios/nrpe.cfg --inetd
   log_on_failure  += USERID
   disable         = no
   only_from       = x.x.x.x
}


The x.x.x.x is the IP of the client machine, since I'm only testing for now. I
will change it later to the monitoring machine that runs Nagios.

This is the content of the /etc/nagios/nrpe.cfg



pid_file=/var/run/nrpe.pid

server_port=5666

nrpe_user=nagios

nrpe_group=nagios

dont_blame_nrpe=0

debug=1

command_timeout=60

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk1]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hda1
command[check_disk2]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hdb1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200

This is the content of the /var/log/secure log:

May 11 10:57:48 hostname xinetd[29902]: START: nrpe pid=30509 from=x.x.x.x

This is what I get when restarting the nrpe service:

Shutting down Nagios NRPE daemon (nrpe):   [FAILED]
Starting Nagios NRPE daemon (nrpe):[  OK  ]

This is what the /var/log/messages log is saying when trying to restart the
nrpe service:

May 11 11:05:06 hostname nrpe: nrpe shutdown failed
May 11 11:05:06 hostname nrpe[30554]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
May 11 11:05:06 hostname nrpe[30555]: Starting up daemon
May 11 11:05:06 hostname nrpe[30555]: Network server bind failure (98: Address already in use)
May 11 11:05:06 hostname nrpe: nrpe startup succeeded

When I do a ps -aux | grep nrpe I don't see a daemon named nrpe
running, but when I do a ps for xinetd, xinetd is running.

I installed this package on other machines and I get the same problem.

Also, the only plugin listed under /usr/lib/nagios/plugins/ is check_nrpe.

Do I need to see other plugins listed there, like check_load or check_disk?
Or are these only switches that are used by check_nrpe?

If this is a permission issue (which I doubt, because installing these
packages should take care of the permissions and ownership), can someone
provide me with the paths, and the permissions each file should have?

Any inputs?

THANKS!

[Nagios-users] Strange Return code Error

2007-05-11 Thread De Wetenschapper

Hi Guys,

I just installed and configured Nagios on RHEL4 U4 64-bit (2.6.9-34.ELsmp):
[EMAIL PROTECTED] dist]# rpm -qa | grep nagios
nagios-2.9-1.el4.rf
nagios-plugins-1.4.8-2.el4.rf
nagios-plugins-nrpe-2.5.2-1.el4.rf
nagios-nrpe-2.5.2-1.el4.rf

And I get a "Return code of 127 for check of service 'n-users' on host
'fisdb01' was out of bounds" message on all services on all hosts.
I saw on http://www.nagios.org/faqs/viewfaq.php?faq_id=17 that this is most
likely a problem with the path to the executable,
but if that were so I wouldn't be able to do this:
[EMAIL PROTECTED] services]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_users
USERS OK - 3 users currently logged in |users=3;5;10;0
which proves it executes, and from the right place. Right?

some more info:

[EMAIL PROTECTED] services]# ls -l /usr/lib64/nagios/plugins/check_users
-rwxr-xr-x  1 root root 32138 Apr 20 22:06 /usr/lib64/nagios/plugins/check_users

[EMAIL PROTECTED] services]# ls -l /etc/nagios/nrpe.cfg
-rw-r--r--  1 root root 8514 May 11 10:44 /etc/nagios/nrpe.cfg

[EMAIL PROTECTED] services]# cat /etc/nagios/nrpe.cfg | grep users
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
#command[check_users]=/usr/lib64/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
command[unix_users]=/usr/bin/perl /usr/lib64/cactiscripts/unix_users.pl
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[unix_users]=/usr/bin/perl /usr/lib64/cactiscripts/unix_users.pl


nagios.log
[1178875845] Nagios 2.9 starting... (PID=29283)
[1178875845] LOG VERSION: 2.0
[1178875845] Warning: Duplicate definition found for host 'generic-host' (config file '/etc/nagios/hosts.cfg', starting on line 21)
[1178875845] Warning: Duplicate definition found for service 'generic-service' (config file '/etc/nagios/services/services.cfg', starting on line 2)
[1178875845] Finished daemonizing... (New PID=29284)
[1178876155] Warning: Return code of 127 for check of service 'n-users' on host 'fisdb01' was out of bounds. Make sure the plugin you're trying to run actually exists.
[1178876305] Warning: Return code of 127 for check of service 'n-users' on host 'fisdev01' was out of bounds. Make sure the plugin you're trying to run actually exists.


I've spent many hours on this error.
Could someone point me in the right direction?

Thanks in advance,
Jan Lenaerts

Re: [Nagios-users] Strange Return code Error

2007-05-11 Thread Morris, Patrick

 I just installed and configured nagios on a RHEL4 U4 64bit 2.6.9-34.ELsmp
 [EMAIL PROTECTED] dist]# rpm -qa | grep nagios
 nagios-2.9-1.el4.rf
 nagios-plugins-1.4.8-2.el4.rf
 nagios-plugins-nrpe-2.5.2-1.el4.rf
 nagios-nrpe-2.5.2-1.el4.rf
 
 And I get a Return code of 127 for check of service 'n-users' on host
 'fisdb01' was out of bounds Message on all services on all hosts.
 I saw on http://www.nagios.org/faqs/viewfaq.php?faq_id=17
 that this was most likely a problem with the path to the executable.
 but if that was so I wouldn't be able to do this:
 [EMAIL PROTECTED] services]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_users
 USERS OK - 3 users currently logged in |users=3;5;10;0
 which proves it executes and in the right place. Right?

No, it doesn't.  Always run your tests as the Nagios user.



Re: [Nagios-users] Failed to Send Notification

2007-05-11 Thread Morris, Patrick
 How can I find out which MTA I am using?
 In the Nagios command.cfg, the email notification is sent using
 /bin/mail, but is /bin/mail an MTA or just a program for sending email?

Check your mail log.



Re: [Nagios-users] check_http plugin and a proxy?

2007-05-11 Thread Marc Powell


 -Original Message-
 From: Bill Jacqmein [mailto:[EMAIL PROTECTED]
 Sent: Monday, April 23, 2007 7:55 AM
 To: Marc Powell
 Cc: nagios-users@lists.sourceforge.net
 Subject: Re: [Nagios-users] check_http plugin and a proxy?
 
 The only problem I've found with this is the https component. It doesn't
 appear to speak CONNECT when talking to the proxy.

That's correct. Fortunately for the OP, he wasn't asking for that
capability.

--
Marc



[Nagios-users] nrpe command line test question

2007-05-11 Thread Maxwell,Brady
My nrpe.cfg on the remote host contains these commands:

command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/vga/root

Running check_nrpe from the command line gives the following results:

[EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c check_disk -a 10 5 /dev/vga/root
check_disk: Warning threshold must be integer or percentage!

[EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c check_disk1
DISK OK - free space: / 801 MB (12% inode=81%);| /=5625MB;6405;6415;80;6425

I would like to be able to pass arguments to the remote system, allowing
me to set threshold values at the service level.

Can anyone tell me why I get the error "Warning threshold must be
integer or percentage!"?

Or suggest another method of passing the args to the remote nrpe
process?

Thanks

Brady


Re: [Nagios-users] nrpe command line test question

2007-05-11 Thread Aidan Anderson
Maxwell,Brady wrote:

 My nrpe.cfg on the remote host contains these commands

 command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c 
 $ARG2$ -p $ARG3$

 command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 
 -p /dev/vga/root

 running a check_nrpe from the command line has the following results.

 [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c check_disk -a 10 5 /dev/vga/root

 check_disk: Warning threshold must be integer or percentage!

 [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c 
 check_disk1

 DISK OK - free space: / 801 MB (12% inode=81%);| 
 /=5625MB;6405;6415;80;6425

 I would like to be able to pass arguments to the remote system, 
 allowing me to set threshold values at the service level.

 Can anyone tell me why I get the error “Warning threshold must be 
 integer or percentage!” ?

 Or suggest another method of passing the args to the remote nrpe process?

 Thanks

 Brady

 


Make sure that you set dont_blame_nrpe to 1 in nrpe.cfg to allow nrpe to 
accept client arguments. This is set to 0 by default, as accepting 
arguments from clients is deemed a security risk.
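
To picture what the $ARGn$ placeholders are for, here is a minimal Python sketch of positional substitution. It is a simplified model for illustration, not NRPE's actual code: expand_args is a made-up name, and leaving unmatched placeholders untouched is a choice made here, not a statement about how NRPE treats them.

```python
import re


def expand_args(command_line, args):
    """Replace $ARG1$, $ARG2$, ... with the matching entries of args.

    Hypothetical helper; placeholders without a matching argument are
    left as they are (an illustrative choice, not NRPE's behaviour).
    """
    def replace(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(args):
            return args[index]
        return match.group(0)

    return re.sub(r"\$ARG(\d+)\$", replace, command_line)


template = "/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"

# Arguments as given after -a on the check_nrpe command line.
print(expand_args(template, ["10", "5", "/dev/vga/root"]))

# With no arguments, -w is followed by something that is not a number.
print(expand_args(template, []))
```

The second call shows why check_disk could complain about the warning threshold: if the arguments never reach the template, whatever follows -w is neither an integer nor a percentage.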

Aidan





[Nagios-users] Question about Freshness Checking

2007-05-11 Thread Jeff Shumard - DefenseWeb Technologies
I am running a distributed Nagios configuration.  On each of my passive
service checks I am also doing freshness checks, just in case the
distributed host goes down and can't run the check.  When I don't want
checks to run on a specific host for a while, I can log into the
distributed host's web interface and disable active checks for the host
and all of its services with one click.  This works without any problems,
but once the freshness threshold is reached the central Nagios host
starts running the active checks itself, because it hasn't received an
update from the distributed host.  I am aware this is what should be
happening, and it is working great.

Is there a way to disable the freshness check for all the services on a
host, just like you can for active checks?  I know that if I stop
accepting passive checks for one service, this disables the freshness
check for it.  Has someone written a patch, or does anyone know how to
disable passive checks for all services on a host through the Nagios
CGI?

Jeff



Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Jim Avery
On 11/05/07, Aidan Anderson [EMAIL PROTECTED] wrote:

 A lot of people have mentioned using fping to speed things up but if my
 average service latency is only 0.479 seconds in normal circumstances, I
 can't see how tweaking this will help in a major outage situation.

check_ping won't finish until it's done all the pings, and the pings
are (if I recall) always at one second intervals.  This means that if
you've configured check_ping to do (let's say) 5 pings, the check_ping
plugin will always take at least 5 seconds to complete.

If the check_ping is being run as a host check rather than a service
check, my understanding is that this is the only thing Nagios will be
doing; it doesn't do anything else concurrently (correct me if I'm
wrong people).

In normal operation, nagios will rarely do a host check, as it only
usually bothers to if all of the service checks (which can run
concurrently) for that host have failed.  When lots of hosts go down
at once, you suddenly notice how bad it is to have such slow host
checks.
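
To put a rough number on that, here is a back-of-the-envelope sketch in Python. The 360 hosts and the 2 check attempts come from Aidan's post; the 10 seconds per failing host check is an assumption (close to the 10.04 sec maximum host check execution time he reported), and the function name is made up for illustration.

```python
def serial_host_check_seconds(hosts_down, attempts_per_host, seconds_per_check):
    """Time spent if host checks run strictly one after another."""
    return hosts_down * attempts_per_host * seconds_per_check


# 360 hosts down and max_check_attempts of 2 are from the original post;
# 10.0 seconds per failing check is an assumed figure.
total = serial_host_check_seconds(360, 2, 10.0)
print(f"{total:.0f} s = {total / 60:.0f} min")  # prints "7200 s = 120 min"
```

Under these assumptions the host checks alone would block everything else for about two hours, far longer than the 16 minute outage. Halving the time per check halves the total.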

check_icmp or check_fping typically complete a whole lot quicker than
check_ping.  This is because (if I recall correctly) they will finish
and return an OK status as soon as they receive the first ping
response rather than bothering to do all 5 of them.

My nagios system used to crawl even if only half a dozen hosts were
down until I changed check_ping to check_fping (and now I use
check_icmp but I can't remember if it's any better than check_fping or
not).

hth,

Jim



Re: [Nagios-users] Strange Return code Error

2007-05-11 Thread Marc Powell


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:nagios-users-
 [EMAIL PROTECTED] On Behalf Of De Wetenschapper
 Sent: Friday, May 11, 2007 10:19 AM
 To: nagios-users@lists.sourceforge.net
 Subject: [Nagios-users] Strange Return code Error
 
 Hi Guys,
 
 I just installed and configured nagios on a RHEL4 U4 64bit 2.6.9-34.ELsmp
 [EMAIL PROTECTED] dist]# rpm -qa | grep nagios
 nagios-2.9-1.el4.rf
 nagios-plugins-1.4.8-2.el4.rf
 nagios-plugins-nrpe-2.5.2-1.el4.rf
 nagios-nrpe-2.5.2-1.el4.rf
 
 And I get a Return code of 127 for check of service 'n-users' on host
 'fisdb01' was out of bounds Message on all services on all hosts.
 I saw on http://www.nagios.org/faqs/viewfaq.php?faq_id=17 that this was
 most likely a problem with the path to the executable.
 but if that was so I wouldn't be able to do this:
 [EMAIL PROTECTED] services]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_users
 USERS OK - 3 users currently logged in |users=3;5;10;0
 which proves it executes and in the right place. Right?

Not at all. It just means that root can run a program successfully when
called directly. The question you should be asking is does the command{}
definition for this plugin use that full path? The default configs
typically use the $USER1$ macro for the path which must be properly set
in resource.cfg. Is $USER1$ pointing to the correct path? Is
resource.cfg being loaded by nagios?

Also, as pointed out previously, the plugins are never run as the root
user, always the nagios user. While it's not likely in this case, the
nagios user certainly isn't guaranteed the same access privileges as the
root user. You should always perform plugin tests as the nagios user.
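
For background on the number itself: plugins are expected to exit with 0 to 3, and 127 is what a POSIX shell returns when it cannot find the command it was asked to run. A minimal Python sketch, assuming the check command line is handed to /bin/sh (run_check and the /no/such/dir path are made up for illustration):

```python
import subprocess


def run_check(command_line):
    """Run a command line through /bin/sh and return (exit_code, output).

    Hypothetical helper for illustration only.
    """
    proc = subprocess.run(
        ["/bin/sh", "-c", command_line],
        capture_output=True,
        text=True,
    )
    return proc.returncode, (proc.stdout + proc.stderr).strip()


# A plugin path that does not exist, as when the path macro is wrong.
code, output = run_check("/no/such/dir/check_users -w 5 -c 10")
print(code)    # 127 on a POSIX shell
print(output)  # the shell's own "not found" style message
```

So a 127 from every service on every host points at the command line Nagios builds, rather than at the plugins themselves.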

--
Marc



Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Ton Voon

On 11 May 2007, at 19:03, Jim Avery wrote:

 On 11/05/07, Aidan Anderson [EMAIL PROTECTED] wrote:

 A lot of people have mentioned using fping to speed things up but if my
 average service latency is only 0.479 seconds in normal circumstances, I
 can't see how tweaking this will help in a major outage situation.

 check_ping won't finish until it's done all the pings, and the pings
 are (if I recall) always at one second intervals.  This means that if
 you've configured check_ping to do (let's say) 5 pings, the check_ping
 plugin will always take at least 5 seconds to complete.

 If the check_ping is being run as a host check rather than a service
 check, my understanding is that this is the only thing Nagios will be
 doing; it doesn't do anything else concurrently (correct me if I'm
 wrong people).

Correct. We noticed this some time ago too:
http://altinity.blogs.com/dotorg/2006/05/immediate_perfo.html

If you do stick with check_ping, use -p 1, which has a sub-second
response time.


 In normal operation, nagios will rarely do a host check, as it only
 usually bothers to if all of the service checks (which can run
 concurrently) for that host have failed.  When lots of hosts go down
 at once, you suddenly notice how bad it is to have such slow host
 checks.

Nagios 3 will do parallelised host checks, so there will not be a  
slow down there.

Also, Ethan said in his presentation at the Netways conference last
year that some of the host unreachable logic was not quite right:
http://www.netways.de/uploads/media/Ethan.Galstad_Nagios.3.and.Beyond.pdf

This should be fixed in Nagios 3.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon





Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson

Ton Voon wrote:
 On 11 May 2007, at 19:03, Jim Avery wrote:

   
 On 11/05/07, Aidan Anderson [EMAIL PROTECTED] wrote:

 
 A lot of people have mentioned using fping to speed things up but  
 if my
 average service latency is only 0.479 seconds in normal  
 circumstances, I
 can't see how tweaking this will help in a major outage situation.
   
 check_ping won't finish until it's done all the pings, and the pings
 are (if I recall) always at one second intervals.  This means that if
 you've configured check_ping to do (let's say) 5 pings, the check_ping
 plugin will always take at least 5 seconds to complete.

 If the check_ping is being run as a host check rather than a service
 check, my understanding is that this is the only thing Nagios will be
 doing; it doesn't do anything else concurrently (correct me if I'm
 wrong people).
 

 Correct. We noticed this some time ago too:
 http://altinity.blogs.com/dotorg/2006/05/immediate_perfo.html

 If you do stick to using check_ping, use -p 1 which is sub second  
 response time.

   
First of all, thank-you for the replies!

The majority of devices that I monitor are routers/vpn devices and I 
have (on the documentation's advice) not set active checks on the hosts 
and instead I've added check_ping as a service on each of these hosts to 
do 5 pings as follows:

check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

For the host check I already use as you suggested a check_ping that only 
does one ping as follows:

check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

My understanding was that if the service check failed, Nagios would then
abandon the service check altogether and move on to the host check, which
is only 1 ping.  Because the service checks are parallelised, it
shouldn't matter that they send 5 pings, and because the host check is
only 1 ping, the serialised host checks shouldn't become a bottleneck.
I'm at a loss as to why performance has been impacted so severely.

Maybe I need to abandon the service checks altogether and just have a
host check.  I'm reluctant to do this because I get very useful
information from 5 pings, i.e. packet loss and high rta, which is
particularly handy for checking volatile links such as ADSL.  Maybe that
is the trade-off: fast host checking with no useful stats, or slow host
checking with useful stats.

regards,
Aidan
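
For reference, the two check_ping commands described in this message
might sit in the object configuration roughly as follows. This is only a
sketch: the command names, the generic-service template, the host name
site-router-01 and the service thresholds are assumptions made for
illustration. Only the two command lines come from the message itself.

```cfg
# Service check: 5 pings, thresholds passed in from the service definition
define command{
        command_name    check_ping              ; assumed name
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }

# Host check: a single ping with fixed thresholds
define command{
        command_name    check-host-alive        ; assumed name
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
        }

# Example service using the 5-ping command (all values hypothetical)
define service{
        use                     generic-service ; hypothetical template
        host_name               site-router-01  ; hypothetical host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }
```

The 5-ping service keeps the packet loss and rta figures, while the
host check stays as cheap as check_ping allows.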







Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Ton Voon

On 11 May 2007, at 20:25, Aidan Anderson wrote:

 First of all, thank-you for the replies!

 The majority of devices that I monitor are routers/vpn devices and I
 have (on the documentation's advice) not set active checks on the hosts
 and instead I've added check_ping as a service on each of these hosts to
 do 5 pings as follows:

 check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

 For the host check I already use as you suggested a check_ping that only
 does one ping as follows:

 check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

 My understanding was that if the service check failed it would then
 abandon the service check altogether and move onto the host check which
 is only 1 ping.  The fact that the service checks are parallelised
 should mean that it shouldn't matter that there are 5 pings and the host
 check is only 1 ping which should resolve the bottleneck of serialised
 host checks.  I'm at a loss as to why performance has been impacted so
 severely.

 Maybe I need to abandon the service checks altogether and just have a
 host check.  I'm reluctant to do this because I get very useful
 information from 5 pings, ie packet loss and high rta which is
 particularly handy for checking volatile links such as ADSL.  Maybe that
 is the trade-off, fast host checking with no useful stats or slow host
 checking with useful stats.

Just noticed this in your original email:

Host Check Execution Time:   0.03   / 10.04   / 0.843 sec

This means that some of your host checks are taking 10 seconds, which
is, funnily enough, the timeout period for check_ping. So a check with
-p 1 will still take 10 seconds if the routers are not responding.

You can use a timeout flag for check_ping (but it is only supported on
some OSes). I guess check_icmp is a better bet here.

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon
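
A rough illustration of why the timeout matters, assuming all of the
roughly 270 ADSL sites mentioned in the original post drop at once and
each serialised host check waits out the full 10 seconds:
270 x 10 s = 2700 s, or about 45 minutes for a single pass. With a
2 second timeout the same pass would take at most around 9 minutes.

A check_icmp host check with a shorter timeout could look something like
the sketch below. The command name and the 2 second timeout are
assumptions, and the thresholds are simply copied from the check_ping
host check quoted in the thread. check_icmp normally needs root
privileges (for example, being installed setuid root) to open a raw
socket.

```cfg
# Host check via check_icmp: one packet, give up after 2 seconds
define command{
        command_name    check-host-alive-icmp   ; hypothetical name
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -n 1 -t 2
        }
```

Here -n sets the number of packets and -t the plugin timeout in seconds.
Whether 2 seconds is long enough for a slow ADSL link is something to
test against real round-trip times.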






Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson
Ton Voon wrote:
 Just noticed this in your original email:

 Host Check Execution Time:   0.03   / 10.04   / 0.843 sec

 This means that some of your host checks are taking 10 seconds, which  
 is, funnily enough, the timeout period for check_ping. So the -p 1  
 will still take 10 seconds if the routers are not responding.

 You can use a timeout flag for check_ping (but is only supported on  
 some OSes). I guess check_icmp is a better bet here.

 Ton
   
Hi Ton,

Well spotted, thank you.  check_icmp here we come :)

thanks
Aidan




[Nagios-users] Active Host Checks on Nagios 2.5

2007-05-11 Thread Andrew Tjang
Hello all,

I have a problem where Nagios doesn't execute the active host checks at
all. My system is set up in the following way:

I have about 1000 machines with active checks disabled on their
services; all state info comes in through the external command file
(passive checks).

To this I added 2 hosts w/ services in which I've enabled active checks,
and provided a check command (config below). What happens is the
services get actively checked and updated, but Nagios assumes the host
is up (I guess because the services are returning values), and never
executes the host check. This is troubling because the host check
provides me with important data.

In another instance, I have a host with active checks enabled, and its
services' active checks disabled. Here, Nagios just keeps the host and
all its services in a pending state until the services return with
information, again never executing the host check.

Any insight would be extremely helpful. Thanks!
-Andrew


Config of Host:

Template:
define host{
        name                            generic-host-active ; The name of this host template
        notifications_enabled           1   ; Host notifications are enabled
        event_handler_enabled           1   ; Host event handler is enabled
        flap_detection_enabled          1   ; Flap detection is enabled
        failure_prediction_enabled      1   ; Failure prediction is enabled
        process_perf_data               1   ; Process performance data
        retain_status_information       1   ; Retain status information across program restarts
        retain_nonstatus_information    1   ; Retain non-status information across program restarts
        active_checks_enabled           1   ;
        passive_checks_enabled          1   ;
        register                        0   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

Host:
define host{
        use                     generic-host-active ; Name of host template to use
        host_name               NagiosServer
        alias                   NagiosServer
        hostgroups              Nagios
        checks_enabled          1
        check_command           check_nagios
        address                 127.0.0.1
        max_check_attempts      10
        check_period            24x7
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups          admins
        }

Command:
define command{
        command_name    check_nagios
        command_line    $USER1$/check_nagios.pl
        }




Re: [Nagios-users] Active Host Checks on Nagios 2.5

2007-05-11 Thread Patrick Morris
On Fri, 2007-05-11 at 18:31 -0400, Andrew Tjang wrote:
 Hello all,
 
 I have a problem where Nagios doesn't execute the active host checks at
 all. My system is set up in the following way:
 
 I have about 1000 machines w/ their services set to disable active
 check, and get all state info from an external command file (passive
 checks).
 
 To this I added 2 hosts w/ services in which I've enabled active checks,
 and provided a check command (config below). What happens is the
 services get actively checked and updated, but Nagios assumes the host
 is up (I guess because the services are returning values), and never
 executes the host check. This is troubling because the host check
 provides me with important data.

This is normal Nagios behavior.  Host checks normally run only when a
service changes state, so if you want the data a host check would
provide even when there are no service state changes, you should
probably consider running it as a service check as well.
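
As a minimal sketch of that suggestion, the same check_nagios command
from the posted config could be attached to the host as an ordinary
service so that it gets scheduled at a regular interval. The service
description, the attempt and interval values and the notification
options are assumptions for illustration; the host name, command name,
check period and contact group are taken from the posted config.

```cfg
# Run the host's check command as a regularly scheduled service as well
define service{
        host_name               NagiosServer
        service_description     Nagios Process  ; hypothetical label
        check_command           check_nagios
        active_checks_enabled   1
        max_check_attempts      3               ; assumed value
        normal_check_interval   5               ; assumed value
        retry_check_interval    1               ; assumed value
        check_period            24x7
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r         ; assumed value
        contact_groups          admins
        }
```

The host definition can keep its own check_command; the service copy
just makes sure the plugin's output is collected on a schedule.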

