Re: [Nagios-users] check_disk plugin

2010-05-06 Thread Aidan Anderson
Davide Blasi wrote:
 with or without quotes give me the same result :(

   

Try using single quotes, e.g.

-I '/my/first/.*' -I '/second/.*'
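
If the check is run from a Nagios command definition rather than by hand, the same quoting applies inside command_line. A minimal sketch, assuming a hypothetical command name, hypothetical paths and illustrative thresholds:

```
# Sketch only: the command name, thresholds and paths are hypothetical.
define command{
        command_name    check_disk_filtered
        ; single quotes keep the shell from expanding .* before check_disk sees it
        command_line    $USER1$/check_disk -w 20% -c 10% -I '/mnt/backup/.*' -I '/media/.*'
        }
```

Each -I takes one regular expression, so the option is repeated once per pattern.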



--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] About host check retry interval for nagios v 3.x

2010-04-27 Thread Aidan Anderson
Yu Watanabe wrote:
 Hello all.

 I would like to ask a question regarding the Host Definition in the 
 official Nagios 3.x documentation.

 In Object Definitions - Host Definition, the host retry interval is 
 set as #.
 What interval length does Nagios actually use with this value?
 Would it be the default time unit, 60 sec?

 Thank you
 Yu Watanabe

   
This is not a real value; the # indicates that the directive requires a 
number.  In the case of retry_interval, this is the number of time units 
(minutes, with the default interval_length of 60) between each check 
attempt after the host goes into a SOFT non-OK state.  Normally you put 
a 1 here so that it retries every minute until it reaches 
max_check_attempts.
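
As a rough illustration, a host definition using these directives might look like the sketch below.  The host name, address and the generic-host template are assumptions for the example, not taken from the question.

```
# Sketch only: host_name, address and the generic-host template are hypothetical.
define host{
        use                     generic-host    ; assumed template supplying the other required directives
        host_name               example-host
        alias                   Example host
        address                 192.0.2.10
        check_interval          5               ; time units between regularly scheduled checks
        retry_interval          1               ; time units between re-checks while in a SOFT non-OK state
        max_check_attempts      3               ; the state becomes HARD after the 3rd failed attempt
        }
```

With the default interval_length of 60, this host would be re-checked one minute apart until it either recovers or fails its third attempt.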

regards,
Aidan




Re: [Nagios-users] Host Dependency Object Inheritance Issue

2010-04-22 Thread Aidan Anderson
Aidan Anderson wrote:
 Hi,

 Using Nagios v3.2.1

 I am having problems defining host dependency object inheritance 
 (chaining) using templates.  It appears that if you use 2 levels of 
 inheritance, Nagios doesn't like it and aborts with the following error:

 Error: Could not expand dependent hostgroups and/or hosts specified in 
 host dependency (config file 
 '/usr/local/nagios/etc/manual/templates-hosts.cfg', starting on line 123)

 Here is my config.


 I created the following host dependency templates in 
 '/usr/local/nagios/etc/manual/templates-hosts.cfg'.  This is where the 
 error is found so I've highlighted line 123:


 define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}

 define hostdependency{
use                             dc-ping-proxy
name                            cam-ping-proxy
host_name                       rp1b
register                        0
}

 define hostdependency{          <--- Line 123
use                             dc-ping-proxy
name                            tcl-ping-proxy
host_name                       rp1a
register                        0
}


 I then created the following 2 host dependency definitions which use 
 the bottom 2 templates:


 define hostdependency{
use cam-ping-proxy
dependent_host_name cam-int
}

 define hostdependency{
use tcl-ping-proxy
dependent_host_name tcl-int
}


 This should expand as follows:


 define hostdependency{
host_name   rp1b
dependent_host_name cam-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

 define hostdependency{
host_name   rp1a
dependent_host_name tcl-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

 but I get the error.


 I then changed the configs to remove 1 level of inheritance.  My 
 templates and definitions now look like this:

 Template:

 define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}


 Definitions:

 define hostdependency{
use dc-ping-proxy
host_name   rp1b
dependent_host_name cam-int
}

 define hostdependency{
use dc-ping-proxy
host_name   rp1a
dependent_host_name tcl-int
}

 This should expand to the same configuration as when there were 2 
 levels of inheritance.

 However, the second configuration works fine but the first one 
 doesn't.  Also, I have created a similar service dependency setup with 
 2 levels of inheritance and that works fine.

 Can someone cast their eye over the configs listed above to see if 
 there is anything obvious that I have done wrong with the inheritance?

 regards,
 Aidan

I've changed the way I work out the host_name of the host being depended 
upon to make it more dynamic, so this is no longer an issue for me.

If someone could double check my syntax to make sure I have not made an 
error, I will post to nagios-dev as a possible bug.

cheers,
Aidan




[Nagios-users] Host Dependency Object Inheritance Issue

2010-04-20 Thread Aidan Anderson
Hi,

Using Nagios v3.2.1

I am having problems defining host dependency object inheritance 
(chaining) using templates.  It appears that if you use 2 levels of 
inheritance, Nagios doesn't like it and aborts with the following error:

Error: Could not expand dependent hostgroups and/or hosts specified in 
host dependency (config file 
'/usr/local/nagios/etc/manual/templates-hosts.cfg', starting on line 123)

Here is my config.


I created the following host dependency templates in 
'/usr/local/nagios/etc/manual/templates-hosts.cfg'.  This is where the 
error is found so I've highlighted line 123:


define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}

define hostdependency{
use                             dc-ping-proxy
name                            cam-ping-proxy
host_name                       rp1b
register                        0
}

define hostdependency{          <--- Line 123
use                             dc-ping-proxy
name                            tcl-ping-proxy
host_name                       rp1a
register                        0
}


I then created the following 2 host dependency definitions which use the 
bottom 2 templates:


define hostdependency{
use cam-ping-proxy
dependent_host_name cam-int
}

define hostdependency{
use tcl-ping-proxy
dependent_host_name tcl-int
}


This should expand as follows:


define hostdependency{
host_name   rp1b
dependent_host_name cam-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

define hostdependency{
host_name   rp1a
dependent_host_name tcl-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

but I get the error.


I then changed the configs to remove 1 level of inheritance.  My 
templates and definitions now look like this:

Template:

define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}


Definitions:

define hostdependency{
use dc-ping-proxy
host_name   rp1b
dependent_host_name cam-int
}

define hostdependency{
use dc-ping-proxy
host_name   rp1a
dependent_host_name tcl-int
}

This should expand to the same configuration as when there were 2 levels 
of inheritance.

However, the second configuration works fine but the first one doesn't.  
Also, I have created a similar service dependency setup with 2 levels of 
inheritance and that works fine.

Can someone cast their eye over the configs listed above to see if there 
is anything obvious that I have done wrong with the inheritance?

regards,
Aidan




[Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson
Hi,

When acknowledging a host or service problem, I've noticed that the 
Persistent Comment check box is not ticked by default in v3 whereas it 
was in v2.  Is there any way of changing this behaviour so that it is 
ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
change this behaviour.  If there is no official way to change it, does 
anyone know of a hack to do this?

regards,
Aidan




Re: [Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson
Assaf Flatto wrote:
 Aidan Anderson wrote:
   
 Hi,

 When acknowledging a host or service problem, I've noticed that the 
 Persistent Comment check box is not ticked by default in v3 whereas it 
 was in v2.  Is there any way of changing this behaviour so that it is 
 ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
 change this behaviour.  If there is no official way to change it, does 
 anyone know of a hack to do this?

 regards,
 Aidan


   
 

 you will need to make changes to the cmd.c file and recompile the cgi AFAIK.


 Good luck .



   
Hi Assaf,

Thanks for the reply.  I must admit, I've never messed about with C 
source code before but I'll give it a try :)

I assume that if I make any changes, I would need to repeat the changes 
following any upgrades?

regards,
Aidan




Re: [Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson
Assaf Flatto wrote:
 Aidan Anderson wrote:
   
 Assaf Flatto wrote:
   
 
 Aidan Anderson wrote:
   
 
   
 Hi,

 When acknowledging a host or service problem, I've noticed that the 
 Persistent Comment check box is not ticked by default in v3 whereas it 
 was in v2.  Is there any way of changing this behaviour so that it is 
 ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
 change this behaviour.  If there is no official way to change it, does 
 anyone know of a hack to do this?

 regards,
 Aidan


   
 
   
 
 you will need to make changes to the cmd.c file and recompile the cgi AFAIK.


 Good luck .



   
 
   
 Hi Assaf,

 Thanks for the reply.  I must admit, I've never messed about with C 
 source code before but I'll give it a try :)

 I assume that if I make any changes, I would need to repeat the changes 
 following any upgrades?
   
 

 Aidan

 If you've never delved into the C code, then I'd advise not making any 
 changes without the help of a C programmer, and having a backup before 
 any attempts begin.

 As for the upgrade issue - of course!
 Since this is a local change, unless you plan to do the upgrade for 
 the core without the CGIs, any local change will be overwritten 
 when you upgrade.

 But once you do it and get it right, doing it again on the new version 
 will be much easier.

 Good luck

 Assaf


   
Hi Assaf,

I had to have a go and (surprising myself) have managed to do it.  I 
will remember to do this again each time I upgrade.

Below is the output of a 'diff' following my changes to cmd.c in case 
anyone else is interested in making this modification.  2 changes are 
required to cover host and service acknowledgements.


958c958
<    printf("<INPUT TYPE='checkbox' NAME='persistent' %s>",(cmd==CMD_ACKNOWLEDGE_HOST_PROBLEM)?"":"CHECKED");
---
>    printf("<INPUT TYPE='checkbox' NAME='persistent' CHECKED>");
984c984
<    printf("<INPUT TYPE='checkbox' NAME='persistent' %s>",(cmd==CMD_ACKNOWLEDGE_SVC_PROBLEM)?"":"CHECKED");
---
>    printf("<INPUT TYPE='checkbox' NAME='persistent' CHECKED>");


Thanks again for your help Assaf.

regards,
Aidan




[Nagios-users] Child host becomes UNREACHABLE when parent changes from UP to a SOFT DOWN state

2010-04-07 Thread Aidan Anderson
Hi List!

I am in the process of upgrading from v2.12 to v3.2.1.  As well as 
upgrading, I am taking the opportunity to move to a new server at the 
same time.  This has allowed me to run both versions in tandem to 
compare the operation of the two versions.

One difference I noticed straight away was downtime duration on certain 
hosts.  For example, v2 would show a host down for over 2 days yet v3 
would show the same host as being down for only a few hours.  On 
investigation, it turned out that the parent of the host on v3 went into 
a soft down state.  This changed the host in question to an unreachable 
state.  The parent host recovered within a minute or so and changed the 
host back to a down state, effectively resetting the down duration back 
to zero.  I would have expected that the child host should only change 
state if the parent goes into a hard down state, not a soft down state.
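
For readers unfamiliar with the setup being described, the parent/child relationship comes from the parents directive in the child's host definition.  A minimal sketch, with hypothetical host names and addresses and an assumed generic-host template:

```
# Sketch only: both host names, the addresses and the generic-host
# template are hypothetical.
define host{
        use                     generic-host
        host_name               site-router     ; the parent
        alias                   Site router
        address                 192.0.2.1
        max_check_attempts      5               ; stays SOFT DOWN until the 5th failed check
        retry_interval          1
        }

define host{
        use                     generic-host
        host_name               site-server     ; the child
        alias                   Site server
        address                 192.0.2.20
        parents                 site-router     ; a non-responding child is UNREACHABLE, not DOWN, while the parent is down
        max_check_attempts      5
        }
```

In the scenario above, site-server would be the host whose duration resets while site-router is still working through its SOFT retries.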

I googled for the issue and found one related post from just over a year 
ago:

http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg25543.html

The poster was given various suggestions to circumvent the problem, i.e. 
tweaking flap detection, increasing time-out on the plugin etc but 
nothing that seemed to resolve his issue.

The poster's main problem with this behaviour was that he was getting 
down e-mail alerts for hosts that are already down due to the state 
changes.  My issue is not with repeated alerts but with the accuracy of 
the down duration of the host.  When our support department look to 
resolve host problems, they will try and resolve the oldest problems 
first for obvious reasons of fairness to our customers.  This scenario 
breaks this.  In v3, to get an accurate downtime for a host, you would 
now have to trawl through the alert history or run a trend report for 
the host to find out when the host really went down.

Version 2 does not exhibit this problem.  I don't think this is by 
design but purely down to the way serial host checks work in v2.  When a 
host goes into a soft down state in v2, Nagios cannot do anything else 
until it has completed all the retries or the host recovers so Nagios 
never gets the chance to mark the child host unreachable unless it 
reaches max_check_attempts and determines that the parent host really is 
down.

The original poster of this problem made a good point that Nagios has 
all the tolerance built in to avoid false alarms on host checks but 
unfortunately this logic doesn't carry on through child hosts.

I can't see the current way v3 deals with parent/child problems as 
being desirable for most people, although it seems to have only bothered 
2 of us!

Thoughts?

regards,
Aidan




Re: [Nagios-users] Accessing Nagios for the first time

2010-03-11 Thread Aidan Anderson
Tim Tompson wrote:
 My nagios.conf:

 ## BEGIN APACHE CONFIG SNIPPET - NAGIOS.CONF

 ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin

 <Directory /usr/local/nagios/sbin>
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthType Digest
    AuthName "Nagios Access"
    AuthUserFile /usr/local/nagios/etc/.digest_pw
    Require valid-user
 </Directory>

 Alias /nagios /usr/local/nagios/share

 <Directory /usr/local/nagios/share>
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthType Digest
    AuthName "Nagios Access"
    AuthUserFile /usr/local/nagios/etc/.digest_pw
    Require valid-user
 </Directory>

 ## END APACHE CONFIG SNIPPETS
   
 I followed the instructions at 
 http://nagios.sourceforge.net/docs/3_0/cgisecurity.html to secure 
 my install, and that's where I got the above .conf file.

 It's set to Allow from all, shouldn't that work?


It should, and so should connecting to serveripaddress/nagios.  It looks 
like Apache is your issue.  Is it running?  Are there other web sites 
running on the same box?  Are they working?

Aidan




Re: [Nagios-users] problem creating hostgroup

2010-03-11 Thread Aidan Anderson
Gezina Dekker wrote:
 Hi all,
  
 When I restart after adding this host-group using split.cfg I get the 
 following.

 Running configuration check. CONFIG ERROR!  Restart aborted.  Check 
 your Nagios configuration

 I have a server definition for it. If I comment the lines out, the 
 restart is successful.
  
 Am I just missing something???
  
 This is what my hostgroup definition looks like
  
 define hostgroup{
 hostgroup_name  Linux_group
 alias           No_Call-Out
 members         svrlinux01
 }
  
 Any ideas that can help me out here?
  
 Regards and thanks for all the help so far, learned a lot,
  
 Gezina

Looks like a typo.  Did you mean to add the member as svrlinux01 or 
srvlinux01?




Re: [Nagios-users] host status in nagios/var/status.dat

2008-03-21 Thread Aidan Anderson
Colin McKinnon wrote:
 Hi all,

 Having looked at what was available (NLG, centreon...) I decided to
 write my own front end for Nagios. This proved to be quite
 straightforward (except for sorting out the locking semantics in PHP -
 but that's another story).

 The only problem I'm having is that while the status reported in
 status.dat for services matches the output from the probe
 (0=OK,1=warn,2=crit,3=unknown) for hosts it seems to record a status
 of 0 for OK but 1 for critical (down).

 Is this the way it's supposed to work? Or am I missing something?

 (Nagios 2.10)

 TIA

 C.
   
AFAIK this is correct.  With services Nagios needs to know the actual 
state, i.e. OK, warning, critical or unknown, but hosts use their own 
state codes: 0 for UP and 1 for DOWN (plus 2 for UNREACHABLE), so a 1 
there means DOWN rather than warning.

regards,
Aidan




Re: [Nagios-users] NSClient issue (Unknown alerts)

2008-01-28 Thread Aidan Anderson
Ronaldo A. Bueno Filho wrote:
 Hi, guys and ladies :)
  
 Now, I'm experiencing a problem regarding NSClient++.
 I'm monitoring a Windows workstation on my LAN. I configured 
 NSClient++ following its documentation.
 Now, that workstation shows unknown alerts for CPU load, Memory usage 
 and Uptime with the message: NSClient - ERROR: PDH Collection thread 
 not running.
  
 Looking on google.com, I found that it happens when you are not using 
 English language on Windows. Also, I did not find any resolution for 
 that issue.
 I'm not sure if there is an issue related with the windows language.
  
 Does somebody know how to solve this issue?
  
It tells you how to resolve this issue in the installation section of 
the readme.html file that comes with the nsclient download.




Re: [Nagios-users] String errors

2007-11-29 Thread Aidan Anderson
Jerad Riggin wrote:
 Ok, so I'm monitoring about 100 websites with string checks via 
 check_http.  We are mirroring what our datacenter actually checks, so 
 we have notifications turned off so when a site goes down we aren't 
 being spammed by the datacenter and our nagios installation.

 The issue is that every once in awhile a string changes on the site so 
 it goes critical in our nagios.  We perhaps won't notice it for a day 
 which messes up our availability reports.  Is there a way to 
 retroactively mark the time that it was critical as scheduled downtime?
I'm not aware of any way to retrospectively schedule downtime but you 
could probably solve your problem by adjusting your checking procedure.  
Assuming you or a colleague has access to change the html on your 
websites, you could have a standard string of text that you add to all 
your websites so that Nagios is checking the same text on each site.  
Whenever a new site is added, just make sure that your standard text 
string is added and you will avoid this problem in the first place.
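
A sketch of what that could look like on the Nagios side follows.  The command name, host name, marker string and the generic-service template are all assumptions for illustration; only check_http and its -H, -u and -s options are real.

```
# Sketch only: command name, host name, marker text and the
# generic-service template are hypothetical.
define command{
        command_name    check_http_marker
        ; $ARG1$ = URL path to fetch, $ARG2$ = string expected in the page content
        command_line    $USER1$/check_http -H $HOSTADDRESS$ -u $ARG1$ -s $ARG2$
        }

define service{
        use                     generic-service
        host_name               www-example     ; address is assumed to hold the site's DNS name
        service_description     HTTP marker string
        check_command           check_http_marker!/!nagios-marker-ok
        }
```

Every site then shares the same check_command arguments, and only the page carrying the marker has to stay unchanged.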

hth

Aidan




Re: [Nagios-users] Service notifications for a down host?

2007-11-19 Thread Aidan Anderson
Doug Tabb wrote:

 I’m looking for a little behavior confirmation here, please. It’s my 
 understanding that a failed service check is one way a host check is 
 initiated. If Nagios determines the host is down, further service 
 problem notifications are suppressed. However, I still get one or more 
 notifications for the initial service problems. Wouldn’t Nagios 
 suppress those initial service checks until at least one host check 
 has been made?

 For illustration, I have a remote site with host parent/child 
 relationships configured. If the site goes down, I get about 2 dozen 
 service notifications from various child hosts before it realizes the 
 top parent host is down and suppresses notifications for that site. I 
 then receive the one host recovery along with the 2 dozen or so 
 service recovery messages. I had hoped to not receive any service 
 notifications in this scenario. Is this expected behavior?

 Thank you very much!

 Doug Tabb

You shouldn't be seeing this behavior.  The only time you should see 
this is if your services enter a hard state before the hosts do.  How do 
your host retry attempts compare to your service retry attempts?
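
To make the comparison concrete, the settings in question are max_check_attempts on the host and on its services.  A sketch, with hypothetical names and assumed generic-host and generic-service templates (and a check_ping command as defined in the sample commands.cfg):

```
# Sketch only: names, address and both templates are hypothetical.
define host{
        use                     generic-host
        host_name               remote-router
        alias                   Remote site router
        address                 192.0.2.1
        max_check_attempts      3       ; host goes HARD DOWN after 3 failed checks
        }

define service{
        use                     generic-service
        host_name               remote-router
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        max_check_attempts      4       ; more attempts than the host, so the service is still SOFT when the host goes HARD
        }
```

Whether this alone removes the initial service notifications also depends on check timing, so treat it as a starting point rather than a guaranteed fix.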

Aidan




Re: [Nagios-users] Monitor packet loss with check_ping command

2007-10-22 Thread Aidan Anderson
Alex Dehaini wrote:
 But in this case - if there is a 20% packet loss out of 10 pings sent 
 to a host - will I be notified?

That all depends on what you set your max_check_attempts to.  If you 
want to be notified of any packet loss, set this to 1 (one).  Increase 
this value if you prefer more tolerance.
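
A sketch of how the pieces could fit together in a service definition follows.  It reuses the thresholds from Giles's command quoted below; the command name, host name, packet count of 10 and the generic-service template are assumptions.

```
# Sketch only: command name, host name and the generic-service template
# are hypothetical.
define command{
        command_name    check_ping_loss
        ; the 3000 ms round-trip thresholds are high enough that packet loss decides the result
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000,20% -c 3000,50% -p 10
        }

define service{
        use                     generic-service
        host_name               example-host
        service_description     Packet loss
        check_command           check_ping_loss
        max_check_attempts      1       ; notify on the first non-OK result
        }
```

With these thresholds a loss of 20% or more returns WARNING, so the service and its contacts also need w in their notification options for the notification to go out.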






 On 10/22/07, *Giles Coochey*  [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 check_ping uses the ping command.

  

 Packet Loss is considered a reply not within the timeout, this can
 typically be around 3000ms

  

 So something like:

  

 ./check_ping -H $HOSTNAME$ -w 3000,20% -c 3000,50%

  

 Will do what you want.

  

 * From: * [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED]] *On Behalf Of
 *Alex Dehaini
 *Sent:* 22 October 2007 11:29
 *To:* nagios-users@lists.sourceforge.net
 mailto:nagios-users@lists.sourceforge.net
 *Subject:* [Nagios-users] Monitor packet loss with check_ping command

  

 Hi Guys,

 Can someone give me an example on how I can monitor only packet
 loss but not latency

 -- 
 Alex Dehaini
 Developer
 Site - www.alexdehaini.com http://www.alexdehaini.com
 Email - [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]




 -- 
 Alex Dehaini
 Developer
 Site - www.alexdehaini.com http://www.alexdehaini.com
 Email - [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
 



Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson
Terry wrote:
 I have a contact that I only want to receive one notification.  How
 can I set this up?

   
A good place to start looking would be here:

http://nagios.sourceforge.net/docs/2_0/notifications.html

;)



Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson
Aidan Anderson wrote:
 Terry wrote:
   
 I have a contact that I only want to receive one notification.  How
 can I set this up?

   
 
 A good place to start looking would be here:

 http://nagios.sourceforge.net/docs/2_0/notifications.html

 ;)

   
Apologies, here is where you want to start:

http://nagios.sourceforge.net/docs/2_0/escalations.html

You would specify the contact you only want to receive one notification 
in the first escalation and all other contacts in the first and 
subsequent escalations.
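
As a sketch of that approach for a service, assuming hypothetical host, service and contact group names (admins for the regular contacts, notify-once for the contact who should hear about it only once) and an assumed generic-service template:

```
# Sketch only: all object names and the generic-service template are hypothetical.
define service{
        use                     generic-service
        host_name               example-host
        service_description     HTTP
        check_command           check_http
        contact_groups          admins          ; regular contacts, re-notified every 30 minutes
        notification_interval   30
        }

define serviceescalation{
        host_name               example-host
        service_description     HTTP
        first_notification      1               ; applies to the first notification only
        last_notification       1
        notification_interval   30
        contact_groups          admins,notify-once
        }
```

While an escalation matches, its contact groups are used in place of the service's own, which is why admins is listed in both places.  From the second notification onwards no escalation matches, so only admins are notified.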


Aidan




Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson
Terry wrote:
 Thanks for the reply.   Let me be more specific:

 version: 2.9
 OS: centos 5

 I have regular contacts set up, me for example.  I want to get
 notified every 30 minutes indefinitely if a service is in a hard state
 of warning or critical.  However, I want another contact to only get
 notified one time when that hard state is achieved.  That's it.
 From what I can tell, I can only achieve this through the
 notification_interval which is only set at the host/service level, not
 the contact level.  If this is true, I will need to create 2 services,
 each with a different notification_interval and of course apply the
 different contact groups to each service.  Am I correct or is there
 another way around this?

 Thanks!

 On 10/5/07, Aidan Anderson [EMAIL PROTECTED] wrote:
   
 Terry wrote:
 
 I have a contact that I only want to receive one notification.  How
 can I set this up?


   
Hi Terry,

I've just posted you another message before seeing this one.  You want 
to use host or service escalations to achieve this.  I've briefly 
explained in the previous post but if you need more help, just shout.

Aidan




Re: [Nagios-users] Newbie Notifications Problem

2007-09-24 Thread Aidan Anderson
Ray Wadkins wrote:
 Thanks for the reply.  I didn't include notify-host-by-email because it
 didn't seem relevant, but it's in commands.cfg (pasted below).  The host
 isn't failing, just the service.  When you say service notifications
 are suppressed what do you mean?  Is there a configuration I can't see
 that's suppressing service notifications?  

   
It's something Nagios does by default.  If a service check fails, it 
will check the host.  If the host check fails, it will send out a host 
notification but suppress the service notification.

From what you've said, I don't think that's your problem.  I've noticed 
that you have used a lot of templates (inheritance) in your configs.  
You could try simplifying things by setting up just a contact, a contact 
group, a host, a service and a time period, without using templates.  If 
that basic test works, the problem may lie with one of your templates.

Aidan




Re: [Nagios-users] Newbie Notifications Problem

2007-09-23 Thread Aidan Anderson
Ray Wadkins wrote:

 define contact{
         contact_name                    rwadkins_e              ; Short name of user
         use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
         alias                           Ray Wadkins             ; Full name of user
         email                           X                       ; * CHANGE THIS TO YOUR EMAIL ADDRESS **
         host_notifications_enabled      1
         service_notifications_enabled   1
         host_notification_period        24x7
         service_notification_period     24x7
         host_notification_options       d,u,r,f,s
         service_notification_options    w,u,c,r,f,s
         host_notification_commands      notify-host-by-email
         service_notification_commands   notify-service-by-email

   

You've specified the command notify-host-by-email in your contact definition

 *From commands.cfg*

 * *

 define command{
         command_name    notify-service-by-email
         command_line    /usr/bin/printf "%b" "* Nagios *\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
         }

  

but don't seem to have defined it in the commands.cfg file.

When a host goes down, only host notifications are sent out (service 
notifications are suppressed).  As you don't seem to have defined a host 
notification command, you will never receive any notifications.  Try 
adding the following to commands.cfg:

 # 'notify-host-by-email' command definition
 define command{
         command_name    notify-host-by-email
         command_line    /usr/bin/printf "%b" "Notification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nDetails: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $HOSTSTATE$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\n$HOSTACKAUTHOR$\n$HOSTACKCOMMENT$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$ - $HOSTALIAS$" $CONTACTEMAIL$
         }
HTH
Aidan





Re: [Nagios-users] Configure smtp in Nagios

2007-09-21 Thread Aidan Anderson
Rodrigo Tavares wrote:
 Hello,

 How I do configure smtp in Nagios ?

 best regards,

 Rodrigo Faria
   
You don't.  Whatever mail server you are running on the Nagios box will 
take care of SMTP.  Nagios simply pipes the notification through the 
/bin/mail command or whatever command suits the mail server.  Most 
distros come with Sendmail or Postfix by default, just make sure you 
have one running and configured to route mail.

Aidan




Re: [Nagios-users] Nagios Log Management Tips?

2007-09-20 Thread Aidan Anderson
Rogelio Bastardo wrote:
 Anyone have any tips for dealing with Nagios logs?

 Things are getting a little crazy, and I haven't even been logging very much!

 e.g.

 [EMAIL PROTECTED] run]# find / *nagios*  -type f -size +100k
 -exec ls -lh {} \; | awk '{ print $9 :  $5 }'
 /var/log/nagios/archives/nagios-08-14-2007-00.log: 2.3G
 /var/log/nagios/archives/nagios-08-13-2007-00.log: 3.4G
 /var/log/nagios/archives/nagios-08-12-2007-00.log: 2.6G
 /var/log/messages.4: 3.5G
 [EMAIL PROTECTED] run]#

 -

   
Good grief, what on earth are you logging?  I'm monitoring over 1000 
hosts and 1600 services and my daily logs range between 600KB and 
1.5MB.  Can you post a snippet of your log (say a 15-minute span) so we 
can get an idea of what it is logging?

I'd love to see how your browser copes with viewing the daily log.

Aidan




Re: [Nagios-users] NRPE (No output returned from plugin)

2007-07-20 Thread Aidan Anderson
shacky wrote:
 Hi.

 I'm using NRPE to monitor a remote server.
 The most part of the plugins works without problems, but 
 check_backuppc returns the error (No output returned from plugin) in 
 the Nagios web interface.

 The check_backuppc stanza in the Nagios configuration is the following:

 define service{
 use remote-service
 host_name   myremoteserver
 service_description BackupPC
 check_command   check_nrpe!check_backuppc
 }

 If I execute from the shell check_nrpe -H bakserver.blupixel.local -c 
 check_backuppc I correctly get the plugin's answer (BACKUPPC WARNING 
 - (5/7) failures).

 Where is the problem?

Have you set up the command definition correctly in commands.cfg, or 
wherever you store your commands on your Nagios server?  Also check that 
Nagios has permission to execute the plugin on the remote machine.  
Test by re-trying your check_nrpe command logged on as nagios.
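
For reference, the command definition on the Nagios server usually looks something like the sketch below (this follows the form given in the NRPE documentation; adjust the path if $USER1$ does not point at your plugin directory).  The service passes the remote command name as $ARG1$ and the check runs against the host's configured address, so it is also worth confirming that the address of myremoteserver is the same machine you tested by hand (bakserver.blupixel.local).

```cfg
# commands.cfg on the Nagios server
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```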

Aidan




Re: [Nagios-users] Cancel Downtime?

2007-06-15 Thread Aidan Anderson

 On Jun 7, 2007, at 11:18 PM, Anthony Mendoza wrote:

   
 Click Downtime and then the Trash can icon to the right of the
 service/host you want to cancel.

 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf
 Of Wil Schultz
 Sent: Thursday, June 07, 2007 11:11 PM
 To: nagios-users
 Subject: [Nagios-users] Cancel Downtime?

 IIRC, there used to be a Cancel Downtime link, am I blind or did
 this go away?

 How do you cancel scheduled downtime?

   

I need to cancel scheduled downtime on a host and took the advice of 
Anthony Mendoza in this thread.  Clicking on the Trash can icon 
certainly removes the Nagios-generated comment but the period of 
scheduled downtime remains.  Any ideas anyone?

regards,
Aidan




Re: [Nagios-users] Cancel Downtime?

2007-06-15 Thread Aidan Anderson
Aidan Anderson wrote:
 On Jun 7, 2007, at 11:18 PM, Anthony Mendoza wrote:

   
 
 Click Downtime and then the Trash can icon to the right of the
 service/host you want to cancel.

 
   
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf
 Of Wil Schultz
 Sent: Thursday, June 07, 2007 11:11 PM
 To: nagios-users
 Subject: [Nagios-users] Cancel Downtime?

 IIRC, there used to be a Cancel Downtime link, am I blind or did
 this go away?

 How do you cancel scheduled downtime?

   
 

 I need to cancel scheduled downtime on a host and took the advice of 
 Anthony Mendoza in this thread.  Clicking on the Trash can icon 
 certainly removes the Nagios-generated comment but the period of 
 scheduled downtime remains.  Any ideas anyone?

 regards,
 Aidan


   
Ignore last e-mail, I found it.  You do it from the downtime link on the 
sidebar. :)

cheers,
Aidan




Re: [Nagios-users] how to unsubscribe????

2007-06-02 Thread Aidan Anderson
Go to https://lists.sourceforge.net/lists/listinfo/nagios-users

Go to the bottom of the page to the section headed Nagios-users 
Subscribers and follow the instructions for unsubscribing.  You'll need 
your password.

Aidan

Arief Iqbal wrote:
 hi, how can i unsubscribe from this goddamned mailing list??? thx

 


[Nagios-users] Severe performance issue during major network outage

2007-05-11 Thread Aidan Anderson
Hi,

I have recently set up Nagios 2.8 and am monitoring 1623 hosts and 1946 
services.  Performance under normal circumstances is fine.  Typical 
check and latency times are as follows:

Monitoring Performance (min / max / average)
Service Check Execution Time:    0.03   / 11.04   / 3.418 sec
Service Check Latency:           0.00   / 1.87    / 0.479 sec
Host Check Execution Time:       0.03   / 10.04   / 0.843 sec
Host Check Latency:              0.00   / 0.00    / 0.000 sec
# Active Host / Service Checks:  1623   / 1946
# Passive Host / Service Checks: 0      / 0

The vast majority of these hosts are spread over 320 geographic 
locations throughout the UK.  These locations are connected to our data 
centre via a hardware VPN device with the majority (about 270) using a 
private ADSL circuit to facilitate the VPN connection.

Yesterday, we had a major outage caused by the failure of one of the 
ADSL central routers at our ISP.  This took out a third of our ADSL 
sites (roughly 90) for 16 minutes.  Each of these sites has about 4 
devices monitored by Nagios so in effect about 360 devices (hosts) went 
down in an instant.

As you can imagine, we were aware of the problem almost immediately due 
to the barrage of phone calls from our clients, but unfortunately Nagios 
didn't even remotely reflect the situation.  I have used parent/child 
relationships to the full, so I was expecting a good portion of the VPN 
devices to show as down, with all other devices behind them showing as 
unreachable.  This was not the case.  It took half an hour for Nagios to 
find only 20 of the 90 VPN devices down, and another half an hour to 
notice that those 20 were back up again.  During the outage, the service 
check latency was increasing rapidly, and the performance stats half an 
hour after the start of the problem were as follows:

Monitoring Performance (min / max / average)
Service Check Execution Time:    0.03   / 11.04   / 3.646 sec
Service Check Latency:           947.84 / 2080.05 / 1467.274 sec
Host Check Execution Time:       0.03   / 10.04   / 0.968 sec
Host Check Latency:              0.00   / 0.00    / 0.000 sec
# Active Host / Service Checks:  1623   / 1946
# Passive Host / Service Checks: 0      / 0

As you can see, the average service check latency time has jumped to 
1467 seconds (24 mins).  On all of these hosts there is only one service 
which is a ping (check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5).  
The host check is also a ping (check_ping -H $HOSTADDRESS$ -w 3000.0,80% 
-c 5000.0,100% -p 1) but much faster with only 1 ping being sent out.  
The normal_check_interval on services is 5 mins with 2 
max_check_attempts and a retry_check_interval of 1.  The host also has a 
max_check_attempts of 2.
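
For illustration, the service settings described above would look roughly like this in a Nagios 2.x object file.  The host name, template and thresholds are placeholders rather than values from the real config, and the directive names are the 2.x ones (retry_check_interval rather than the later retry_interval).

```cfg
define service{
        use                     generic-service   ; hypothetical template with the remaining required directives
        host_name               site001-vpn       ; placeholder host
        service_description     PING
        check_command           check_ping!200.0,20%!500.0,60%   ; example thresholds only
        normal_check_interval   5                 ; minutes between checks while the service is OK
        retry_check_interval    1                 ; minutes between rechecks after a failure
        max_check_attempts      2                 ; failed checks needed before a HARD state
        }
```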

A lot of people have mentioned using fping to speed things up but if my 
average service latency is only 0.479 seconds in normal circumstances, I 
can't see how tweaking this will help in a major outage situation.

I have also read through the section on tweaking performance which seems 
to be geared toward protecting the machine Nagios is running on.  I want 
to do the opposite and give Nagios a lot more work to do.  The machine 
is dedicated to Nagios and is quite high spec.  It's an IBM xSeries 336 
with 2 Dual Core processors and 4GB of RAM so it should be able to take 
a much bigger hit.  I have been monitoring CPU performance with MRTG and 
the CPU never goes lower than 90% idle.  Ironically, during the problem 
the machine's idle time jumped to 95% when I would have expected it to 
drop rather than increase.

The only performance tweak I could see that would affect the performance 
in this situation is max_concurrent_checks but this is already set to 0.

I am fairly new to Nagios (2 months) so I apologise if I have missed 
something obvious, but any pointers to a solution to this problem would 
be greatly appreciated.  I have run a nagios -s (attached below) which 
seems to indicate that everything is set up OK.  Let me know if you 
require any more information from my config that would help diagnose the 
problem.

regards,
Aidan




Nagios 2.8
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 03-08-2007
License: GPL

Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---
Total hosts: 1624
Total scheduled hosts:   0
Host inter-check delay method:   SMART
Average host check interval: 0.00 sec
Host inter-check delay:  0.00 sec
Max host check spread:   30 min
First scheduled check:   N/A
Last scheduled check:N/A


SERVICE SCHEDULING INFORMATION
---
Total services: 1947
Total scheduled services:   1947
Service inter-check delay method:   SMART
Average 

Re: [Nagios-users] nrpe command line test question

2007-05-11 Thread Aidan Anderson
Maxwell,Brady wrote:

 My nrpe.cfg on the remote host contains these commands

 command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c 
 $ARG2$ -p $ARG3$

 command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 
 -p /dev/vga/root

 running a check_nrpe from the command line has the following results.

 [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c 
 check_disk –a 10 5 /dev/vga/root

 check_disk: Warning threshold must be integer or percentage!

 [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c 
 check_disk1

 DISK OK - free space: / 801 MB (12% inode=81%);| 
 /=5625MB;6405;6415;80;6425

 I would like to be able to pass arguments to the remote system, 
 allowing me to set threshold values at the service level.

 Can anyone tell me why I get the error “Warning threshold must be 
 integer or percentage!” ?

 Or suggest another method of passing the args to the remote nrpe process?

 Thanks

 Brady

 

Make sure that you set dont_blame_nrpe to 1 in nrpe.cfg to allow nrpe to 
accept client arguments.  This is set to 0 by default as it is deemed a 
security risk.
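
A minimal sketch of the relevant nrpe.cfg lines on the remote host, assuming the daemon was built with command-argument support (the --enable-command-args configure option); without that, arguments will most likely still be refused.  Restart or reload nrpe after the change.

```cfg
# nrpe.cfg on the remote host
dont_blame_nrpe=1

command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
```

Also check that the option on the check_nrpe command line is a plain hyphen (-a 10 5 /dev/vga/root).  The pasted command shows an en dash before the a, which may just be the mail client's doing, but typed literally it would not be recognised as the argument flag.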

Aidan





Re: [Nagios-users] Severe performance issue during major network outage

2007-05-11 Thread Aidan Anderson

Ton Voon wrote:
 On 11 May 2007, at 19:03, Jim Avery wrote:

   
 On 11/05/07, Aidan Anderson [EMAIL PROTECTED] wrote:

 
 A lot of people have mentioned using fping to speed things up but  
 if my
 average service latency is only 0.479 seconds in normal  
 circumstances, I
 can't see how tweaking this will help in a major outage situation.
   
 check_ping won't finish until it's done all the pings, and the pings
 are (if I recall) always at one second intervals.  This means that if
 you've configured check_ping to do (let's say) 5 pings, the check_ping
 plugin will always take at least 5 seconds to complete.

 If the check_ping is being run as a host check rather than a service
 check, my understanding is that this is the only thing Nagios will be
 doing; it doesn't do anything else concurrently (correct me if I'm
 wrong people).
 

 Correct. We noticed this some time ago too:
 http://altinity.blogs.com/dotorg/2006/05/immediate_perfo.html

 If you do stick to using check_ping, use -p 1 which is sub second  
 response time.

   
First of all, thank you for the replies!

The majority of devices that I monitor are routers/vpn devices and I 
have (on the documentation's advice) not set active checks on the hosts 
and instead I've added check_ping as a service on each of these hosts to 
do 5 pings as follows:

check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

For the host check I already use as you suggested a check_ping that only 
does one ping as follows:

check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

My understanding was that if the service check failed it would then 
abandon the service check altogether and move onto the host check which 
is only 1 ping.  The fact that the service checks are parallelised 
should mean that it shouldn't matter that there are 5 pings and the host 
check is only 1 ping which should resolve the bottleneck of serialised 
host checks.  I'm at a loss as to why performance has been impacted so 
severely.

Maybe I need to abandon the service checks altogether and just have a 
host check.  I'm reluctant to do this because I get very useful 
information from 5 pings, i.e. packet loss and high rta, which is 
particularly handy for checking volatile links such as ADSL.  Maybe that 
is the trade-off: fast host checking with no useful stats, or slow host 
checking with useful stats.

regards,
Aidan







Re: [Nagios-users] Severe performance issue during major network outage

2007-05-11 Thread Aidan Anderson
Ton Voon wrote:
 On 11 May 2007, at 20:25, Aidan Anderson wrote:

   
 First of all, thank-you for the replies!

 The majority of devices that I monitor are routers/vpn devices and I
 have (on the documentation's advice) not set active checks on the  
 hosts
 and instead I've added check_ping as a service on each of these  
 hosts to
 do 5 pings as follows:

 check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

 For the host check I already use as you suggested a check_ping that  
 only
 does one ping as follows:

 check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

 My understanding was that if the service check failed it would then
 abandon the service check altogether and move onto the host check  
 which
 is only 1 ping.  The fact that the service checks are parallelised
 should mean that it shouldn't matter that there are 5 pings and the  
 host
 check is only 1 ping which should resolve the bottleneck of serialised
 host checks.  I'm at a loss as to why performance has been impacted so
 severely.

 Maybe I need to abandon the service checks altogether and just have a
 host check.  I'm reluctant to do this because I get very useful
 information from 5 pings, ie packet loss and high rta which is
 particularly handy for checking volatile links such as ADSL.  Maybe  
 that
 is the trade-off, fast host checking with no useful stats or slow host
 checking with useful stats.
 

 Just noticed this in your original email:

 Host Check Execution Time:   0.03   / 10.04   / 0.843 sec

 This means that some of your host checks are taking 10 seconds, which  
 is, funnily enough, the timeout period for check_ping. So the -p 1  
 will still take 10 seconds if the routers are not responding.

 You can use a timeout flag for check_ping (but is only supported on  
 some OSes). I guess check_icmp is a better bet here.

 Ton
   
Hi Ton,

Well spotted, thank you.  check_icmp here we come :)
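
A possible host check definition using check_icmp, keeping the same thresholds as the check_ping version above.  The command name is only an example, the -n (packet count) and -t (timeout in seconds) values are illustrative guesses rather than tested settings, and check_icmp normally needs to be setuid root so that it can open a raw socket.

```cfg
define command{
        command_name    check-host-alive    ; example name, use whatever your host definitions reference
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -n 1 -t 2
        }
```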

thanks
Aidan




Re: [Nagios-users] Disable service_notification_commands

2007-05-02 Thread Aidan Anderson
Hi Kareem,

I am using 2.8 and the docs have 'service_notification_commands' in red 
(required), so I don't know whether that is an error in the docs or not.

If you want to disable service notifications, put the directive back and 
simply set service_notification_options to 'n' (none) for that contact.
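
Something along these lines, as a sketch only (the email address is a placeholder, and the command and time period names are assumptions, so substitute whatever your own config defines):

```cfg
define contact{
        contact_name                    kareem
        alias                           Kareem Mahgoub
        email                           kareem@example.com         ; placeholder address
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    n                          ; n = none, so no service notifications
        host_notification_commands      notify-host-by-email
        service_notification_commands   notify-service-by-email    ; still required, but never triggered
        }
```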

regards,
Aidan


Kareem Mahgoub wrote:
 Dear All
 I am using Nagios 2.5 and I want to disable the service notification
 command.
 On the documentation under the  section of Contact Definition, I can see
 that the directive service_notification_commands is in black which means
 it is optional. When I commented it and made a conf check it gave Error:
 Contact 'kareem' has no service notification commands defined!
 Am I missing something here?
 Best Regards,
 Kareem Mahgoub




Re: [Nagios-users] Service Group Summary Changing Numbers

2007-04-10 Thread Aidan Anderson




Hi Elijah,

Fantastic, glad I could help.

The situation you mentioned where not all processes stop after a
restart is quite common and has been mentioned a few times on the
list.  I had similar problems and one post suggested doing a "reload"
rather than a "restart".  I now religiously use "reload" and have not
had a problem since.

regards,
Aidan


Elijah Savage wrote:

  Aidan,

Not sure how I miss that but you are right there were multiple processes running. I think my situation was from actually doing a restart on the services with the init script and they all did not stop for some reason. I have since stopped all services killed off any additional processes and now things seem to be back to exactly what I have grown to expect, a nice stable platform in nagios.

Thank you


- Original Message -
From: Aidan Anderson [EMAIL PROTECTED]
To: Nagios Users Mailinglist nagios-users@lists.sourceforge.net
Sent: Tuesday, April 10, 2007 6:27:21 AM GMT-0500 Auto-Detected
Subject: Re: [Nagios-users] Service Group Summary Changing Numbers

Hi Elijah,

This sounds similar to a problem that I had, refreshing the browser was 
giving me different results.  It turned out that the problem was to do 
with 2 Nagios processes running.  When I was refreshing the browser, it 
was randomly picking one of the processes and reporting back the state 
of that particular instance hence the different results on each 
refresh.  To rectify, I stopped Nagios and manually removed the 
remaining process and then started Nagios again.  I caused the problem 
during a Nagios upgrade, I didn't stop Nagios before starting the 
upgrade so it ended up being started twice.

Regards,
Aidan



Elijah Savage wrote:
  
  
All,

I have something going on that I consider very weird. Under 
service group summary my numbers are changing on refresh of the 
browser when there are no devices down. I have 4 different host groups 
on that page, but in one group I have 70 devices. You log in and it shows 
70 devices up, then you do a refresh and it will show 60 devices up, 
none down, when you know you have 70; the next refresh may show 
68 devices up, none down. I know it all sounds like baby talk but it is 
somewhat difficult for me to explain. It does this under the 
hostgroup summary as well.

I have been on this list for a long time and have never had to post 
because through reading the emails and searching the archives I have 
been able to achieve what I needed to for my environment, but I could 
not find anything close to what I am seeing now.

Nagios is Version 2.7 updated this past weekend had I known and was 
paying attention I would have waited on the 2.8 release from this 
weekend :)
Running on Solaris and Sun V880 Platform 4cpu's 8gig of mem.

The server is nowhere close to being overloaded. Thing is, I do not 
know if this was happening on the previous version. Of course, when you 
announce a major change or upgrade, people really start to pay close 
attention to the tools they use.

Oh yeah one last thing these devices being monitored are Cisco devices 
with the check_command   check-router-alive.

Any help would be greatly appreciated.



  




-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys-and earn cash
http://ww