from:"Aidan Anderson"

Re: [Nagios-users] check_disk plugin

2010-05-06 Thread Aidan Anderson

Davide Blasi wrote:
> with or without quotes give me the same result :(
>
>   

Try using single quotes, e.g.

-I '/my/first/.*' -I '/second/.*'



--
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] About host check retry interval for nagios v 3.x

2010-04-27 Thread Aidan Anderson

Yu Watanabe wrote:
> Hello all.
>
> I would like to ask a question regarding to "Host Definition" in Nagios 
> official document of 3.x.
>
> In the "Object Definitions -> Host Definition", the host retry interval is 
> set as "#".
> What interval length would Nagios actually use with this 
> value?
> Would it be the default time unit , 60 sec?
>
> Thank you
> Yu Watanabe
>
>   
That is not a real value; the # simply indicates that the directive requires a 
number.  In the case of retry_interval, this is the number of minutes 
(strictly, time units, which default to 60 seconds) between each check 
attempt after the host goes into a SOFT non-OK state.  Normally you would 
put 1 here so that it retries every minute until it reaches 
max_check_attempts.
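
As a rough illustration of where that number goes, a host definition might look like the sketch below.  The host name, address and contact group are made-up placeholders, and check-host-alive and the 24x7 timeperiod are assumed to be defined elsewhere in your config; only retry_interval and max_check_attempts relate to the question.

```
define host{
    host_name               example-host        ; hypothetical name
    alias                   Example Host
    address                 192.0.2.10          ; placeholder address
    check_command           check-host-alive    ; assumed to exist in your commands
    check_interval          5                   ; time units between checks while the state is stable
    retry_interval          1                   ; time units between retries in a SOFT non-OK state
    max_check_attempts      3                   ; state becomes HARD on the 3rd consecutive failure
    check_period            24x7                ; assumed timeperiod
    contact_groups          admins              ; hypothetical contact group
    notification_interval   30
    notification_period     24x7
    }
```

With the default interval_length of 60, these values would give roughly two minutes between the first failed check and the HARD state.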

regards,
Aidan



Re: [Nagios-users] Host Dependency Object Inheritance Issue

2010-04-22 Thread Aidan Anderson

Aidan Anderson wrote:
> Hi,
>
> Using Nagios v3.2.1
>
> I am having problems defining host dependency object inheritance 
> (chaining) using templates.  It "appears" that if you use 2 levels of 
> inheritance, Nagios doesn't like it and aborts with the following error:
>
> Error: Could not expand dependent hostgroups and/or hosts specified in 
> host dependency (config file 
> '/usr/local/nagios/etc/manual/templates-hosts.cfg', starting on line 123)
>
> Here is my config.
>
>
> I created the following host dependency templates in 
> '/usr/local/nagios/etc/manual/templates-hosts.cfg'.  This is where the 
> error is found so I've highlighted line 123:
>
>
> define hostdependency{
>name                            dc-ping-proxy
>execution_failure_criteria      d,u,p
>notification_failure_criteria   d,u,p
>register                        0
>}
>
> define hostdependency{
>use                             dc-ping-proxy
>name                            cam-ping-proxy
>host_name                       rp1b
>register                        0
>}
>
> define hostdependency{          <--- Line 123
>use                             dc-ping-proxy
>name                            tcl-ping-proxy
>host_name                       rp1a
>register                        0
>}
>
>
> I then created the following 2 host dependency definitions which use 
> the bottom 2 templates:
>
>
> define hostdependency{
>use cam-ping-proxy
>dependent_host_name cam-int
>}
>
> define hostdependency{
>use tcl-ping-proxy
>dependent_host_name tcl-int
>}
>
>
> This should expand as follows:
>
>
> define hostdependency{
>host_name   rp1b
>dependent_host_name cam-int
>execution_failure_criteria  d,u,p
>notification_failure_criteria   d,u,p
>}
>
> define hostdependency{
>host_name   rp1a
>dependent_host_name tcl-int
>execution_failure_criteria  d,u,p
>notification_failure_criteria   d,u,p
>}
>
> but I get the error.
>
>
> I then changed the configs to remove 1 level of inheritance.  My 
> templates and definitions now look like this:
>
> Template:
>
> define hostdependency{
>name                            dc-ping-proxy
>execution_failure_criteria      d,u,p
>notification_failure_criteria   d,u,p
>register                        0
>}
>
>
> Definitions:
>
> define hostdependency{
>use dc-ping-proxy
>host_name   rp1b
>dependent_host_name cam-int
>}
>
> define hostdependency{
>use dc-ping-proxy
>host_name   rp1a
>dependent_host_name tcl-int
>}
>
> This should expand to the same configuration as when there were 2 
> levels of inheritance.
>
> However, the second configuration works fine but the first one 
> doesn't.  Also, I have created a similar service dependency setup with 
> 2 levels of inheritance and that works fine.
>
> Can someone cast their eye over the configs listed above to see if 
> there is anything obvious that I have done wrong with the inheritance?
>
> regards,
> Aidan
>
I've changed the way I work out the host_name of the host being depended 
upon to make it more dynamic so this is no longer an issue for me.

If someone could double check my syntax to make sure I have not made an 
error, I will post to nagios-dev as a possible bug.

cheers,
Aidan



[Nagios-users] Host Dependency Object Inheritance Issue

2010-04-20 Thread Aidan Anderson

Hi,

Using Nagios v3.2.1

I am having problems defining host dependency object inheritance 
(chaining) using templates.  It "appears" that if you use 2 levels of 
inheritance, Nagios doesn't like it and aborts with the following error:

Error: Could not expand dependent hostgroups and/or hosts specified in 
host dependency (config file 
'/usr/local/nagios/etc/manual/templates-hosts.cfg', starting on line 123)

Here is my config.


I created the following host dependency templates in 
'/usr/local/nagios/etc/manual/templates-hosts.cfg'.  This is where the 
error is found so I've highlighted line 123:


define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}

define hostdependency{
use                             dc-ping-proxy
name                            cam-ping-proxy
host_name                       rp1b
register                        0
}

define hostdependency{          <--- Line 123
use                             dc-ping-proxy
name                            tcl-ping-proxy
host_name                       rp1a
register                        0
}


I then created the following 2 host dependency definitions which use the 
bottom 2 templates:


define hostdependency{
use cam-ping-proxy
dependent_host_name cam-int
}

define hostdependency{
use tcl-ping-proxy
dependent_host_name tcl-int
}


This should expand as follows:


define hostdependency{
host_name   rp1b
dependent_host_name cam-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

define hostdependency{
host_name   rp1a
dependent_host_name tcl-int
execution_failure_criteria  d,u,p
notification_failure_criteria   d,u,p
}

but I get the error.


I then changed the configs to remove 1 level of inheritance.  My 
templates and definitions now look like this:

Template:

define hostdependency{
name                            dc-ping-proxy
execution_failure_criteria      d,u,p
notification_failure_criteria   d,u,p
register                        0
}


Definitions:

define hostdependency{
use dc-ping-proxy
host_name   rp1b
dependent_host_name cam-int
}

define hostdependency{
use dc-ping-proxy
host_name   rp1a
dependent_host_name tcl-int
}

This should expand to the same configuration as when there were 2 levels 
of inheritance.

However, the second configuration works fine but the first one doesn't.  
Also, I have created a similar service dependency setup with 2 levels of 
inheritance and that works fine.

Can someone cast their eye over the configs listed above to see if there 
is anything obvious that I have done wrong with the inheritance?

regards,
Aidan



Re: [Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson

Assaf Flatto wrote:
> Aidan Anderson wrote:
>   
>> Assaf Flatto wrote:
>>   
>> 
>>> Aidan Anderson wrote:
>>>   
>>> 
>>>   
>>>> Hi,
>>>>
>>>> When acknowledging a host or service problem, I've noticed that the 
>>>> "Persistent Comment" check box is not ticked by default in v3 whereas it 
>>>> was in v2.  Is there any way of changing this behaviour so that it is 
>>>> ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
>>>> change this behaviour.  If there is no official way to change it, does 
>>>> anyone know of a hack to do this?
>>>>
>>>> regards,
>>>> Aidan
>>>>
>>>>
>>>>   
>>>> 
>>>>   
>>>> 
>>> you will need to make changes to the cmd.c file and recompile the cgi AFAIK.
>>>
>>>
>>> Good luck .
>>>
>>>
>>>
>>>   
>>> 
>>>   
>> Hi Assaf,
>>
>> Thanks for the reply.  I must admit, I've never messed about with C 
>> source code before but I'll give it a try :)
>>
>> I assume that if I make any changes, I would need to repeat the changes 
>> following any upgrades?
>>   
>> 
>
> Aidan
>
> If you've never delved into the C code, then I'd advise not making any 
> changes without the help of a C programmer, and taking a backup before 
> any attempts begin.
>
> As for the upgrade issue - of course!
> Since this is a local change, unless you plan to do the upgrade for 
> the core without the CGIs, any "local" change will be overwritten 
> when you upgrade.
>
> But once you do it and get it right, doing it again on the new version 
> will be much easier.
>
> Good luck
>
> Assaf
>
>
>   
Hi Assaf,

I had to have a go and (surprising myself) have managed to do it.  I 
will remember to do this again each time I upgrade.

Below is the output of a 'diff' following my changes to cmd.c in case 
anyone else is interested in making this modification.  2 changes are 
required to cover host and service acknowledgements.


958c958
<   printf("",(cmd==CMD_ACKNOWLEDGE_HOST_PROBLEM)?"":"CHECKED");
---
 >   printf("");
984c984
<   printf("   printf("");


Thanks again for your help Assaf.

regards,
Aidan



Re: [Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson

Assaf Flatto wrote:
> Aidan Anderson wrote:
>   
>> Hi,
>>
>> When acknowledging a host or service problem, I've noticed that the 
>> "Persistent Comment" check box is not ticked by default in v3 whereas it 
>> was in v2.  Is there any way of changing this behaviour so that it is 
>> ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
>> change this behaviour.  If there is no official way to change it, does 
>> anyone know of a hack to do this?
>>
>> regards,
>> Aidan
>>
>>
>>   
>> 
>
> you will need to make changes to the cmd.c file and recompile the cgi AFAIK.
>
>
> Good luck .
>
>
>
>   
Hi Assaf,

Thanks for the reply.  I must admit, I've never messed about with C 
source code before but I'll give it a try :)

I assume that if I make any changes, I would need to repeat the changes 
following any upgrades?

regards,
Aidan



[Nagios-users] Persistent Comment in Acknowledgement

2010-04-12 Thread Aidan Anderson

Hi,

When acknowledging a host or service problem, I've noticed that the 
"Persistent Comment" check box is not ticked by default in v3 whereas it 
was in v2.  Is there any way of changing this behaviour so that it is 
ticked by default?  I can't find any options in cgi.cfg or nagios.cfg to 
change this behaviour.  If there is no official way to change it, does 
anyone know of a hack to do this?

regards,
Aidan



[Nagios-users] Child host becomes UNREACHABLE when parent changes from UP to a SOFT DOWN state

2010-04-07 Thread Aidan Anderson

Hi List!

I am in the process of upgrading from v2.12 to v3.2.1.  As well as 
upgrading, I am taking the opportunity to move to a new server at the 
same time.  This has allowed me to run both versions in tandem to 
compare the operation of the two versions.

One difference I noticed straight away was downtime duration on certain 
hosts.  For example, v2 would show a host down for over 2 days yet v3 
would show the same host as being down for only a few hours.  On 
investigation, it turned out that the parent of the host on v3 went into 
a soft down state.  This changed the host in question to an unreachable 
state.  The parent host recovered within a minute or so and changed the 
host back to a down state, effectively resetting the down duration back 
to zero.  I would have expected that the child host should only change 
state if the parent goes into a hard down state, not a soft down state.

I googled for the issue and found one related post from just over a year 
ago:

http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg25543.html

The poster was given various suggestions to circumvent the problem, i.e. 
tweaking flap detection, increasing time-out on the plugin etc but 
nothing that seemed to resolve his issue.

The poster's main problem with this behaviour was that he was getting 
down e-mail alerts for hosts that are already down due to the state 
changes.  My issue is not with repeated alerts but with the accuracy of 
the down duration of the host.  When our support department look to 
resolve host problems, they will try and resolve the oldest problems 
first for obvious reasons of fairness to our customers.  This scenario 
breaks this.  In v3, to get an accurate downtime for a host, you would 
now have to trawl through the alert history or run a trend report for 
the host to find out when the host really went down.

Version 2 does not exhibit this problem.  I don't think this is by 
design but purely down to the way serial host checks work in v2.  When a 
host goes into a soft down state in v2, Nagios cannot do anything else 
until it has completed all the retries or the host recovers so Nagios 
never gets the chance to mark the child host unreachable unless it 
reaches max_check_attempts and determines that the parent host really is 
down.

The original poster of this problem made a good point that Nagios has 
all the tolerance built in to avoid false alarms on host checks but 
unfortunately this logic doesn't carry on through child hosts.

I can't see the current way v3 deals with parent/child problems being 
desirable for most people, although it seems to have bothered only 
2 of us!

Thoughts?

regards,
Aidan



Re: [Nagios-users] problem creating hostgroup

2010-03-11 Thread Aidan Anderson

Gezina Dekker wrote:
> Hi all,
>  
> When I restart after adding this host-group using split.cfg I get the 
> following.
>
> Running configuration check. CONFIG ERROR!  Restart aborted.  Check 
> your Nagios configuration
>
> I have a server definition for it. If I comment the lines out, the 
> restart is successful.
>  
> I am just missing something???
>  
> This is what my hostgroup definition looks like
>  
> define hostgroup{
> hostgroup_name  Linux_group
> alias           No_Call-Out
> members         svrlinux01
> }
>  
> Any ideas that can help me out here?
>  
> Regards and thanks for all the help so far, learned a lot,
>  
> Gezina

Looks like a typo, did you mean to add the member as svrlinux01 or 
srvlinux01?



Re: [Nagios-users] Accessing Nagios for the first time

2010-03-11 Thread Aidan Anderson

Tim Tompson wrote:
> My nagios.conf:
>
> ## BEGIN APACHE CONFIG SNIPPET - NAGIOS.CONF
>
> ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
>
> <Directory "/usr/local/nagios/sbin">
>    Options ExecCGI
>    AllowOverride None
>    Order allow,deny
>    Allow from all
>    AuthType Digest
>    AuthName "Nagios Access"
>    AuthUserFile /usr/local/nagios/etc/.digest_pw
>    Require valid-user
> </Directory>
>
> Alias /nagios "/usr/local/nagios/share"
>
> <Directory "/usr/local/nagios/share">
>    Options None
>    AllowOverride None
>    Order allow,deny
>    Allow from all
>    AuthType Digest
>    AuthName "Nagios Access"
>    AuthUserFile /usr/local/nagios/etc/.digest_pw
>    Require valid-user
> </Directory>
>
> ## END APACHE CONFIG SNIPPETS
>   
> I followed the instructions at: 
> http://nagios.sourceforge.net/docs/3_0/cgisecurity.html -- to secure 
> my install, and thats where I got the above .conf file.
>
> Its set to "Allow from all", shouldn't that work?
>

It should and so should connecting to serveripaddress/nagios.  It looks 
like Apache is your issue. Is it running?  Are there other web sites 
running on the same box?  Are they working?

Aidan



Re: [Nagios-users] host status in nagios/var/status.dat

2008-03-21 Thread Aidan Anderson

Colin McKinnon wrote:
> Hi all,
>
> Having looked at what was avilable (NLG, centreon...) I decided to
> write my own front end for Nagios. This proved to be quite
> straightforward (except for sorting out the locking semantics in PHP -
> but that's another story).
>
> The only problem I'm having is that while the status reported in
> status.dat for services matches the output from the probe
> (0=OK,1=warn,2=crit,3=unknown) for hosts it seems to record a status
> of 0 for OK but 1 for critical (down).
>
> Is this the way its supposed to work? Or am I missing something?
>
> (Nagios 2.10)
>
> TIA
>
> C.
>   
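AFAIK this is correct.  With services Nagios needs to know the actual 
state (OK, warning, critical or unknown), but with hosts all it needs to 
know is whether the host is UP or DOWN, hence 0 or 1.  A host that cannot 
be reached because its parent is down is recorded as UNREACHABLE, which 
is 2.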

regards,
Aidan


Re: [Nagios-users] NSClient issue (Unknown alerts)

2008-01-28 Thread Aidan Anderson

Ronaldo A. Bueno Filho wrote:
> Hi, guys and ladies :)
>  
> Now, I'm experiencing a problem regarding NSClient++.
> I'm monitoring a Windows workstation on my LAN. I configured 
> NSClient++ following its documentation.
> Now, that workstation shows unknown alerts for CPU load, Memory usage 
> and Uptime with the message: NSClient - ERROR: PDH Collection thread 
> not running.
>  
> Looking on google.com, I found that it happens when you are not using 
> English language on Windows. Also, I did not find any resolution for 
> that issue.
> I'm not sure if there is an issue related with the windows language.
>  
> Does somebody know how to solve this issue?
>  
It tells you how to resolve this issue in the installation section of 
the readme.html file that comes with the nsclient download.


Re: [Nagios-users] String errors

2007-11-29 Thread Aidan Anderson

Jerad Riggin wrote:
> Ok, so I'm monitoring about 100 websites with string checks via 
> check_http.  We are mirroring what our datacenter actually checks, so 
> we have notifications turned off so when a site goes down we aren't 
> being spammed by the datacenter and our nagios installation.
>
> The issue is that every once in awhile a string changes on the site so 
> it goes critical in our nagios.  We perhaps won't notice it for a day 
> which messes up our availability reports.  Is there a way to 
> retroactively mark the time that it was critical as scheduled downtime?
I'm not aware of any way to retrospectively schedule downtime but you 
could probably solve your problem by adjusting your checking procedure.  
Assuming you or a colleague has access to change the html on your 
websites, you could have a standard string of text that you add to all 
your websites so that Nagios is checking the same text on each site.  
Whenever a new site is added, just make sure that your standard text 
string is added and you will avoid this problem in the first place.

hth

Aidan


Re: [Nagios-users] Service notifications for a down host?

2007-11-19 Thread Aidan Anderson

Doug Tabb wrote:
>
> I’m looking for a little behavior confirmation here, please. It’s my 
> understanding that a failed service check is one way a host check is 
> initiated. If Nagios determines the host is down, further service 
> problem notifications are suppressed. However, I still get one or more 
> notifications for the initial service problems. Wouldn’t Nagios 
> suppress those initial service checks until at least one host check 
> has been made?
>
> For illustration, I have a remote site with host parent/child 
> relationships configured. If the site goes down, I get about 2 dozen 
> service notifications from various child hosts before it realizes the 
> top parent host is down and suppresses notifications for that site. I 
> then receive the one host recovery along with the 2 dozen or so 
> service recovery messages. I had hoped to not receive any service 
> notifications in this scenario. Is this expected behavior?
>
> Thank you very much!
>
> Doug Tabb
>
You shouldn't be seeing this behavior. The only time you should see 
this is if your services enter a hard state before the hosts. How do 
your host retry attempts compare to your service retry attempts?

Aidan



Re: [Nagios-users] Monitor packet loss with check_ping command

2007-10-22 Thread Aidan Anderson

Alex Dehaini wrote:
> But in this case - if there is a 20% packet loss out of 10 pings sent 
> to a host - will I be notified?
>
That all depends on what you set your max_check_attempts to.  If you 
want to be notified of any packet loss, set this to 1 (one).  Increase 
this value if you prefer more tolerance.
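
A minimal sketch of such a service, assuming a check_ping command defined the way the sample commands.cfg does it (first argument passed to -w, second to -c) and using made-up host and contact group names.  The 3000 ms round-trip thresholds follow the suggestion quoted below, so that packet loss rather than latency should normally be the deciding figure:

```
define service{
    host_name               example-host                  ; hypothetical
    service_description     Packet Loss
    check_command           check_ping!3000,20%!3000,50%  ; WARNING from 20% loss, CRITICAL from 50%
    max_check_attempts      1                             ; notify on the first non-OK result
    normal_check_interval   5
    retry_check_interval    1
    check_period            24x7                          ; assumed timeperiod
    notification_interval   30
    notification_period     24x7
    notification_options    w,c,r
    contact_groups          admins                        ; hypothetical
    }
```

Raising max_check_attempts to 3, for example, would mean three consecutive non-OK results are needed before anyone is notified.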






> On 10/22/07, Giles Coochey <[EMAIL PROTECTED]> wrote:
>
> check_ping uses the ping command.
>
> Packet Loss is considered a reply not within the timeout, this can
> typically be around 3000ms
>
> So something like:
>
> ./check_ping -H $HOSTNAME$ -w 3000,20% -c 3000,50%
>
> Will do what you want.
>
> From: [EMAIL PROTECTED] On Behalf Of Alex Dehaini
> Sent: 22 October 2007 11:29
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Monitor packet loss with check_ping command
>
> Hi Guys,
>
> Can someone give me an example on how I can monitor only packet
> loss but not latency
>
> -- 
> Alex Dehaini
> Developer
> Site - www.alexdehaini.com
> Email - [EMAIL PROTECTED]



Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson

Terry wrote:
> Thanks for the reply.   Let me be more specific:
>
> version: 2.9
> OS: centos 5
>
> I have regular contacts set up, me for example.  I want to get
> notified every 30 minutes indefinitely if a service is in a hard state
> of warning or critical.  However, I want another contact to only get
> notified one time when that hard state is achieved.  That's it.
> From what I can tell, I can only achieve this through the
> notification_interval which is only set at the host/service level, not
> the contact level.  If this is true, I will need to create 2 services,
> each with a different notification_interval and of course apply the
> different contact groups to each service.  Am I correct or is there
> another way around this?
>
> Thanks!
>
> On 10/5/07, Aidan Anderson <[EMAIL PROTECTED]> wrote:
>   
>> Terry wrote:
>> 
>>> I have a contact that I only want to receive one notification.  How
>>> can I set this up?
>>>
>>>
>>>   
Hi Terry,

I've just posted you another message before seeing this one.  You want 
to use host or service escalations to achieve this.  I've briefly 
explained in the previous post but if you need more help, just shout.

Aidan


Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson

Aidan Anderson wrote:
> Terry wrote:
>   
>> I have a contact that I only want to receive one notification.  How
>> can I set this up?
>>
>>   
>> 
> A good place to start looking would be here:
>
> http://nagios.sourceforge.net/docs/2_0/notifications.html
>
> ;)
>
>   
Apologies, here is where you want to start:

http://nagios.sourceforge.net/docs/2_0/escalations.html

You would specify the contact you only want to receive one notification 
in the first escalation and all other contacts in the first and 
subsequent escalations.
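
As a sketch of that idea, assuming a hypothetical host and service and two made-up contact groups (notify-once for the contact who should hear about the problem a single time, admins for everyone else):

```
# First notification only: both groups are notified
define serviceescalation{
    host_name               example-host    ; hypothetical
    service_description     HTTP            ; hypothetical
    first_notification      1
    last_notification       1
    notification_interval   30
    contact_groups          notify-once,admins
    }

# Second notification onwards: only the regular contacts
define serviceescalation{
    host_name               example-host
    service_description     HTTP
    first_notification      2
    last_notification       0               ; 0 = keep using this entry until the problem clears
    notification_interval   30              ; re-notify every 30 time units (minutes by default)
    contact_groups          admins
    }
```

Because the contact groups listed in a matching escalation are the ones that get notified, the notify-once group drops out after the first notification while admins keeps being notified.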

Aidan


Re: [Nagios-users] notify contact only once

2007-10-05 Thread Aidan Anderson

Terry wrote:
> I have a contact that I only want to receive one notification.  How
> can I set this up?
>
>   
A good place to start looking would be here:

http://nagios.sourceforge.net/docs/2_0/notifications.html

;)


Re: [Nagios-users] Newbie Notifications Problem

2007-09-24 Thread Aidan Anderson

Ray Wadkins wrote:
> Thanks for the reply.  I didn't include notify-host-by-email because it
> didn't seem relevant, but it's in commands.cfg (pasted below).  The host
> isn't failing, just the service.  When you say "service notifications
> are suppressed" what do you mean?  Is there a configuration I can't see
> that's suppressing service notifications?  
>
>   
It's something Nagios does by default.  If a service check fails, it 
will check the host.  If the host check fails, it will send out a host 
notification but suppress the service notification.

From what you've said, I don't think that's your problem.  I've noticed 
that you use a lot of templates (inheritance) in your configs.  
You could try simplifying things by setting up just a contact, a contact 
group, a host, a service and a time period without using templates.  If 
that basic test works, the problem may lie with one of your templates.
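
A minimal sketch of such a template-free test might look like the 
following.  Every object name, the address and the e-mail address are 
placeholders.  It assumes that check-host-alive and check_ping are already 
defined as in the sample commands.cfg, and that the two notification 
commands are the ones discussed in this thread.  The interval directives 
use the 2.x names.

```cfg
# Template-free test objects. All names and the address are placeholders.
define timeperiod{
        timeperiod_name                 test-24x7
        alias                           Always
        sunday                          00:00-24:00
        monday                          00:00-24:00
        tuesday                         00:00-24:00
        wednesday                       00:00-24:00
        thursday                        00:00-24:00
        friday                          00:00-24:00
        saturday                        00:00-24:00
        }

define contact{
        contact_name                    test-contact
        alias                           Test Contact
        host_notification_period        test-24x7
        service_notification_period     test-24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-email
        service_notification_commands   notify-service-by-email
        email                           you@example.com
        }

define contactgroup{
        contactgroup_name               test-group
        alias                           Test Group
        members                         test-contact
        }

define host{
        host_name                       test-host
        alias                           Test Host
        address                         192.0.2.10
        check_command                   check-host-alive
        max_check_attempts              3
        notification_interval           60
        notification_period             test-24x7
        notification_options            d,u,r
        contact_groups                  test-group
        }

define service{
        host_name                       test-host
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        check_period                    test-24x7
        notification_interval           60
        notification_period             test-24x7
        notification_options            w,u,c,r
        contact_groups                  test-group
        }
```

Running the usual pre-flight check (nagios -v against the main config 
file) before restarting should show whether these objects are accepted.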

Aidan


Re: [Nagios-users] Newbie Notifications Problem

2007-09-23 Thread Aidan Anderson

Ray Wadkins wrote:
>
> define contact{
>         contact_name                    rwadkins_e              ; Short name of user
>         use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
>         alias                           Ray Wadkins             ; Full name of user
>         email                           X                       ; <<* CHANGE THIS TO YOUR EMAIL ADDRESS **
>         host_notifications_enabled      1
>         service_notifications_enabled   1
>         host_notification_period        24x7
>         service_notification_period     24x7
>         host_notification_options       d,u,r,f,s
>         service_notification_options    w,u,c,r,f,s
>         host_notification_commands      notify-host-by-email
>         service_notification_commands   notify-service-by-email
>
>   
>
You've specified the command notify-host-by-email in your contact definition
>
> *From commands.cfg*
>
> define command{
>         command_name    notify-service-by-email
>         command_line    /usr/bin/printf "%b" "* Nagios *\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
>         }
>
>  
>
but don't seem to have defined it in the commands.cfg file.

When a host goes down, only host notifications are sent out (service 
notifications are suppressed).  As you don't seem to have defined a host 
notification command, you will never receive any notifications.  Try 
adding the following to commands.cfg:

> # 'notify-host-by-email' command definition
> define command{
>         command_name    notify-host-by-email
>         command_line    /usr/bin/printf "%b" "Notification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nDetails: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $HOSTSTATE$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n\n$HOSTACKAUTHOR$\n$HOSTACKCOMMENT$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$ - $HOSTALIAS$" $CONTACTEMAIL$
>         }
HTH
Aidan




Re: [Nagios-users] Configure smtp in Nagios

2007-09-21 Thread Aidan Anderson

Rodrigo Tavares wrote:
> Hello,
>
> How I do configure smtp in Nagios ?
>
> best regards,
>
> Rodrigo Faria
>   
You don't.  Whatever mail server you are running on the Nagios box will 
take care of SMTP.  Nagios simply pipes the notification through the 
/bin/mail command or whatever command suits the mail server.  Most 
distros come with Sendmail or Postfix by default, just make sure you 
have one running and configured to route mail.

Aidan


Re: [Nagios-users] Nagios Log Management Tips?

2007-09-20 Thread Aidan Anderson

Rogelio Bastardo wrote:
> Anyone have any tips for dealing with Nagios logs?
>
> Things are getting a little crazy, and I haven't even been logging very much!
>
> e.g.
>
> [EMAIL PROTECTED] run]# find / *nagios*  -type f -size +100k
> -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
> /var/log/nagios/archives/nagios-08-14-2007-00.log: 2.3G
> /var/log/nagios/archives/nagios-08-13-2007-00.log: 3.4G
> /var/log/nagios/archives/nagios-08-12-2007-00.log: 2.6G
> /var/log/messages.4: 3.5G
> [EMAIL PROTECTED] run]#
>
> -
>
>   
Good grief, what on earth are you logging?  I'm monitoring over 1000 
hosts and 1600 services and my daily logs range between 600KB and 
1.5MB.  Can you post a snippet of your log (say a 15min span ) so we can 
get an idea of what it is logging?

I'd love to see how your browser copes with viewing the daily log.

Aidan


Re: [Nagios-users] NRPE (No output returned from plugin)

2007-07-20 Thread Aidan Anderson

shacky wrote:
> Hi.
>
> I'm using NRPE to monitor a remote server.
> The most part of the plugins works without problems, but 
> check_backuppc returns the error "(No output returned from plugin)" in 
> the Nagios web interface.
>
> The check_backuppc stanza in the Nagios configuration is the following:
>
> define service{
> use remote-service
> host_name   myremoteserver
> service_description BackupPC
> check_command   check_nrpe!check_backuppc
> }
>
> If I execute from the shell "check_nrpe -H bakserver.blupixel.local -c 
> check_backuppc" I correctly get the plugin's answer ("BACKUPPC WARNING 
> - (5/7) failures").
>
> Where is the problem?
>
Have you set up the command definition correctly in commands.cfg (or 
wherever you store your commands on your Nagios server)?  Also check that 
Nagios has permission to execute the plugin on the remote machine.  
Test by re-running your check_nrpe command while logged in as the nagios user.
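
For reference, a command definition of roughly this shape is what the 
check_nrpe!check_backuppc service above relies on.  This is a sketch that 
assumes $USER1$ points at your plugin directory (normally set in 
resource.cfg); it is not copied from shacky's config.

```cfg
# Assumes $USER1$ is the plugin directory, e.g. as set in resource.cfg.
# $ARG1$ is the remote command name, here check_backuppc from the service.
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```

Note that -H is filled in from the host's address directive, so it is worth 
confirming that the address configured for myremoteserver points at the 
same machine as the name used in the manual shell test.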

Aidan


Re: [Nagios-users] Cancel Downtime?

2007-06-15 Thread Aidan Anderson

Aidan Anderson wrote:
>> On Jun 7, 2007, at 11:18 PM, Anthony Mendoza wrote:
>>
>>   
>> 
>>> Click "Downtime" and then the Trash can icon to the right of the
>>> service/host you want to cancel.
>>>
>>> 
>>>   
>>>> -Original Message-
>>>> From: [EMAIL PROTECTED]
>>>> [mailto:[EMAIL PROTECTED] On Behalf
>>>> Of Wil Schultz
>>>> Sent: Thursday, June 07, 2007 11:11 PM
>>>> To: nagios-users
>>>> Subject: [Nagios-users] Cancel Downtime?
>>>>
>>>> IIRC, there used to be a "Cancel Downtime" link, am I blind or did
>>>> this go away?
>>>>
>>>> How do you cancel scheduled downtime?
>>>>
>>>>   
>>>> 
>
> I need to cancel scheduled downtime on a host and took the advice of 
> Anthony Mendoza in this thread.  Clicking on the "Trash can" icon 
> certainly removes the Nagios-generated comment but the period of 
> scheduled downtime remains.  Any ideas, anyone?
>
> regards,
> Aidan
>
>
>   
Ignore last e-mail, I found it.  You do it from the downtime link on the 
sidebar. :)

cheers,
Aidan


Re: [Nagios-users] Cancel Downtime?

2007-06-15 Thread Aidan Anderson

>
> On Jun 7, 2007, at 11:18 PM, Anthony Mendoza wrote:
>
>   
>> Click "Downtime" and then the Trash can icon to the right of the
>> service/host you want to cancel.
>>
>> 
>>> -Original Message-
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf
>>> Of Wil Schultz
>>> Sent: Thursday, June 07, 2007 11:11 PM
>>> To: nagios-users
>>> Subject: [Nagios-users] Cancel Downtime?
>>>
>>> IIRC, there used to be a "Cancel Downtime" link, am I blind or did
>>> this go away?
>>>
>>> How do you cancel scheduled downtime?
>>>
>>>   

I need to cancel scheduled downtime on a host and took the advice of 
Anthony Mendoza in this thread.  Clicking on the "Trash can" icon 
certainly removes the Nagios-generated comment but the period of 
scheduled downtime remains.  Any ideas, anyone?

regards,
Aidan


Re: [Nagios-users] how to unsubscribe????

2007-06-02 Thread Aidan Anderson

Go to https://lists.sourceforge.net/lists/listinfo/nagios-users

Go to the bottom of the page to the section headed Nagios-users 
Subscribers and follow the instructions for unsubscribing.  You'll need 
your password.

Aidan

Arief Iqbal wrote:
> hi, how can i unsubscribe from this goddamned mailing list??? thx

Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-13 Thread Aidan Anderson

Aidan Anderson wrote:
> Ton Voon wrote:
>   
>> On 11 May 2007, at 20:25, Aidan Anderson wrote:
>>
>>   
>> 
>>> First of all, thank-you for the replies!
>>>
>>> The majority of devices that I monitor are routers/vpn devices and I
>>> have (on the documentation's advice) not set active checks on the  
>>> hosts
>>> and instead I've added check_ping as a service on each of these  
>>> hosts to
>>> do 5 pings as follows:
>>>
>>> check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
>>>
>>> For the host check I already use as you suggested a check_ping that  
>>> only
>>> does one ping as follows:
>>>
>>> check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
>>>
>>> My understanding was that if the service check failed it would then
>>> abandon the service check altogether and move onto the host check  
>>> which
>>> is only 1 ping.  The fact that the service checks are parallelised
>>> should mean that it shouldn't matter that there are 5 pings and the  
>>> host
>>> check is only 1 ping which should resolve the bottleneck of serialised
>>> host checks.  I'm at a loss as to why performance has been impacted so
>>> severely.
>>>
>>> Maybe I need to abandon the service checks altogether and just have a
>>> host check.  I'm reluctant to do this because I get very useful
>>> information from 5 pings, ie packet loss and high rta which is
>>> particularly handy for checking volatile links such as ADSL.  Maybe  
>>> that
>>> is the trade-off, fast host checking with no useful stats or slow host
>>> checking with useful stats.
>>> 
>>>   
>> Just noticed this in your original email:
>>
>> Host Check Execution Time:   0.03   / 10.04   / 0.843 sec
>>
>> This means that some of your host checks are taking 10 seconds, which  
>> is, funnily enough, the timeout period for check_ping. So the -p 1  
>> will still take 10 seconds if the routers are not responding.
>>
>> You can use a timeout flag for check_ping (but is only supported on  
>> some OSes). I guess check_icmp is a better bet here.
>>
>> Ton
>>   
>> 
> Hi Ton,
>
> Well spotted, thank-you.  check_icmp here we come :)
>
> thanks
> Aidan
>   
I've now changed my host and service checks to use check_icmp instead 
of check_ping.  It seems to work far more efficiently and has dropped my 
average service and host check execution times from 11 seconds to 4-5 
seconds.
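
A sketch of what the switched command definitions might look like is 
below.  The thresholds are the ones quoted earlier in this thread; the 
command names and the use of $USER1$ for the plugin directory are 
assumptions.  check_icmp takes its packet count with -n rather than -p, 
and because it opens raw sockets it normally has to be installed setuid 
root.

```cfg
# Host check: one packet, same thresholds as the old check_ping host check.
# Command names and $USER1$ (plugin directory) are assumptions.
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -n 1
        }

# Service check: five packets, thresholds passed in from the service
# definition in the form rta,pl% (for example 100.0,20%).
define command{
        command_name    check_icmp
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -n 5
        }
```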

It didn't, however, make Nagios notice the hosts go down any quicker.  
It still took an hour to notice that 109 hosts had gone down and during 
that hour, latency times shot up above 2000 seconds.  Once it had 
finally noticed that all 109 hosts were down, latency times dropped back 
to normal.  This must be down to the serialisation of host checks so 
I'll wait patiently for the stable release of version 3.

Thanks again for the replies.

regards,
Aidan



Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson

Ton Voon wrote:
> On 11 May 2007, at 20:25, Aidan Anderson wrote:
>
>   
>> First of all, thank-you for the replies!
>>
>> The majority of devices that I monitor are routers/vpn devices and I
>> have (on the documentation's advice) not set active checks on the  
>> hosts
>> and instead I've added check_ping as a service on each of these  
>> hosts to
>> do 5 pings as follows:
>>
>> check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
>>
>> For the host check I already use as you suggested a check_ping that  
>> only
>> does one ping as follows:
>>
>> check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
>>
>> My understanding was that if the service check failed it would then
>> abandon the service check altogether and move onto the host check  
>> which
>> is only 1 ping.  The fact that the service checks are parallelised
>> should mean that it shouldn't matter that there are 5 pings and the  
>> host
>> check is only 1 ping which should resolve the bottleneck of serialised
>> host checks.  I'm at a loss as to why performance has been impacted so
>> severely.
>>
>> Maybe I need to abandon the service checks altogether and just have a
>> host check.  I'm reluctant to do this because I get very useful
>> information from 5 pings, ie packet loss and high rta which is
>> particularly handy for checking volatile links such as ADSL.  Maybe  
>> that
>> is the trade-off, fast host checking with no useful stats or slow host
>> checking with useful stats.
>> 
>
> Just noticed this in your original email:
>
> Host Check Execution Time:   0.03   / 10.04   / 0.843 sec
>
> This means that some of your host checks are taking 10 seconds, which  
> is, funnily enough, the timeout period for check_ping. So the -p 1  
> will still take 10 seconds if the routers are not responding.
>
> You can use a timeout flag for check_ping (but is only supported on  
> some OSes). I guess check_icmp is a better bet here.
>
> Ton
>   
Hi Ton,

Well spotted, thank-you.  check_icmp here we come :)

thanks
Aidan



Re: [Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson

Ton Voon wrote:
> On 11 May 2007, at 19:03, Jim Avery wrote:
>
>   
>> On 11/05/07, Aidan Anderson <[EMAIL PROTECTED]> wrote:
>>
>> 
>>> A lot of people have mentioned using fping to speed things up but  
>>> if my
>>> average service latency is only 0.479 seconds in normal  
>>> circumstances, I
>>> can't see how tweaking this will help in a major outage situation.
>>>   
>> check_ping won't finish until it's done all the pings, and the pings
>> are (if I recall) always at one second intervals.  This means that if
>> you've configured check_ping to do (let's say) 5 pings, the check_ping
>> plugin will always take at least 5 seconds to complete.
>>
>> If the check_ping is being run as a host check rather than a service
>> check, my understanding is that this is the only thing Nagios will be
>> doing; it doesn't do anything else concurrently (correct me if I'm
>> wrong people).
>> 
>
> Correct. We noticed this some time ago too: http://altinity.blogs.com/ 
> dotorg/2006/05/immediate_perfo.html
>
> If you do stick to using check_ping, use -p 1 which is sub second  
> response time.
>
>   
First of all, thank-you for the replies!

The majority of devices that I monitor are routers/vpn devices and I 
have (on the documentation's advice) not set active checks on the hosts 
and instead I've added check_ping as a service on each of these hosts to 
do 5 pings as follows:

check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

For the host check I already use, as you suggested, a check_ping that 
sends only one ping:

check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

My understanding was that if the service check failed it would then 
abandon the service check altogether and move onto the host check which 
is only 1 ping.  The fact that the service checks are parallelised 
should mean that it shouldn't matter that there are 5 pings and the host 
check is only 1 ping which should resolve the bottleneck of serialised 
host checks.  I'm at a loss as to why performance has been impacted so 
severely.

Maybe I need to abandon the service checks altogether and just have a 
host check.  I'm reluctant to do this because I get very useful 
information from 5 pings, i.e. packet loss and high RTA, which is 
particularly handy for checking volatile links such as ADSL.  Maybe that 
is the trade-off: fast host checking with no useful stats, or slow host 
checking with useful stats.

regards,
Aidan


Re: [Nagios-users] nrpe command line test question

2007-05-11 Thread Aidan Anderson

Maxwell,Brady wrote:
>
> My nrpe.cfg on the remote host contains these commands
>
> command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c 
> $ARG2$ -p $ARG3$
>
> command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 
> -p /dev/vga/root
>
> running a check_nrpe from the command line has the following results.
>
> [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c 
> check_disk –a 10 5 /dev/vga/root
>
> check_disk: Warning threshold must be integer or percentage!
>
> [EMAIL PROTECTED] ~]# /usr/local/nagios/libexec/check_nrpe -H hostname -c 
> check_disk1
>
> DISK OK - free space: / 801 MB (12% inode=81%);| 
> /=5625MB;6405;6415;80;6425
>
> I would like to be able to pass arguments to the remote system, 
> allowing me to set threshold values at the service level.
>
> Can anyone tell me why I get the error “Warning threshold must be 
> integer or percentage!” ?
>
> Or suggest another method of passing the args to the remote nrpe process?
>
> Thanks
>
> Brady
>
Make sure that you set dont_blame_nrpe to 1 in nrpe.cfg to allow nrpe to 
accept client arguments.  This is set to 0 by default as it is deemed a 
security risk.
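
On the Nagios server side, the arguments can then be passed through with 
definitions along these lines.  This is only a sketch: the command name 
check_nrpe_args, the host name remotehost and the generic-service template 
are hypothetical, and the thresholds are the ones from Brady's manual test.  
As far as I know the NRPE daemon also has to have been built with 
--enable-command-args, otherwise dont_blame_nrpe has no effect.

```cfg
# 'check_nrpe_args' is a hypothetical name; $USER1$ is assumed to be the
# plugin directory. $ARG1$ is the remote command, $ARG2$..$ARG4$ become
# $ARG1$..$ARG3$ in the check_disk entry of nrpe.cfg on the remote host.
define command{
        command_name    check_nrpe_args
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$
        }

# 'remotehost' and the 'generic-service' template are placeholders.
define service{
        use                     generic-service
        host_name               remotehost
        service_description     Root Disk
        check_command           check_nrpe_args!check_disk!10!5!/dev/vga/root
        }
```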

Aidan




[Nagios-users] Severe peformance issue during major network outage

2007-05-11 Thread Aidan Anderson

Hi,

I have recently set up Nagios 2.8 and am monitoring 1623 hosts and 1946 
services.  Performance under normal circumstances is fine.  Typical 
check and latency times are as follows:

Monitoring Performance
Service Check Execution Time:    0.03   / 11.04   / 3.418 sec
Service Check Latency:           0.00   / 1.87    / 0.479 sec
Host Check Execution Time:       0.03   / 10.04   / 0.843 sec
Host Check Latency:              0.00   / 0.00    / 0.000 sec
# Active Host / Service Checks:  1623   / 1946
# Passive Host / Service Checks: 0      / 0

The vast majority of these hosts are spread over 320 geographic 
locations throughout the UK.  These locations are connected to our data 
centre via a hardware VPN device with the majority (about 270) using a 
private ADSL circuit to facilitate the VPN connection.

Yesterday, we had a major outage caused by the failure of one of the 
ADSL central routers at our ISP.  This took out a third of our ADSL 
sites (roughly 90) for 16 minutes.  Each of these sites has about 4 
devices monitored by Nagios so in effect about 360 devices (hosts) went 
down in an instant.

As you can imagine, we were aware of the problem almost immediately due 
to the barrage of phone calls from our clients, but unfortunately Nagios 
didn't even remotely reflect the current situation.  I have used parent 
child relationships to the full so I was expecting a good portion of the 
VPN devices to show as down with all other devices behind the VPN device 
showing as unreachable.  This was not the case.  It actually took half 
an hour to find only 20 of these VPN devices down and another half an 
hour to notice that they were actually back up again having only noticed 
20 of the 90 in the first place.  During the outage, the service check 
latency was increasing exponentially and the performance stats half an 
hour after the start of the problem were as follows:

Monitoring Performance
Service Check Execution Time:    0.03   / 11.04   / 3.646 sec
Service Check Latency:           947.84 / 2080.05 / 1467.274 sec
Host Check Execution Time:       0.03   / 10.04   / 0.968 sec
Host Check Latency:              0.00   / 0.00    / 0.000 sec
# Active Host / Service Checks:  1623   / 1946
# Passive Host / Service Checks: 0      / 0

As you can see, the average service check latency time has jumped to 
1467 seconds (24 mins).  On all of these hosts there is only one service 
which is a ping (check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5).  
The host check is also a ping (check_ping -H $HOSTADDRESS$ -w 3000.0,80% 
-c 5000.0,100% -p 1) but much faster with only 1 ping being sent out.  
The normal_check_interval on services is 5 mins with 2 
max_check_attempts and a retry_interval of 1.   The host also has a 
max_check_attempts of 2.
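
For reference, the setup described above corresponds roughly to 
definitions like the following.  The two command lines are the ones quoted 
in this message; the host name, address, the thresholds passed as 
arguments, and the generic-host / generic-service templates are 
placeholders, and the interval directives use the Nagios 2.x names.

```cfg
# Service check: 5 pings, thresholds supplied by the service definition.
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }

# Host check: a single ping with fixed thresholds.
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
        }

# 'vpn-site-001', the address and both templates are placeholders.
define host{
        use                     generic-host
        host_name               vpn-site-001
        alias                   VPN device, site 001
        address                 192.0.2.10
        check_command           check-host-alive
        max_check_attempts      2
        }

# The thresholds after check_ping! are example values only.
define service{
        use                     generic-service
        host_name               vpn-site-001
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        normal_check_interval   5
        retry_check_interval    1
        max_check_attempts      2
        }
```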

A lot of people have mentioned using fping to speed things up but if my 
average service latency is only 0.479 seconds in normal circumstances, I 
can't see how tweaking this will help in a major outage situation.

I have also read through the section on tweaking performance which seems 
to be geared toward protecting the machine Nagios is running on.  I want 
to do the opposite and give Nagios a lot more work to do.  The machine 
is dedicated to Nagios and is quite high spec.  It's an IBM xSeries 336 
with 2 dual-core processors and 4GB of RAM, so it should be able to take 
a much bigger hit.  I have been monitoring CPU performance with MRTG and 
the CPU never goes lower than 90% idle.  Ironically, during 
the problem, the machine's idle time jumped to 95% when I would have 
expected it to drop rather than increase.

The only performance tweak I could see that would affect the performance 
in this situation is max_concurrent_checks but this is already set to 0.

I am fairly new to Nagios (2 months), so I apologise if I have missed 
something obvious, but any pointers to a solution to this problem would 
be greatly appreciated.  I have run a nagios -s (attached below), which 
seems to indicate that everything is set up OK.  Let me know if you 
require any more information from my config that would help diagnose the 
problem.

regards,
Aidan




Nagios 2.8
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 03-08-2007
License: GPL

Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---
Total hosts:                     1624
Total scheduled hosts:           0
Host inter-check delay method:   SMART
Average host check interval:     0.00 sec
Host inter-check delay:          0.00 sec
Max host check spread:           30 min
First scheduled check:           N/A
Last scheduled check:            N/A


SERVICE SCHEDULING INFORMATION
---
Total services:                     1947
Total scheduled services:           1947
Service inter-check delay method:   SMART
Average

Re: [Nagios-users] Disable service_notification_commands

2007-05-02 Thread Aidan Anderson

Hi Kareem,

I am using 2.8 and the docs have 'service_notification_commands' in red 
(required), so I don't know whether that is an error in the docs or not.

If you want to disable service notifications, put the directive back and 
simply set service_notification_options to 'n' (none).
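
As a sketch, the contact definition could then look something like this.  
The contact name comes from the error message quoted below; the time 
period and the two notification command names are assumptions based on the 
2.x sample configuration, so substitute whatever your own config defines.

```cfg
# '24x7', 'host-notify-by-email' and 'notify-by-email' are assumed to exist
# already (they are the names used in the 2.x sample config).
define contact{
        contact_name                    kareem
        alias                           Kareem Mahgoub
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    n               ; n = none, no service notifications
        host_notification_commands      host-notify-by-email
        service_notification_commands   notify-by-email ; still has to be present
        email                           kareem@example.com
        }
```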

regards,
Aidan

Kareem Mahgoub wrote:
> Dear All
> I am using Nagios 2.5 and I want to disable the service notification
> command.
> On the documentation under the  section of "Contact Definition", I can see
> that the directive "service_notification_commands" is in black which means
> it is optional. When I commented it and made a conf check it gave Error:
> Contact 'kareem' has no service notification commands defined!
> Am I missing something here?
> Best Regards,
> Kareem Mahgoub
>
>


Re: [Nagios-users] Service Group Summary Changing Numbers

2007-04-10 Thread Aidan Anderson





Hi Elijah,

Fantastic, glad I could help.

The situation you mentioned where not all processes stop after a
restart is quite common and has been mentioned a few times on the
list.  I had similar problems and one post suggested doing a "reload"
rather than a "restart".  I now religiously use "reload" and have not
had a problem since.

regards,
Aidan


Elijah Savage wrote:

  Aidan,

Not sure how I missed that, but you are right: there were multiple processes
running. I think my situation came from doing a restart on the services with
the init script, and for some reason they did not all stop. I have since
stopped all services and killed off any additional processes, and now things
seem to be back to exactly what I have grown to expect: a nice stable platform
in nagios.

Thank you


- Original Message -
From: Aidan Anderson <[EMAIL PROTECTED]>
To: Nagios Users Mailinglist 
Sent: Tuesday, April 10, 2007 6:27:21 AM GMT-0500 Auto-Detected
Subject: Re: [Nagios-users] Service Group Summary Changing Numbers

Hi Elijah,

This sounds similar to a problem that I had, where refreshing the browser 
gave me different results.  It turned out that the problem was caused by 
2 Nagios processes running.  Each time I refreshed the browser, it 
randomly picked one of the processes and reported back the state 
of that particular instance, hence the different results on each 
refresh.  To rectify it, I stopped Nagios, manually removed the 
remaining process and then started Nagios again.  I caused the problem 
during a Nagios upgrade: I didn't stop Nagios before starting the 
upgrade, so it ended up being started twice.

Regards,
Aidan



Elijah Savage wrote:
  
  
All,

I have something very weird going on. Under the service group summary, 
my numbers change when I refresh the browser, even though no devices 
are down. I have 4 different host groups on that page, and one group 
has 70 devices. When you log in it shows 70 devices up; then you 
refresh and it shows 60 devices up and none down, when you know you 
have 70; on the next refresh it may show 68 devices up and none down. 
I know it all sounds like baby talk, but it is somewhat difficult for 
me to explain. It does this under the hostgroup summary as well.

I have been on this list for a long time and have never had to post, 
because through reading the emails and searching the archives I have 
been able to achieve what I needed for my environment, but I could 
not find anything close to what I am seeing now.

Nagios is version 2.7, updated this past weekend; had I known and been 
paying attention, I would have waited for the 2.8 release from this 
weekend :)
Running on Solaris on a Sun V880 platform with 4 CPUs and 8 GB of memory.

The server is nowhere close to being overloaded. The thing is, I do not 
know whether this was happening on the previous version. Of course, when 
you announce a major change or upgrade, people really start to pay close 
attention to the tools they use.

Oh yeah, one last thing: the devices being monitored are Cisco devices 
with the check_command check-router-alive.

Any help would be greatly appreciated.




  
  
  





Re: [Nagios-users] Service Group Summary Changing Numbers

2007-04-10 Thread Aidan Anderson

Hi Elijah,

This sounds similar to a problem that I had, where refreshing the browser 
gave me different results.  It turned out that the problem was caused by 
2 Nagios processes running.  Each time I refreshed the browser, it 
randomly picked one of the processes and reported back the state 
of that particular instance, hence the different results on each 
refresh.  To rectify it, I stopped Nagios, manually removed the 
remaining process and then started Nagios again.  I caused the problem 
during a Nagios upgrade: I didn't stop Nagios before starting the 
upgrade, so it ended up being started twice.
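
A quick way to check for this situation is to look at the process table for more than one top-level Nagios process. The sketch below is only an illustration, not anything shipped with Nagios: the function name and the sample paths are made up, and it assumes input in the three-column form produced by something like `ps -eo pid,ppid,args`. A running daemon can legitimately have child processes with the same name, so it only counts nagios processes whose parent is not itself a nagios process.

```python
def top_level_nagios_pids(ps_lines):
    """Return PIDs of nagios processes whose parent is not another nagios process.

    ps_lines: iterable of "PID PPID ARGS" lines (assumed format, see lead-in).
    """
    procs = []
    for line in ps_lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        pid, ppid, args = parts
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # header line such as "PID PPID COMMAND"
        # Compare only the executable's base name, ignoring its arguments.
        exe = args.split()[0].rsplit("/", 1)[-1]
        if exe == "nagios":
            procs.append((int(pid), int(ppid)))

    nagios_pids = {pid for pid, _ in procs}
    # A child forked by a running daemon has a nagios parent; skip those.
    return sorted(pid for pid, ppid in procs if ppid not in nagios_pids)


# Hypothetical process listing: two independent daemons (1201 and 2977),
# one child of the first daemon, and an unrelated process.
sample = [
    "  PID  PPID COMMAND",
    " 1201     1 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg",
    " 1350  1201 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg",
    " 2977     1 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg",
    " 3100     1 /usr/sbin/sshd",
]

duplicates = top_level_nagios_pids(sample)
if len(duplicates) > 1:
    print("More than one Nagios instance appears to be running:", duplicates)
```

If more than one PID comes back, stopping Nagios and removing the leftover process before starting it again, as described above, should bring the numbers back in line.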

Regards,
Aidan

Elijah Savage wrote:
> All,
>
> I have something very weird going on. Under the service group summary, 
> my numbers change when I refresh the browser, even though no devices 
> are down. I have 4 different host groups on that page, and one group 
> has 70 devices. When you log in it shows 70 devices up; then you 
> refresh and it shows 60 devices up and none down, when you know you 
> have 70; on the next refresh it may show 68 devices up and none down. 
> I know it all sounds like baby talk, but it is somewhat difficult for 
> me to explain. It does this under the hostgroup summary as well.
>
> I have been on this list for a long time and have never had to post, 
> because through reading the emails and searching the archives I have 
> been able to achieve what I needed for my environment, but I could 
> not find anything close to what I am seeing now.
>
> Nagios is version 2.7, updated this past weekend; had I known and been 
> paying attention, I would have waited for the 2.8 release from this 
> weekend :)
> Running on Solaris on a Sun V880 platform with 4 CPUs and 8 GB of memory.
>
> The server is nowhere close to being overloaded. The thing is, I do not 
> know whether this was happening on the previous version. Of course, when 
> you announce a major change or upgrade, people really start to pay close 
> attention to the tools they use.
>
> Oh yeah, one last thing: the devices being monitored are Cisco devices 
> with the check_command check-router-alive.
>
> Any help would be greatly appreciated.


[Nagios-users] Up Duration not being maintained over restarts

2007-03-22 Thread Aidan Anderson

Hi,

I'm a new user to Nagios and have recently installed the new 3.0a1 
version on Fedora Core 5 from source.

I have been setting up various hosts and have noticed that when you 
restart Nagios, some of the hosts being monitored do not retain their UP 
duration (if they were in an UP state at the point of restart).  The UP 
duration is reset to 0d 0h 0m 0s and starts counting up from there.

I investigated further and noticed that all affected hosts have a + 
symbol appended to the end of the duration, e.g. 0d 1h 8m 41s+.  I've 
also noticed that if any one of these hosts enters a down state and/or 
recovers while Nagios is running, the + disappears from the end of the 
duration and the correct state duration is maintained over restarts from 
that point on.
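
The behaviour described above would be consistent with a display rule along the following lines. This is a sketch of a guess, modelled only on the observations in this message and not quoted from the Nagios source: when no state change has ever been recorded for a host, the duration is counted from program start and marked with a trailing +, and once a state change has been recorded, the duration is counted from that timestamp. The function and parameter names are invented for the illustration.

```python
def format_state_duration(now, last_state_change, program_start):
    """Format a state duration as 'Nd Nh Nm Ns', adding '+' when it is a lower bound.

    Assumption: last_state_change == 0 means no state change has been recorded,
    so the real duration is at least the time since program_start.
    All arguments are Unix timestamps in seconds.
    """
    unknown = last_state_change == 0
    start = program_start if unknown else last_state_change
    elapsed = max(0, int(now - start))

    days, rem = divmod(elapsed, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)

    return f"{days}d {hours}h {minutes}m {seconds}s" + ("+" if unknown else "")


# No recorded state change: counted from program start, shown with '+'.
print(format_state_duration(now=14121, last_state_change=0, program_start=10000))

# Recorded state change: counted from that timestamp, independent of restarts.
print(format_state_duration(now=200000, last_state_change=100000, program_start=190000))
```

Under this reading, a restart moves the program start time, so any host without a recorded state change starts again from 0d 0h 0m 0s, which matches what is seen here.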

I have read through this list and the documentation and can't find a 
mention of what a + means and why the duration is being reset over restarts.

Any information to explain/correct this behaviour would be appreciated.

regards,
Aidan


