Re: [Nagios-users] Alerting based on past-to-current trends?

2010-12-10 Thread Jim Avery
On 6 December 2010 19:02, Ian Ehrenwald iehrenw...@tripadvisor.com wrote:
> Hello
> I was wondering if there was a straight-forward way to alert based on an
> average of past data plus a current perfdata entry.  I understand I'm not
> explaining it very well that way, so here is the real-world example I am
> working with -
>
> I am polling a set of machines via SNMP for CPU load every 1 minute (looking
> at hrProcessorLoad).  If the return value is at or above 95%, send out a
> WARNING.  If the return value is 98% or above, send out a CRITICAL.  The
> problem here is that it's OK for a process to take up 100% CPU for multiple
> seconds, and sometimes that high CPU usage coincides with the SNMP %CPU
> query, so I get a lot of false alerts.
>
> Is there a way to use past perfdata in conjunction with the current returned
> data to generate an average and send a WARNING or CRITICAL based on that new
> number?  I only care to get alerted from Nagios if, for example, the %CPU has
> been at 100% for 5 minutes.  Or am I just way over-thinking this and should
> be monitoring 1m, 5m, 15m UNIX load averages (which doesn't seem that
> accurate anyway)?  What are other people doing to monitor CPU usage and alert
> on abnormal long periods of utilization?
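Nagios does not feed earlier perfdata back into a plugin, so one workaround for Ian's question is a wrapper plugin that keeps its own short history in a state file and alerts on the average. A minimal sketch, assuming the current %CPU reading is passed in as an argument (the SNMP query itself is not shown); the script name check_cpu_avg.py, the JSON state file, the 300-second window and the function names are all hypothetical. The 95/98 thresholds are the ones from the question.

```python
import json
import os
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(samples, value, now, warn=95.0, crit=98.0, window=300):
    """Add the current reading, drop readings older than `window` seconds,
    and classify the average of what is left.

    `samples` is a list of (timestamp, percent) pairs from earlier runs.
    Returns (status, average, updated_samples).
    """
    kept = [(t, v) for t, v in samples if now - t < window]
    kept.append((now, value))
    average = sum(v for _, v in kept) / len(kept)
    if average >= crit:
        status = CRITICAL
    elif average >= warn:
        status = WARNING
    else:
        status = OK
    return status, average, kept


def main(argv):
    # Hypothetical invocation: check_cpu_avg.py <current %CPU> <state file>
    if len(argv) != 3:
        print("UNKNOWN - usage: check_cpu_avg.py <percent> <state-file>")
        return UNKNOWN
    value, state_file = float(argv[1]), argv[2]
    samples = []
    if os.path.exists(state_file):
        with open(state_file) as fh:
            samples = [tuple(pair) for pair in json.load(fh)]
    status, average, samples = evaluate(samples, value, time.time())
    with open(state_file, "w") as fh:
        json.dump(samples, fh)
    # Text before '|' is the status line; text after it is perfdata.
    print(f"CPU {NAMES[status]} - {average:.1f}% average over "
          f"{len(samples)} samples | cpu_avg={average:.1f}%")
    return status
```

A script would call `sys.exit(main(sys.argv))` at its entry point so Nagios sees the return code. With one-minute checks, a single 100% reading among four 20% readings averages to 36% and stays OK, while five consecutive 100% readings go CRITICAL. Until the window has filled (the very first run, or after a long gap) the average covers fewer samples, so a lone high reading can still alert; each monitored host needs its own state file.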


Nagios will alert as soon as the plugin returns a non-OK status.  You
can of course configure max_check_attempts and/or
first_notification_delay so that Nagios won't send a notification
until after a given time, but this won't stop it from appearing on
the web page for problem services straight away.
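To see why max_check_attempts filters out short spikes, here is a simplified Python model of the SOFT/HARD state logic. It is a sketch, not Nagios's own code: it ignores retry_interval timing, flap detection and the distinction between WARNING and CRITICAL, and first_hard_problem is a hypothetical helper name.

```python
def first_hard_problem(results, max_check_attempts):
    """Return the index of the check at which a non-OK state becomes HARD,
    or None if it never does.

    `results` is a list of plugin return codes (0 = OK, non-zero = problem).
    Consecutive non-OK results count up as SOFT attempts; an OK result
    resets the count (a soft recovery).
    """
    attempts = 0
    for i, code in enumerate(results):
        if code == 0:
            attempts = 0
        else:
            attempts += 1
            if attempts >= max_check_attempts:
                return i
    return None
```

Two isolated CRITICAL results never reach HARD with max_check_attempts set to 5, but would with 1. With check_interval and retry_interval both at 1 minute, max_check_attempts 5 means a problem has to persist over five consecutive checks, roughly four minutes after the first failure, before the state goes HARD.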

It would be great if you could get Nagios to display only hard status
alerts - I don't think you can though, not with ordinary Nagios Core
anyway.  Some of the third-party Nagios front ends will do it, for
example you can configure the icons in NagVis only to display hard
alerts.

Cheers,

Jim

--
Oracle to DB2 Conversion Guide: Learn learn about native support for PL/SQL,
new data types, scalar functions, improved concurrency, built-in packages, 
OCI, SQL*Plus, data movement tools, best practices and more.
http://p.sf.net/sfu/oracle-sfdev2dev 
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Alerting based on past-to-current trends?

2010-12-10 Thread Jim Avery
On 10 December 2010 18:43, Rick Carter rick.car...@umich.edu wrote:
> Hi Jim,
>
> I'm wondering if load average would get you where you want to be, as in a lot
> of cases, a CPU busy might not be a big deal unless the run queue is growing.
>
> My nagios-fu isn't good enough to tell you how to get that, but when I saw
> your message, I thought right away of the linux/unix:
>
> $ uptime
> 13:41  up 2 days, 18:11, 2 users, load averages: 0.31 0.25 0.24
>
> Where the 2nd load average is the 5-minute one.
>
> - Rick
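Rick's sample output looks like the macOS/BSD form ("load averages:" separated by spaces); Linux prints "load average:" separated by commas. A small Python sketch that pulls the three values out of either form; parse_uptime is a hypothetical helper, and on a Unix host Python's os.getloadavg() returns the same (1, 5, 15)-minute tuple without any parsing.

```python
import re

# Linux:     "load average: 0.31, 0.25, 0.24"
# macOS/BSD: "load averages: 0.31 0.25 0.24"
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line):
    """Return the (1-min, 5-min, 15-min) load averages from `uptime` output."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load averages in: {line!r}")
    return tuple(float(x) for x in match.groups())
```

For Rick's line this gives (0.31, 0.25, 0.24), so the 5-minute value is index 1 and the 15-minute value is index 2.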

Good point Rick,

there is a check_load plugin, and you could indeed set appropriate
thresholds to make it concentrate on the 15-minute value rather than
the 5-minute or 1-minute values.
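check_load takes comma-separated warning and critical thresholds for the 1-, 5- and 15-minute values (-w w1,w5,w15 -c c1,c5,c15), so setting the first two very high is one way to let the 15-minute value decide. The Python sketch below models that comparison. It is not check_load's source: the "worst interval wins" rule and the strict greater-than comparison are assumptions, and the threshold numbers are made up for illustration.

```python
import os

OK, WARNING, CRITICAL = 0, 1, 2

# Illustrative thresholds only: the 1- and 5-minute limits are set so high
# that in practice only the 15-minute average can raise an alert.
WARN_THRESHOLDS = (100.0, 100.0, 8.0)
CRIT_THRESHOLDS = (100.0, 100.0, 10.0)


def load_status(loads, warn, crit):
    """Compare (1, 5, 15)-minute load averages against per-interval
    thresholds; the worst result across the three intervals wins."""
    status = OK
    for value, w, c in zip(loads, warn, crit):
        if value > c:
            return CRITICAL
        if value > w:
            status = WARNING
    return status


def current_status():
    # os.getloadavg() is available on Unix-like systems only.
    return load_status(os.getloadavg(), WARN_THRESHOLDS, CRIT_THRESHOLDS)
```

A burst that pushes the 1-minute load to 50 while the 15-minute value is still 3 returns OK; only a sustained load above 8 or 10 over the longer window changes the status.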

As to what 'load' actually means I'm not 100% sure.  I've read
http://www.teamquest.com/resources/gunther/display/5/index.htm a few
times, and think it helps a bit!  I even bought Gunther's book
Guerrilla Capacity Planning but confess I haven't read anywhere near
all of it.
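Gunther's article describes the load average as an exponentially damped moving average of the run-queue length, which also bears on Ian's original question: the 15-minute figure is already a long-window average. A floating-point sketch of that calculation, assuming the Linux behaviour of resampling roughly every 5 seconds with a decay factor of exp(-5 / (60 * m)) for an m-minute average. The kernel itself uses fixed-point arithmetic (and on Linux the count also includes tasks in uninterruptible sleep); damped_load is a hypothetical name.

```python
import math

SAMPLE_SECONDS = 5  # Linux recomputes the load averages roughly every 5 s


def damped_load(run_queue, minutes, start=0.0):
    """Apply the exponentially damped moving average used for the
    1/5/15-minute load averages to a sequence of run-queue samples."""
    decay = math.exp(-SAMPLE_SECONDS / (60.0 * minutes))
    load = start
    for n in run_queue:
        load = load * decay + n * (1.0 - decay)
    return load
```

Feeding it twelve samples (one minute) of a single busy process takes the 1-minute value to 1 - e^-1, about 0.63, while the 15-minute value only reaches about 0.06, which is why thresholds on the 15-minute figure ignore short bursts.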

I seem to recall reading somewhere that as a general rule of thumb if
load is > 2 * the number of CPUs, it's probably affecting performance.
Certainly on my own Nagios server with 4 CPUs I find it's struggling
whenever load is consistently > 10.
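As arithmetic, the rule of thumb divides the load by the CPU count: a 4-CPU server at a load of 10 is at 2.5 per CPU, above the factor of 2. A sketch, where the factor of 2 is only the rule of thumb quoted above rather than any standard, and the function names are hypothetical:

```python
import os


def load_per_cpu(load, cpus=None):
    """Divide a load average by the number of CPUs (detected if not given)."""
    cpus = cpus or os.cpu_count() or 1
    return load / cpus


def over_rule_of_thumb(load, cpus=None, factor=2.0):
    """True when the load exceeds `factor` times the CPU count."""
    return load_per_cpu(load, cpus) > factor
```

A load of 8 on 4 CPUs sits exactly at the limit and is not flagged; a load of 10 is.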

Cheers,

Jim
