Re: [Nagios-users] Service check goes HARD too quick if multiple service are in problem state

2013-01-16 Thread FTL Nagios
/pulseaudio/pulse/gconf-helper
nagios   18659 14027  0 09:26 pts/3    00:00:00 ps -ef
nagios   18660 14027  0 09:26 pts/3    00:00:00 grep --color=auto nagios
nagios   24025     1  0 07:57 ?        00:00:04 /usr/bin/python /usr/bin/update-manager --no-focus-on-map
nagios   29231     1  0 Jan15 ?        00:00:00 /usr/lib/notify-osd/notify-osd


What will deleting those files you mentioned do?

Thanks in advance



-Original Message-
From: Justin T Pryzby [mailto:just...@norchemlab.com] 
Sent: 15 January 2013 18:02
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] Service check goes HARD too quick if multiple
service are in problem state

You could check that the check intervals show up right in objects.cache.

You could also try stopping nagios (check with ps that you don't have
multiple daemons running), removing the generated files and restarting (note
that this will cause notifications to be sent from scratch; you may want to
disable them first).

/var/cache/nagios3/
objects.cache  status.dat

/var/lib/nagios3/
retention.dat



--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS and
more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

[Nagios-users] Service check goes HARD too quick if multiple service are in problem state

2013-01-15 Thread Andrew Thompson
Hi,

I have had this problem previously and posted here but got nowhere with it.

I'll have another bash.

Basically my Nagios machine is checking too frequently and firing out alerts 
too quickly.

It's ignoring the retry_interval value, the max_check_attempts value and 
the notification_interval value in the escalations.

I have a check interval of 5 minutes in OK state
Retry interval of 3 minutes when in problem state
Notification interval of 3 minutes

I believe the problem is shown below, and that multiple service checks being 
in a problem state at the same time is causing it.


I've just seen this on 1 of my hosts:

It appears it's accumulating the service checks (even though they are different 
checks) into a final HARD state.

Prior to 17:18 all was fine on this host!!!


Then at 17:18 a SQL check went to warning state and to SOFT 1.

It was checked again at 17:21, which is the 3 minute interval I have told it 
to use when in a problem state, and it was still warning, so on to SOFT 2.

Then a different service check on that host went critical, but for the first 
time:

17:22 memory usage, and it put this to HARD 3, even though this actual check 
for memory should be SOFT 1.

An alert then got sent straight out for the memory check even though it was 
actually only check 1/3 on that particular service.
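
For reference, the sequence expected here can be written down as a small Python sketch. It assumes max_check_attempts is 3 (taken from the "check 1/3" above), a retry interval of 3 minutes, a service that keeps failing, and rechecks that run exactly on schedule. The name expected_states is made up for illustration and is not part of Nagios.

```python
from datetime import datetime, timedelta


def expected_states(first_failure, retry_interval_minutes, max_check_attempts):
    """List (time, state type, attempt) for a service that keeps failing.

    Assumption: every recheck runs exactly retry_interval_minutes after
    the previous one, and the state turns HARD on the last attempt.
    """
    steps = []
    for attempt in range(1, max_check_attempts + 1):
        offset = timedelta(minutes=retry_interval_minutes * (attempt - 1))
        when = first_failure + offset
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        steps.append((when.strftime("%H:%M"), state_type, attempt))
    return steps


# Memory check first seen failing at 17:22 on 15 January 2013.
MEMORY_EXPECTED = expected_states(datetime(2013, 1, 15, 17, 22), 3, 3)
for step in MEMORY_EXPECTED:
    print(step)
```

Under those assumptions the memory check would reach HARD 3 at 17:28 at the earliest, not at 17:22.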

Here is the copy and paste from the history of the host:

[01-15-2013 17:18:24]
SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;1;WARNING - 2.3067 lock 
timeouts / sec for _Total, 2.0667 lock timeouts / sec for Key, 0. lock 
timeouts / sec for RID, 0.2400 lock timeouts / sec for Page, 0. lock 
timeouts / sec for Object, 0. lock timeouts / sec for Metadata, 0. lock 
timeouts / sec for HoBT, 0. lock timeouts / sec for File, 0. lock 
timeouts / sec for Extent, 0. lock timeouts / sec for Database, 0. lock 
timeouts / sec for Application, 0. lock timeouts / sec for AllocUnit
[01-15-2013 17:21:24]
SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;SOFT;2;WARNING - 1.3056 lock 
timeouts / sec for _Total, 1.1833 lock timeouts / sec for Key, 0. lock 
timeouts / sec for RID, 0.1222 lock timeouts / sec for Page, 0. lock 
timeouts / sec for Object, 0. lock timeouts / sec for Metadata, 0. lock 
timeouts / sec for HoBT, 0. lock timeouts / sec for File, 0. lock 
timeouts / sec for Extent, 0. lock timeouts / sec for Database, 0. lock 
timeouts / sec for Application, 0. lock timeouts / sec for AllocUnit

[01-15-2013 17:22:04]
SERVICE ALERT: SERVER;MEMORY USAGE;CRITICAL;HARD;3;CRITICAL: physical memory: 
Total: 10G - Used: 9.81G (98%) - Free: 192M (2%)  critical
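
A rough way to spot this pattern in a longer history is sketched below in Python. It assumes each entry has the form "[timestamp] SERVICE ALERT: host;service;state;state type;attempt;output", as in the excerpt above. The names parse_alert and jumps_to_hard are hypothetical, and the plugin output of the two SQL entries is shortened here. The check only sees what is in the excerpt, so a service that was already in a problem state before the first entry would also be flagged.

```python
import re
from collections import namedtuple

Alert = namedtuple(
    "Alert", "timestamp host service state state_type attempt output"
)

ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+SERVICE ALERT:\s*"
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(entry):
    """Parse one history entry; wrapped lines are joined with single spaces."""
    match = ALERT_RE.match(" ".join(entry.split()))
    if match is None:
        return None
    return Alert(match["ts"], match["host"], match["service"], match["state"],
                 match["state_type"], int(match["attempt"]), match["output"])


def jumps_to_hard(alerts):
    """Return HARD problem alerts whose attempt number skips ahead of the
    last attempt logged for the same host and service."""
    last_attempt = {}
    flagged = []
    for alert in alerts:
        key = (alert.host, alert.service)
        previous = last_attempt.get(key, 0)
        if (alert.state != "OK" and alert.state_type == "HARD"
                and alert.attempt > previous + 1):
            flagged.append(alert)
        last_attempt[key] = 0 if alert.state == "OK" else alert.attempt
    return flagged


# Entries from the excerpt above (SQL plugin output shortened).
HISTORY = [
    "[01-15-2013 17:18:24] SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;"
    "SOFT;1;WARNING - 2.3067 lock timeouts / sec for _Total",
    "[01-15-2013 17:21:24] SERVICE ALERT: SERVER;SQL LOCK TIMEOUTS;WARNING;"
    "SOFT;2;WARNING - 1.3056 lock timeouts / sec for _Total",
    "[01-15-2013 17:22:04] SERVICE ALERT: SERVER;MEMORY USAGE;CRITICAL;HARD;3;"
    "CRITICAL: physical memory: Total: 10G - Used: 9.81G (98%) - Free: 192M (2%)",
]

ALERTS = [a for a in map(parse_alert, HISTORY) if a is not None]
FLAGGED = jumps_to_hard(ALERTS)
for alert in FLAGGED:
    print(alert.timestamp, alert.service, alert.state_type, alert.attempt)
```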



Does anybody please have any idea why my server is checking too frequently and 
alerting too frequently, and why it's totting up different service checks?

This machine has not worked right since it was loaded a couple of months ago.
I'm using the same config files on it as I did on the previous box I had. The 
only difference is that the previous box was running 3.3.1, and I had none of 
these problems on that install.


This is a Nagios 3.4.1 install on an Ubuntu 12.04 desktop 32-bit OS.



Thanks in advance


Re: [Nagios-users] Service check goes HARD too quick if multiple service are in problem state

2013-01-15 Thread Justin T Pryzby
You could check that the check intervals show up right in
objects.cache.
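
One way to do that check without reading the file by eye is a short Python sketch like the one below. It assumes objects.cache consists of "define service { ... }" blocks with one "directive value" pair per line, and that the timing directives are named check_interval, retry_interval and max_check_attempts. The function names and the sample fragment are hypothetical; the sample only reuses the values mentioned in the original post, and the host block is invented to show that other object types are skipped.

```python
def parse_objects_cache(text, object_type="service"):
    """Return one dict per 'define <object_type> { ... }' block in text."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            words = line.rstrip("{").split()
            if line.endswith("{") and len(words) == 2 and words[0] == "define":
                current = {"_type": words[1]}
        elif line == "}":
            if current["_type"] == object_type:
                objects.append(current)
            current = None
        else:
            # First word is the directive, the rest of the line is its value.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def timing_rows(services):
    """Reduce each service to the fields that control rechecks."""
    return [
        (
            svc.get("host_name", ""),
            svc.get("service_description", ""),
            float(svc.get("check_interval", "nan")),
            float(svc.get("retry_interval", "nan")),
            int(svc.get("max_check_attempts", "0")),
        )
        for svc in services
    ]


# Hypothetical fragment, not copied from a real objects.cache.
SAMPLE = """
define host {
    host_name   SERVER
    max_check_attempts  10
    }
define service {
    host_name   SERVER
    service_description SQL LOCK TIMEOUTS
    check_interval  5.000000
    retry_interval  3.000000
    max_check_attempts  3
    }
"""

ROWS = timing_rows(parse_objects_cache(SAMPLE))
for row in ROWS:
    print(row)
```

To run it against the real file, read /var/cache/nagios3/objects.cache and pass its contents in place of SAMPLE.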

You could also try stopping nagios (check with ps that you don't have
multiple daemons running), removing the generated files and restarting
(note that this will cause notifications to be sent from scratch; you
may want to disable them first).

/var/cache/nagios3/
objects.cache  status.dat

/var/lib/nagios3/
retention.dat
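
The "check with ps" step can also be sketched in Python. The function below takes the text of "ps -ef" output and lists processes whose executable is the Nagios daemon, then drops those whose parent is itself a Nagios process, since the daemon forks children to run checks. The daemon name nagios3, the function names and the sample output are assumptions for illustration; the sample is not taken from the machine in this thread.

```python
import os


def daemon_processes(ps_output, daemon_name="nagios3"):
    """Return (pid, ppid, command) for each process running daemon_name.

    Expects 'ps -ef' columns: UID PID PPID C STIME TTY TIME CMD.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        executable = fields[7].split()[0]
        if os.path.basename(executable) == daemon_name:
            matches.append((int(fields[1]), int(fields[2]), fields[7]))
    return matches


def independent_daemons(ps_output, daemon_name="nagios3"):
    """Keep only matches whose parent is not another matched process."""
    matches = daemon_processes(ps_output, daemon_name)
    pids = {pid for pid, _, _ in matches}
    return [m for m in matches if m[1] not in pids]


# Hypothetical output showing two separate daemons and one forked child.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
nagios    1201     1  0 07:57 ?        00:00:04 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    1450  1201  0 09:26 ?        00:00:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios    2077     1  0 Jan15 ?        00:00:12 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg
nagios   18660 14027  0 09:26 pts/3    00:00:00 grep --color=auto nagios
"""

DAEMONS = independent_daemons(SAMPLE_PS)
print(len(DAEMONS), "independent daemon(s):", [pid for pid, _, _ in DAEMONS])
```

More than one independent daemon in the result would suggest two copies of Nagios scheduling checks at the same time.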

