RE: [Nagios-users] retention issue

2006-01-19 Thread Lori Adams

I haven't figured out what broke, but I have come up with a fix for retaining state during restarts.



I edited the init script and added the "+" line below to the stop) case:

stop)
+cp $NagiosStatusFile $NagiosRetentionFile



Now status.dat is copied to the retention file whenever the init script stops Nagios.
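A minimal sketch of the same workaround for anyone wanting to apply it. It wraps the copy in a guard so that a missing or empty status file cannot overwrite an older retention file. The default paths and the helper name save_status_as_retention are hypothetical placeholders; check them against your own init script. This is still a workaround, not a fix for the underlying retention problem.

```shell
#!/bin/sh
# Hypothetical sketch of an init script's stop) branch with the workaround.
# NagiosStatusFile and NagiosRetentionFile are the variable names from the
# post; the default paths below are placeholders, not the real ones.
NagiosStatusFile=${NagiosStatusFile:-/usr/local/nagios/var/status.dat}
NagiosRetentionFile=${NagiosRetentionFile:-/usr/local/nagios/var/retention.dat}

save_status_as_retention() {
    # Copy only a non-empty status file, so a missing status.dat
    # cannot clobber an existing retention.dat.
    if [ -s "$NagiosStatusFile" ]; then
        cp "$NagiosStatusFile" "$NagiosRetentionFile"
    else
        echo "warning: $NagiosStatusFile missing or empty; retention file left as-is" >&2
        return 1
    fi
}

case "$1" in
    stop)
        # Copy first, while the daemon is still running (assumption: Nagios
        # may remove status.dat when it shuts down, so copying afterwards
        # could find nothing to copy).
        save_status_as_retention
        # ... the script's existing commands that stop Nagios go here ...
        ;;
esac
```

Placing the copy before the existing stop commands matches where the "+" line sits in the fragment above.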



-Lori

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lori Adams
Sent: Wednesday, January 18, 2006 11:57 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] retention issue

I'm using nagios-2.04b.



Here's what happened. We had Nagios 1.2 running and wanted to try out nagios-2.04b, so I set up an entirely new Nagios instance on the same machine and called it nagios-2. I've since moved nagios-2 into the same location as nagios(-1) (the web server configuration, etc.). Everything is running, I can access the CGIs, and all values have been updated to reflect the move.



But somewhere along the line, state retention got extremely screwy. These were my settings:

retain_state_information=1
state_retention_file=[path kept private]/nagios-2.04/var/retention.dat
retention_update_interval=60
use_retained_program_state=1



I moved nagios-2 into place as nagios on 1/11. Any time I restart (/etc/init.d/nagios restart), the state values revert to those from the 11th, with last check times set to 1/11.



I have lowered retention_update_interval to 2. The retention.dat file is not updated while Nagios is running. Nor is it updated when Nagios shuts down, as shown both by the code in the init script and by the retention file's modification time, even though the comments say it will be.
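One way to confirm what the modification time shows is a small check run by hand or from cron. This is a sketch: the helper name retention_is_fresh is hypothetical, and it assumes a find(1) that supports -mmin (GNU findutils and BSD find both do).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file was modified less than N minutes
# ago, fails if it is older, and returns 2 if the file does not exist.
retention_is_fresh() {
    file=$1
    max_age_minutes=$2
    [ -f "$file" ] || return 2
    # find prints the path only when the file is younger than N minutes
    [ -n "$(find "$file" -mmin -"$max_age_minutes")" ]
}
```

For example, with retention_update_interval=2, `retention_is_fresh /path/to/retention.dat 3 || echo stale` (placeholder path) should print nothing if Nagios is saving on schedule; passing a value slightly larger than the interval allows for timing slack.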



So I then removed the retention.dat file. That worked in the sense that the last check time no longer says 1/11, except that now no data is retained at all: any time I restart, all statuses go back to pending.



Does anyone have any thoughts?



-Lori

Re: [Nagios-users] retention issue

2006-01-19 Thread Mike Holloway


Works here too. It looks like this needs to be punted to the developers for a bug fix.


-mike


On Jan 19, 2006, at 10:58 AM, Lori Adams wrote:

---
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnkkid3432bid#0486dat1642
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue.
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] retention issue

2005-11-20 Thread moshe sharon
Hello,

Please note that retention_update_interval=60 is in minutes. If you restart Nagios during that time, the retention state will not be saved. I would suggest lowering it to 5 minutes and checking whether you still have this problem. Also, if you enable retention in the main config file, you do not need to specify it per service (only if you want to disable retention for a specific service).
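To try that suggestion, the setting can be changed in place with standard sed. The helper name is hypothetical, and Nagios needs a restart (or reload) afterwards to pick up the new value.

```shell
#!/bin/sh
# Hypothetical helper: set retention_update_interval (in minutes) in nagios.cfg.
# Writes to a temporary file and then replaces the original, so it does not
# rely on the non-portable "sed -i" option. Note the rewritten file gets
# default permissions from the current umask.
set_retention_update_interval() {
    cfg=$1
    minutes=$2
    tmp="$cfg.tmp.$$"
    sed "s/^retention_update_interval=.*/retention_update_interval=$minutes/" "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}
```

For example, `set_retention_update_interval /path/to/nagios.cfg 5` (placeholder path) changes only the retention_update_interval line and leaves the other retention settings untouched.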



Moshe





RE: [Nagios-users] retention issue

2005-11-18 Thread Tedman Eng
This is related to your max_check_attempts setting.

If the service hasn't reached max_check_attempts yet, it's still in a 'soft' state. Once it reaches max_check_attempts, it will be in a hard state (and will be retained between restarts).
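The rule being described fits in a tiny function. This is a simplified model for non-OK results only, not Nagios's actual code, and the name state_type is hypothetical. In Lori's status line below, the 1/4 is the current attempt over max_check_attempts.

```shell
#!/bin/sh
# Simplified model (hypothetical helper, not Nagios source): for a non-OK
# check result, the state stays SOFT until the current attempt number
# reaches max_check_attempts, at which point it becomes HARD.
state_type() {
    attempt=$1
    max_check_attempts=$2
    if [ "$attempt" -ge "$max_check_attempts" ]; then
        echo HARD
    else
        echo SOFT
    fi
}

state_type 1 4   # prints SOFT, matching the CRITICAL;1/4;SOFT line below
```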



-----Original Message-----
From: Lori Adams [mailto:[EMAIL PROTECTED]
Sent: Friday, November 18, 2005 10:36 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] retention issue


Nagios 1.2
Linux
 
I'm using a couple of templates for this particular check, and many service checks use this template. When one of these checks becomes critical, its status in status.log changes to CRITICAL. If I stop and start Nagios, the status saved in status.sav is incorrect and says "No data yet (service was in a soft problem state during state retention)".
 
Here are the templates, before everyone tells me to turn on state retention:
define service{
        name                            generic-service-template
        ...
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        ...
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

define service {
        use                             generic-service-template
        name                            server-template
        host_name                       server
        contact_groups                  admins
        register                        0
}

define service {
        use                             server-template
        name                            server-spool-template
        normal_check_interval           60
        retry_check_interval            30
        check_period                    workhours_with_weekend
        register                        0
}

define service {
        use                             server-spool-template
        service_description             check
        check_command                   check_spool_nrpe!-d /srv/smtp/Maildir/check -w 24hours -c 36hours -m 35000 -W 1000 -C 2000
}
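As a rough worked example of how long these services can sit in a soft state: max_check_attempts is elided in the templates above, but the 1/4 in the status line below suggests it is 4. Assuming that, plus the default interval_length of 60 seconds (so the intervals are in minutes), a new CRITICAL result needs three more attempts, each retry_check_interval apart, before it turns HARD. The helper name is hypothetical, and the estimate ignores check_period gaps and scheduling delay.

```shell
#!/bin/sh
# Hypothetical helper: minutes from the first non-OK result (attempt 1)
# until the service reaches a HARD state, assuming every retry runs exactly
# retry_check_interval minutes after the previous attempt.
minutes_until_hard() {
    max_check_attempts=$1
    retry_check_interval=$2
    echo $(( (max_check_attempts - 1) * retry_check_interval ))
}

minutes_until_hard 4 30   # prints 90
```

So with these templates, a restart within roughly 90 minutes of a check first going CRITICAL would catch it in a soft state.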
 
From nagios.cfg:
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1
 
I ran these commands immediately one after the other, to show what is happening:

[EMAIL PROTECTED](var)# date; grep check status.log; /etc/init.d/nagios-prod stop; date; grep check status.sav; /etc/init.d/nagios-prod start; date; grep check status.log
Fri Nov 18 10:23:24 PST 2005
[1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;1132338029;1132339829;ACTIVE;1;1;1;1132338037;0;OK;4225413;0;0;0;0;0;1;3;0;1;0;0.00;0;1;1;1;/srv/smtp/Maildir/check last modified 11/14/05 16:49:00

Stopping network monitor: nagios
Fri Nov 18 10:23:24 PST 2005
Starting network monitor: nagios
21897 ?        00:00:00 nagios-prod

Fri Nov 18 10:23:26 PST 2005
[1132338205] SERVICE;server;spool-check;OK;1/4;HARD;1132338029;1132338377;ACTIVE;1;1;1;1132338037;0;OK;4225581;0;0;0;0;0;1;0;0;1;0;0.00;0;1;1;1;No data yet (service was in a soft problem state during state retention)
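To see which services would be affected before a restart, the SERVICE lines in status.log can be filtered for soft problem states. This sketch reads the field layout off the two status lines above (';' separated, with the state in field 4, the attempt count in field 5 and the state type in field 6). The helper name is hypothetical, and it only applies to the Nagios 1.x status.log format.

```shell
#!/bin/sh
# Hypothetical helper for Nagios 1.x status.log: list services currently in a
# SOFT non-OK state. Line layout, as seen in the output above:
#   [timestamp] SERVICE;host;service;state;attempt/max;state_type;...
soft_problem_services() {
    awk -F';' '$1 ~ /SERVICE$/ && $6 == "SOFT" && $4 != "OK" {
        print $2, $3, $4, $5
    }' "$1"
}
```

Run against the first status line above, it prints `server spool-check CRITICAL 1/4`.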
 
This is only happening when the checks using server-spool-template are in a
critical state.
 
Thanks,
-Lori




RE: [Nagios-users] retention issue

2005-11-18 Thread Lori Adams
Are you saying that soft states are not retained? Is this in the docs? Everything I've read says status/states are retained.

Soft states are just as important as hard states, in my opinion.

-Lori

 -----Original Message-----
 From: Tedman Eng [mailto:[EMAIL PROTECTED]
 Sent: Friday, November 18, 2005 11:25 AM
 To: Lori Adams; nagios-users@lists.sourceforge.net
 Subject: RE: [Nagios-users] retention issue


RE: [Nagios-users] retention issue

2005-11-18 Thread Tedman Eng
I don't think the current check attempt number is retained.

Since soft states are not yet considered 'real' errors, when Nagios restarts it must count up from the beginning again.

This is my understanding, though only from personal experience, not from any docs I've read. I agree it would be useful to retain soft states as well.

(Maybe this changed in 2.0; I haven't migrated yet, so I don't know.)
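That description can be modelled as an attempt counter that a restart resets unless the state has already gone hard. This is a toy model of the behaviour reported in this thread, not Nagios's implementation; the function name and the event words crit and restart are hypothetical.

```shell
#!/bin/sh
# Toy model: "crit" is a non-OK check result, "restart" is a Nagios restart.
# Per the behaviour described above, a restart drops a SOFT problem (and its
# attempt count) but keeps a HARD one.
simulate() {
    max_check_attempts=$1
    shift
    attempt=0
    for event in "$@"; do
        case $event in
            crit)
                [ "$attempt" -lt "$max_check_attempts" ] && attempt=$((attempt + 1)) ;;
            restart)
                [ "$attempt" -lt "$max_check_attempts" ] && attempt=0 ;;
        esac
    done
    if [ "$attempt" -ge "$max_check_attempts" ]; then
        echo HARD
    elif [ "$attempt" -gt 0 ]; then
        echo SOFT
    else
        echo OK
    fi
}
```

`simulate 4 crit crit restart crit crit` prints SOFT: the restart throws away the first two attempts, so the service needs four consecutive CRITICAL results between restarts to reach HARD.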


-----Original Message-----
From: Lori Adams [mailto:[EMAIL PROTECTED]
Sent: Friday, November 18, 2005 11:41 AM
To: Tedman Eng; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] retention issue