RE: [Nagios-users] retention issue
I haven't figured out what broke, but I have come up with a fix for retaining state during restarts. I edited the init script and added:

    stop)
    +   cp $NagiosStatusFile $NagiosRetentionFile

Now the status.dat file is copied to the retention file when Nagios is stopped.

-Lori

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lori Adams
Sent: Wednesday, January 18, 2006 11:57 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] retention issue

I'm using nagios-2.04b. Here's what happened.

We had nagios running using nagios-1.2. We wanted to try out nagios-2.04b. I set up an entirely new nagios instance on the same machine. We called this instance nagios-2. I've since moved nagios-2 into the same location as nagios(-1), the webserver, etc. Everything is running, I can access the CGIs, etc. All values have been updated to reflect the move. But somewhere along the line retention got extremely screwy.

Here were my values:

    retain_state_information=1
    state_retention_file=keeping path private/nagios-2.04/var/retention.dat
    retention_update_interval=60
    use_retained_program_state=1

I moved nagios-2 to be nagios on 1/11. Any time I restart (/etc/init.d/nagios restart), the state values return to the values from the 11th, with last check times set to 1/11.

I have changed retention_update_interval to 2. The retention.dat file is not updating while nagios is running. It is also not updated when nagios shuts down, as seen from both the code in the init script and the modification time of the retention file, even though the comments say it will.

So I decided to remove the retention.dat file. This worked in that the status no longer says 1/11 for the last check. But now no data is being retained, so any time I do a restart, all statuses go back to pending.

Does anyone have any thoughts?

-Lori
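For anyone wanting to try the same workaround, the init-script change described above can be sketched roughly as follows. This is only an illustration under assumptions: the variable names $NagiosStatusFile and $NagiosRetentionFile come from the post, while the function name and the empty-file guard are hypothetical additions, not part of the stock init script.

```shell
#!/bin/sh
# Sketch of the workaround: copy the live status file over the retention
# file when the init script is asked to stop Nagios.
# The function name and the guard are assumptions made for illustration.

save_status_as_retention() {
    status_file=$1
    retention_file=$2

    # Skip the copy if the status file is missing or empty, so that an
    # existing retention file is not overwritten with nothing.
    if [ ! -s "$status_file" ]; then
        echo "status file missing or empty: $status_file" >&2
        return 1
    fi

    # -p preserves the modification time of the status file.
    cp -p "$status_file" "$retention_file"
}

# In the init script this would be called at the top of the stop) branch,
# where the post places the cp line, for example:
#   stop)
#       save_status_as_retention "$NagiosStatusFile" "$NagiosRetentionFile"
```

The guard means a failed or empty status file leaves the previous retention file untouched, which seems safer than a bare cp, though it goes beyond what the post describes.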
Re: [Nagios-users] retention issue
Works here too. Looks like this needs to be punted to the dev guys for a bug fix.

-mike

On Jan 19, 2006, at 10:58 AM, Lori Adams wrote:

I haven’t figured out what broke. But I have come up with a fix for retaining during restarts. I edited the init script, and added

    stop)
    +   cp $NagiosStatusFile $NagiosRetentionFile

Now the status.dat file is copied to the retention file.

-Lori

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] retention issue
Hello,

Please note that retention_update_interval=60 is in minutes. If you restart Nagios during that interval, the retention state will not be saved. I would suggest lowering it to 5 minutes and checking whether you still have this problem.

Also, if you enable retention in the main config file, you do not need to specify it per service (only if you want to disable retention for a specific service).

Moshe
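One way to check whether the retention file is being rewritten at all is to compare its modification time against the configured interval. The following is a minimal sketch, assuming a find that supports -mmin (GNU and BSD find do); the function name and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Report whether a retention file looks stale, i.e. it is missing or was
# last modified more than the given number of minutes ago.
# Returns 0 (true) when stale, 1 (false) when recently updated.

retention_is_stale() {
    file=$1
    max_age_minutes=$2

    # A missing file is treated as stale.
    [ -e "$file" ] || return 0

    # find prints the path only if it is older than the limit.
    [ -n "$(find "$file" -mmin +"$max_age_minutes" 2>/dev/null)" ]
}

# Example (hypothetical path), with retention_update_interval=5:
#   if retention_is_stale /path/to/retention.dat 5; then
#       echo "retention.dat has not been updated within the interval"
#   fi
```

Allowing a little slack over the configured interval (for example testing against 6 minutes when the interval is 5) would avoid false alarms right at the boundary.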
RE: [Nagios-users] retention issue
This is related to your max_check_attempts setting. If the service hasn't reached the max_checks yet, it's still 'soft' state. Once it hits the max_checks, it'll be hard state (and will get retained between restarts)

-----Original Message-----
From: Lori Adams [mailto:[EMAIL PROTECTED]
Sent: Friday, November 18, 2005 10:36 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] retention issue

Nagios 1.2
Linux

I'm using a couple of templates for this particular check. There are many service checks using this template. When one of these checks becomes critical, the status in status.log changes to say it's critical. If I stop/start nagios, then the status saved in status.sav is incorrect, and says No data yet (service was in a soft problem state during state retention).

Here are the templates, before everyone tells me to turn on state retention:

    define service{
        name                            generic-service-template
        ...
        retain_status_information       1    ; Retain status information across program restarts
        retain_nonstatus_information    1    ; Retain non-status information across program restarts
        ...
        register                        0    ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

    define service {
        use                             generic-service-template
        name                            server-template
        host_name                       server
        contact_groups                  admins
        register                        0
    }

    define service {
        use                             server-template
        name                            server-spool-template
        normal_check_interval           60
        retry_check_interval            30
        check_period                    workhours_with_weekend
        register                        0
    }

    define service {
        use                             server-spool-template
        service_description             check
        check_command                   check_spool_nrpe!-d /srv/smtp/Maildir/check -w 24hours -c 36hours -m 35000 -W 1000 -C 2000
    }

From nagios.cfg:

    retain_state_information=1
    retention_update_interval=60
    use_retained_program_state=1

I ran these commands all immediately one after the other, to show what is happening.

    [EMAIL PROTECTED](var)# date; grep check status.log; /etc/init.d/nagios-prod stop; date; grep check status.sav; /etc/init.d/nagios-prod start; date; grep check status.log
    Fri Nov 18 10:23:24 PST 2005
    [1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;1132338029;1132339829;ACTIVE;1;1;1;1132338037;0;OK;4225413;0;0;0;0;0;1;3;0;1;0;0.00;0;1;1;1;/srv/smtp/Maildir/check last modified 11/14/05 16:49:00
    Stopping network monitor: nagios
    Fri Nov 18 10:23:24 PST 2005
    Starting network monitor: nagios
    21897 ? 00:00:00 nagios-prod
    Fri Nov 18 10:23:26 PST 2005
    [1132338205] SERVICE;server;spool-check;OK;1/4;HARD;1132338029;1132338377;ACTIVE;1;1;1;1132338037;0;OK;4225581;0;0;0;0;0;1;0;0;1;0;0.00;0;1;1;1;No data yet (service was in a soft problem state during state retention)

This is only happening when the checks using server-spool-template are in a critical state.

Thanks,
-Lori
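The long status.log lines above are hard to read by eye, so a small filter can pull out just the host, service, state, attempt count and state type. This is a rough sketch: the field positions are inferred only from the two sample lines in this message, not from a documented format, and the function name is hypothetical.

```shell
#!/bin/sh
# Summarise SERVICE entries from a Nagios 1.x style status.log on stdin.
# Assumed layout, based on the sample lines in this thread:
#   [timestamp] SERVICE;host;description;state;attempt/max;state_type;...

service_state_summary() {
    awk -F';' '
        # Field 1 looks like "[1132338202] SERVICE"; skip other entry types.
        $1 ~ /SERVICE$/ {
            printf "%s/%s %s attempt %s %s\n", $2, $3, $4, $5, $6
        }
    '
}

# Example, mirroring the commands in the message:
#   grep check status.log | service_state_summary
```

Run before and after a restart, this makes the change from CRITICAL 1/4 SOFT to OK 1/4 HARD visible at a glance.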
RE: [Nagios-users] retention issue
Are you saying that soft states are not retained? Is this in the docs? Everything I read says, status/states are retained. Soft states are just as important as hard states, in my opinion.

-Lori

-----Original Message-----
From: Tedman Eng [mailto:[EMAIL PROTECTED]
Sent: Friday, November 18, 2005 11:25 AM
To: Lori Adams; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] retention issue

This is related to your max_check_attempts setting. If the service hasn't reached the max_checks yet, it's still 'soft' state. Once it hits the max_checks, it'll be hard state (and will get retained between restarts)
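The soft/hard rule as described in this thread (a problem state stays soft until the attempt count reaches max_check_attempts) can be sketched as below. This only illustrates the description given here and is not taken from the Nagios source. The function names are hypothetical, and OK states are deliberately left out.

```shell
#!/bin/sh
# Classify a problem (non-OK) state as SOFT or HARD from its attempt
# counter, following the rule described in the thread: the state becomes
# HARD once the current attempt reaches the maximum.

problem_state_type() {
    attempt=$1
    max_attempts=$2

    if [ "$attempt" -lt "$max_attempts" ]; then
        echo SOFT
    else
        echo HARD
    fi
}

# Accept the "attempt/max" form that appears in status.log, e.g. "1/4".
problem_state_type_from_pair() {
    pair=$1
    problem_state_type "${pair%/*}" "${pair#*/}"
}

# Example:
#   problem_state_type_from_pair 1/4
```

With the 1/4 attempt count from the status.log sample, the critical result is still SOFT, which matches the "No data yet" message seen after the restart.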
RE: [Nagios-users] retention issue
I don't think current check attempt # is retained. Since soft states are not considered 'real' errors yet, when a nagios restart occurs it must count up from the beginning again. This is my understanding, though only from personal experience, not from docs I've read somewhere. I agree it would be useful to retain soft states as well. (maybe changed in 2.0, I haven't migrated yet so I don't know)

-----Original Message-----
From: Lori Adams [mailto:[EMAIL PROTECTED]
Sent: Friday, November 18, 2005 11:41 AM
To: Tedman Eng; nagios-users@lists.sourceforge.net
Subject: RE: [Nagios-users] retention issue

Are you saying that soft states are not retained? Is this in the docs? Everything I read says, status/states are retained. Soft states are just as important as hard states, in my opinion.

-Lori