On Jul 25, 2008, at 10:12 AM, John Oliver wrote:

> On Thu, Jul 24, 2008 at 11:12:55PM -0500, Marc Powell wrote:
> I just checked nagios.cfg and:
>
> interval_length=1

All your intervals are in seconds then. The default is 60.

>>> thought I had the errors fixed... the last email I got said RECOVERED
>>> (even though I should be getting CRITICAL alerts, as there is 1% disk
>>> space left). I changed the notification_interval, and never saw
>>> another email.
>>
>> Does the web interface show the status as CRITICAL? If you received a
>> recovery notification the service was considered to be OK. What did
>> you fix?
>
> No. The web interface is really confusing for this server:
>
> ftp UP N/A 486d 17h 50m 1s
>
> It has not been up for 486 days. And this is the one device that has

You should verify your command{} definition for whatever the UP check is. That's a check that you or your predecessor created, not a 'standard' check. If it hasn't been UP for 486 days, then it seems you're not checking what you think you're checking.

> N/A for last check. It's green and "UP". But that doesn't change the
> fact that nrpe reports 1% of disk space left, and that the nagios
> server can see that, at least when I manually run the command.

Correct; they'd be completely unrelated.

> I'm starting to read about is_volatile, but I'm not really
> understanding it. One example is "things that automatically reset
> themselves to an "OK" state each time they are checked". That
> certainly isn't the case with a disk space check.

Correct. Most services are not volatile. An example would be an SNMP trap: for every trap you receive, you want to send a notification regardless of the status of the previous trap. A volatile service sends out a notification for *every* non-OK check result for that service.

> command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p
> /dev/mapper/VolGroup00-LogVol00

Warn if less than 20 MB are free, Critical if less than 10 MB are free -- that's the common mistake I referenced.
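A sketch of the corrected nrpe.cfg line, per the advice above -- adding '%' makes check_disk treat the thresholds as percentages of free space rather than megabytes (the plugin path and volume name are taken from the original definition):

```
# nrpe.cfg -- WARNING below 20% free, CRITICAL below 10% free
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup00-LogVol00
```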
> [EMAIL PROTECTED] ~]# su nagios -c "/usr/lib/nagios/plugins/check_nrpe -H
> ftp -c check_disk"
> DISK OK - free space: / 782 MB (0% inode=99%);|
> /=134653MB;142786;142796;0;142806

It's OK according to the criteria you've defined; you've got another 762M to go before warning ;-). 'check_disk --help' might be a good read. You want to add a '%' to those numbers.

>> It seems to me you're not receiving notifications because hard state
>> changes are not occurring. This is generally desired behavior.
>
> That doesn't really make sense to me. I won't be alerted until the
> problem is fixed? Or gets worse?

You'll be alerted when the service changes state by default: OK -> Warning, OK -> Critical, Warning -> Critical, Warning -> OK, Critical -> OK. With a notification interval of 180, you should be re-notified every 180 seconds, _but_ only if the service is in a non-OK state. You're not in a non-OK state, so your next notification will come when a state change to Warning or Critical occurs.

http://nagios.sourceforge.net/docs/3_0/notifications.html

> Here's what I'd like to wind up with... if available disk space drops
> below a certain point, I'd like to have an alert go out maybe once per
> day. If it drops past another point, into critical territory, I'd like

You should have enough information to fix the disk check now. For the notifications, adjust notification_interval to 86400 (1 day in seconds).

> alerts to be sent out more frequently. But, whatever the interval is,

This is not possible AFAIK; notification_interval is always the same. Having a shorter notification_interval and looking at Escalations might be a solution. Another would be to include that kind of logic in your notification script.

> nagios should be alerting each time it sees low disk space. If it

Every check? If that's what you want, then setting is_volatile would do it.
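A sketch of the once-per-day notification setup described above, assuming interval_length=1 as in the poster's nagios.cfg; the host and service names are illustrative, and other required directives (contact_groups, check_interval, etc.) are omitted:

```
define service{
        host_name               ftp
        service_description     check_disk
        check_command           check_nrpe!check_disk
        notification_interval   86400   ; re-notify once per day (in seconds) while non-OK
        ; ...other required directives omitted...
        }
```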
> alerts once, and then assumes that it never has to alert again unless
> the problem gets fixed and then reappears, it's never going to get
> fixed. Once I have alerting working this way, I'll point the emails at

That sounds like a people issue ;) Normally, that's the behavior, but Escalations can help force the people issue.

--
Marc