this patch fixes "alertafter" and "numalerts" for "traptimeouts". it is in reply to the two mails at the bottom.
the two changes haven't seemed to break anything else, but just in case, here are the two changes in english:

1. in "&handle_trap_timeout", adding $sref->{"_consec_failures"}++ makes "alertafter NUM" work.
2. "&call_alert" doesn't send the alert if we pass it "undef" for $output or $retval, so i substituted reasonable values ("NO OUTPUT" and 1).

now the following config works; before, no alert was sent when the heartbeat stopped:

watch remote-group
    service heartbeat
        traptimeout 10s
        period wd {Sun-Sat}
            alert test.alert tscanlan
            upalert test.alert -u tscanlan
            alertafter 2
            numalerts 3

-Tom Scanlan
OpenReach, Inc.
Network Operations
office: 732-254-0210 x-6022
cell: 732-682-3365

----
RFP:
-----------------------------------------------------------------------------
Date: Tue, 13 Nov 2001 14:54:22 +0100
From: "Peter Wirdemo (EMW)" <[EMAIL PROTECTED]>
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
Subject: trap timeout alerts

Hello!

I'm trying to use mon to do heartbeat-style monitoring.
Why don't I get any alerts when the trap times out?

In mon.cgi I get:

Host Group | Service
------------------------------------
syslog     | hearbeat : trap timeout | (FAILED,NOALERTS)

NOALERTS??????

Mon Version:
$Id: mon 1.27 Sat, 08 Sep 2001 09:42:05 -0400 trockij $
$ProjectVersion: mon-0-99-2.6 $

Config:

watch syslog
    service heartbeat
        description heartbeat test
        traptimeout 30s
        trapduration 1s
        period wd {Sun-Sat}
            alertevery 1h
            no_comp_alerts
            alert mail.alert me@localhost
            upalert mail.alert -u me@localhost

Thanks
/Peter

-----------------------------------------------------------------------------
Date: Wed, 30 Jan 2002 12:53:46 -0500
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: alertevery does not work with traps

I'm having problems getting the alertevery variable to work with traps. I've seen others report on this mailing list that consecutive failures do not appear to get incremented within the trap-handling subroutine (I have not yet looked at the code myself).
However, I have not seen any mention of alertevery not working in this scenario. The alertafter XXm variable seems to work fine; however, people are getting paged every time a failure occurs, and I desperately need to throttle this back.

Relevant portion of my config:

watch trap-webchat
    service webchat-useragent
        period FIRSTLEVEL: wd {Sun-Sat}
            alert audible.alert
            alertafter 6m
        period SECONDLEVEL: wd {Sun-Sat}
            alert bcmail.alert analyst
            alertafter 15m
            alertevery 10m
        period THIRDLEVEL: wd {Sun-Sat}
            alert bcmail.alert expert
            alertafter 30m
            alertevery 10m
        period CRISIS: wd {Sun-Sat}
            alert bcmail.alert crisis_team
            alertafter 30m
            numalerts 1
        period FOURTHLEVEL: wd {Sun-Sat}
            alert bcmail.alert management
            alertafter 50m
            alertevery 10m

Has anyone successfully gotten traps and alertevery working together?

--- mon Mon Feb 25 17:03:21 2002
+++ mon.tom Mon Feb 25 17:15:34 2002
@@ -3975,6 +3975,7 @@
     my $sref = \%{$watch{$group}->{$service}};
     $sref->{"_failure_count"}++;
+    $sref->{"_consec_failures"}++;
     $sref->{"_last_failure"} = $tmnow;
     $sref->{"_first_failure"} = $tmnow if ($sref->{"_op_status"} != $STAT_FAIL);
     set_op_status ($group, $service, $STAT_FAIL);
@@ -3984,7 +3985,7 @@
     push @last_failures, "$group $service $tm $sref->{_last_summary}";
     syslog ('crit', "failure for $last_failures[-1]");
-    do_alert ($group, $service, undef, undef, $FL_TRAPTIMEOUT);
+    do_alert ($group, $service, "NO OUTPUT", 1, $FL_TRAPTIMEOUT);
 }
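
The following self-contained Perl sketch shows why both changes are needed. It is only a rough model and does not use mon's real code. The state hash and the subs send_alert, should_alert, trap_timeout and run_scenario are hypothetical stand-ins. The sketch assumes two things. First, "alertafter NUM" alerts once the consecutive-failure count reaches NUM. Second, "numalerts" caps how many alerts go out during one failure.

```perl
#!/usr/bin/perl
# Hypothetical, simplified model of alert gating on a trap timeout.
# None of these names are mon's real internals; they only mirror the
# two changes described in the patch.
use strict;
use warnings;

# Stand-in for &call_alert: as described in the mail, it refuses to send
# anything when $output or $retval is undef. Returns 1 if "sent".
sub send_alert {
    my ($output, $retval) = @_;
    return 0 if !defined $output || !defined $retval;
    return 1;
}

# Assumed semantics: alert once the consecutive-failure count reaches
# "alertafter", and send no more than "numalerts" alerts.
sub should_alert {
    my ($s) = @_;
    return 0 if $s->{_consec_failures} < $s->{alertafter};
    return 0 if $s->{_alerts_sent} >= $s->{numalerts};
    return 1;
}

# Stand-in for &handle_trap_timeout; %fix toggles the two changes.
sub trap_timeout {
    my ($s, %fix) = @_;
    $s->{_failure_count}++;
    $s->{_consec_failures}++ if $fix{count};    # change 1
    return unless should_alert($s);
    my @args = $fix{values} ? ("NO OUTPUT", 1)  # change 2
                            : (undef, undef);
    $s->{_alerts_sent}++ if send_alert(@args);
}

# Five trap timeouts in a row with "alertafter 2" and "numalerts 3";
# returns how many alerts went out. Recovery is not modelled.
sub run_scenario {
    my (%fix) = @_;
    my %s = (alertafter => 2, numalerts => 3, _failure_count => 0,
             _consec_failures => 0, _alerts_sent => 0);
    trap_timeout(\%s, %fix) for 1 .. 5;
    return $s{_alerts_sent};
}

printf "unpatched:    %d alerts\n", run_scenario();
printf "counter only: %d alerts\n", run_scenario(count => 1);
printf "both changes: %d alerts\n", run_scenario(count => 1, values => 1);
```

The three scenarios report 0, 0 and 3 alerts:

- Without the counter, the alertafter check never passes.
- With only the counter, the stand-in for &call_alert still drops every alert because output and retval are undef.
- With both changes, the first timeout is held back by alertafter, the next three are sent, and numalerts suppresses the fifth.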