Jim,

Here's the patch of all my current changes.

This is relative to the patch I sent a month ago.

I'm also including a patch to the mon manpage that documents the new
features I've added, including the new authentication type 'trustlocal'
from the last patch.
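
As a rough illustration of the new per-host dependency keyword, a service
definition in mon.cf could look something like the fragment below.  This is
only a sketch: the hostgroup, host names, monitor/alert scripts, intervals
and mail address are made up for the example; only hostdepend and the SELF:
shorthand come from the patch.

```
hostgroup mailservers mail1 mail2 mail3

watch mailservers
    service ping
        interval 5m
        monitor fping.monitor
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
    service smtp
        interval 10m
        monitor smtp.monitor
        hostdepend SELF:ping
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
```

With something like this, if mail2 shows up in the summary of the failing
ping service, the smtp monitor should still be run, but only against mail1
and mail3.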

Let me know if you have any questions/comments.

I'm going to start working on the per-host status tracking in the next
week or so, but I thought you might want to try to get 0.99.3 out the door
before you start looking at integrating the major rewrite I'll have to do
for that.

-David Nolan
 Network Software Developer
 Computing Services
 Carnegie Mellon University
Index: mon.8
===================================================================
RCS file: /afs/andrew/system/cvs/src/netsage/mon/doc/mon.8,v
retrieving revision 1.1
retrieving revision 1.2
diff -c -r1.1 -r1.2
*** mon.8       2002/09/26 12:33:11     1.1
--- mon.8       2002/09/26 12:37:12     1.2
***************
*** 22,27 ****
--- 22,29 ----
  .RB [ \-k
  .IR num ]
  .RB [ \-l
+ .IR [ statetype ] ]
+ .RB [ \-L
  .IR dir ]
  .RB [ \-m
  .IR num ]
***************
*** 105,113 ****
  entries. Defaults
  to 100.
  .TP
! .BI \-l
! Load state from the last saved state file. Currently the only
! supported saved state is disabled watches, services, and hosts.
  .TP
  .BI \-L\ dir
  Sets the log dir. See also
--- 107,125 ----
  entries. Defaults
  to 100.
  .TP
! .BI \-l\ statetype
! Load state from the last saved state file. The 
! supported saved state types are 
! .B disabled
! for disabled watches, services, and hosts, 
! .B opstatus
! for failure/alert/ack status of 
! all services,
! and 
! .B all 
! for both.  If no statetype is provided, 
! .B disabled
! is assumed.
  .TP
  .BI \-L\ dir
  Sets the log dir. See also
***************
*** 346,352 ****
  .B dep_behavior
  is set to
  .IR "'a'" ,
! and a parent dependency is failing, then suppress the alert.
  If the alert has previously been acknowledged, do not send
  the alert, unless it is an upalert.
  If an alert is not within the specified period, record the failure
--- 358,366 ----
  .B dep_behavior
  is set to
  .IR "'a'" ,
! or
! .B alertdepend
! is set, and a parent dependency is failing, then suppress the alert.
  If the alert has previously been acknowledged, do not send
  the alert, unless it is an upalert.
  If an alert is not within the specified period, record the failure
***************
*** 631,636 ****
--- 645,659 ----
  .B passwd
  service will be used.
  
+ If
+ .I type
+ is
+ .BR trustlocal ,
+ then if the client connection comes from localhost, the username passed from
+ the client will be trusted, and the password will be ignored.  This can be used
+ when you want the client to handle the authentication for you, e.g. a CGI script
+ using one of the many Apache authentication methods.
+ 
  .TP
  .BI "userfile = " file
  This file is used when
***************
*** 817,829 ****
  The default limit is 10.
  
  .TP
! .BI "dep_behavior = " {a|m}
  .B dep_behavior
  controls whether the dependency expression
! suppresses either the running of alerts or monitors
! when a node in the dependency graph fails. Read more
! about the behavior in the "Service Definitions" section
! below.
  
  This is a global setting which controls the default
  settings for the service-specified variable.
--- 840,852 ----
  The default limit is 10.
  
  .TP
! .BI "dep_behavior = " {a|m|hm}
  .B dep_behavior
  controls whether the dependency expression
! suppresses one of: the running of alerts, the running of 
! monitors, or the passing of individual hosts to the monitors.
! Read more about the behavior in the "Service Definitions" 
! section below.
  
  This is a global setting which controls the default
  settings for the service-specified variable.
***************
*** 1023,1032 ****
  the machine being ping-reachable.
  
  .TP
! .BI dep_behavior " {a|m}"
! The evaluation of dependency graphs
  can control the
! suppression of either alert or monitor invocations.
  
  .BR "Alert suppression" .
  If this option is set to "a",
--- 1046,1058 ----
  the machine being ping-reachable.
  
  .TP
! .BI dep_behavior " {a|m|hm}"
! The evaluation of the dependency graphs specified via the
! .B depend
! keyword
  can control the
! suppression of alert or monitor invocations, or the suppression
! of individual hosts passed to the monitor.
  
  .BR "Alert suppression" .
  If this option is set to "a",
***************
*** 1047,1052 ****
--- 1073,1111 ----
  will be run. Otherwise, the monitor will not
  be run and the status of the service will remain
  the same.
+ 
+ .BR "Host suppression" .
+ If it is set to "hm" then Mon will extract the list of "parent"
+ services from the dependency expression.  (In fact the expression can
+ be just a list of services.) Then when the monitor for the service is
+ about to be run, for each host in the current hostgroup Mon will
+ search all the parent services which are currently failing and look
+ for the hostname in the current summary output.  If the hostname is
+ found, this host will be excluded from this run of the monitor.  This
+ can be used to e.g. allow an SMTP test on a group of hosts to still be run
+ even when a single host is not ping-reachable.  If all the rest of the
+ hosts are working fine, the service will be in an OK state, but if
+ another host fails the SMTP test Mon can still alert about that host
+ even though the parent dependency was failing.  The dependency
+ expression will
+ .B not
+ be used recursively in this case.
+ 
+ .TP
+ .BI alertdepend " dependexpression"
+ .TP
+ .BI monitordepend " dependexpression"
+ .TP
+ .BI hostdepend " dependexpression"
+ These keywords allow you to specify multiple dependency expressions of 
+ different types.  Each one corresponds to the different 
+ .B dep_behavior
+ settings listed above.  They will be evaluated independently in the different
+ contexts as listed above.  If
+ .B depend
+ is present, it takes precedence over the matching keyword, depending on the
+ .B dep_behavior
+ setting.
  
  .SS "Period Definitions"
  
Index: mon
===================================================================
RCS file: /afs/andrew/system/cvs/src/netsage/mon/bin/mon,v
retrieving revision 1.9
diff -c -r1.9 mon
*** mon 2002/08/19 19:09:44     1.9
--- mon 2002/09/26 12:18:05
***************
*** 65,70 ****
--- 65,71 ----
  sub debug;
  sub debug_dir;
  sub dep_ok;
+ sub dep_summary;
  sub depend;
  sub dhmstos;
  sub die_die;
***************
*** 198,204 ****
  #
  # argument parsing
  #
! getopts ("fhlMSvda:A:b:B:c:D:i:L:m:O:o:p:P:r:s:t:", \%opt);
  
  #
  # these two things can be taken care of without
--- 199,205 ----
  #
  # argument parsing
  #
! getopts ("fhMSvda:A:b:B:c:D:i:l:L:m:O:o:p:P:r:s:t:", \%opt);
  
  #
  # these two things can be taken care of without
***************
*** 343,350 ****
  #
  # load previously saved state
  #
! load_state ("disabled") if ($opt{"l"});
  
  syslog ('info', "mon server started");
  
  #
--- 344,362 ----
  #
  # load previously saved state
  #
! if (exists $opt{"l"}) {
!     if ($opt{"l"}) {
!       # If -l was given an argument (all, disabled, opstatus, etc...)
!       # pass that to load_state
!       load_state($opt{"l"});
!     }else{
!       # Otherwise default to old behavior of just loading disabled hosts/services/groups
!       load_state("disabled");
!     }
! }
! 
  
+ 
  syslog ('info', "mon server started");
  
  #
***************
*** 369,375 ****
            #
            # skip over disabled watch
            #
!           next if ($watch_disabled{$group} == 1);
  
            foreach my $service (keys %{$watch{$group}}) {
  
--- 381,387 ----
            #
            # skip over disabled watch
            #
!           next if (exists $watch_disabled{$group} && $watch_disabled{$group} == 1);
  
            foreach my $service (keys %{$watch{$group}}) {
  
***************
*** 384,390 ****
                if ($sref->{"traptimeout"}) {
                    $sref->{"_trap_timer"} -= $t;
  
!                   if ($sref->{"_trap_timer"} <= 0 && $tm - $sref->{"_last_uptrap"} >
                                $sref->{"traptimeout"}) {
                        $sref->{"_trap_timer"} = $sref->{"traptimeout"};
                        handle_trap_timeout ($group, $service);
--- 396,402 ----
                if ($sref->{"traptimeout"}) {
                    $sref->{"_trap_timer"} -= $t;
  
!                   if ($sref->{"_trap_timer"} <= 0 && $tm - $sref->{"_last_trap"} >
                                $sref->{"traptimeout"}) {
                        $sref->{"_trap_timer"} = $sref->{"traptimeout"};
                        handle_trap_timeout ($group, $service);
***************
*** 411,426 ****
                {
                    if (!$CF{"MAXPROCS"} || $procs < $CF{"MAXPROCS"})
                    {
!                       if ($sref->{"exclude_period"} ne "" &&
!                               inPeriod (time, $sref->{"exclude_period"}))
                        {
                            debug (1, "not running $group,$service because of exclude_period\n");
                        }
  
!                       elsif ($sref->{"dep_behavior"} eq "m" &&
!                               $sref->{"depend"} ne "")
                        {
!                           if (dep_ok ($sref))
                            {
                                run_monitor ($group, $service);
                            }
--- 423,440 ----
                {
                    if (!$CF{"MAXPROCS"} || $procs < $CF{"MAXPROCS"})
                    {
!                       if (defined $sref->{"exclude_period"} 
!                           && $sref->{"exclude_period"} ne "" &&
!                           inPeriod (time, $sref->{"exclude_period"}))
                        {
                            debug (1, "not running $group,$service because of exclude_period\n");
                        }
  
!                       elsif (($sref->{"dep_behavior"} eq "m" &&
!                               defined $sref->{"depend"} && $sref->{"depend"} ne "")
!                              || (defined $sref->{"monitordepend"} && $sref->{"monitordepend"} ne ""))
                        {
!                           if (dep_ok ($sref, 'm'))
                            {
                                run_monitor ($group, $service);
                            }
***************
*** 530,536 ****
      #
      # if the alarm is disabled, ignore it
      #
!     if ($sref->{"disable"} == 1)
      {
        syslog ("notice", "ignoring alert for $group,$service");
        return;
--- 544,550 ----
      #
      # if the alarm is disabled, ignore it
      #
!     if (defined $sref->{"disable"} && $sref->{"disable"} == 1)
      {
        syslog ("notice", "ignoring alert for $group,$service");
        return;
***************
*** 540,548 ****
      # dependency check
      #
      if (!($flags & $FL_STARTUPALERT) &&
!           !($flags & $FL_UPALERT) &&
!           defined $sref->{"depend"} &&
!           $sref->{"dep_behavior"} eq "a")
      {
        if (!$sref->{"_depend_status"})
        {
--- 554,562 ----
      # dependency check
      #
      if (!($flags & $FL_STARTUPALERT) &&
!       !($flags & $FL_UPALERT) &&
!       ((defined $sref->{"depend"} && $sref->{"dep_behavior"} eq "a")
!        || (defined $sref->{"alertdepend"})))
      {
        if (!$sref->{"_depend_status"})
        {
***************
*** 562,568 ****
      }
  
      my ($summary) = split("\n", $output);
!     $summary = "(NO SUMMARY)" if ($summary =~ /^\s*$/m);
  
      #
      # check each time period for pending alerts
--- 576,582 ----
      }
  
      my ($summary) = split("\n", $output);
!     $summary = "(NO SUMMARY)" if (!defined $summary || $summary =~ /^\s*$/m);
  
      #
      # check each time period for pending alerts
***************
*** 1002,1008 ****
                $new_CF{"DEP_RECUR_LIMIT"} = $2;
  
            } elsif ($1 eq "dep_behavior") {
!               if ($2 ne "m" && $2 ne "a") {
                    close (CFG);
                    return "cf error: unknown dependency behavior '$2', line $line_num";
                }
--- 1016,1022 ----
                $new_CF{"DEP_RECUR_LIMIT"} = $2;
  
            } elsif ($1 eq "dep_behavior") {
!               if ($2 ne "m" && $2 ne "a" && $2 ne "hm") {
                    close (CFG);
                    return "cf error: unknown dependency behavior '$2', line $line_num";
                }
***************
*** 1208,1213 ****
--- 1222,1228 ----
                $sref->{"_last_failure"} = 0 if (!defined($sref->{"_last_failure"}));
                $sref->{"_last_success"} = 0 if (!defined($sref->{"_last_success"}));
                $sref->{"_last_trap"} = 0 if (!defined($sref->{"_last_trap"}));
+               $sref->{"_last_traphost"} = '' if (!defined($sref->{"_last_traphost"}));
                $sref->{"_exitval"} = "undef" if (!defined($sref->{"_exitval"}));
                $sref->{"_last_check"} = undef;
                $sref->{"_depend_status"} = undef;
***************
*** 1472,1478 ****
                
                elsif ($var eq "dep_behavior")
                {
!                   if ($args ne "m" && $args ne "a")
                    {
                        close (CFG);
                        return "cf error: unknown dependency behavior '$args' (syntax: dep_behavior = {m|a}), line $line_num";
--- 1487,1493 ----
                
                elsif ($var eq "dep_behavior")
                {
!                   if ($args ne "m" && $args ne "a" && $args ne "hm")
                    {
                        close (CFG);
                        return "cf error: unknown dependency behavior '$args' (syntax: dep_behavior = {m|a}), line $line_num";
***************
*** 1484,1489 ****
--- 1499,1519 ----
                    $args =~ s/SELF:/$watchgroup:/g;
                }
  
+               elsif ($var eq "alertdepend")
+               {
+                   $args =~ s/SELF:/$watchgroup:/g;
+               }
+ 
+               elsif ($var eq "monitordepend")
+               {
+                   $args =~ s/SELF:/$watchgroup:/g;
+               }
+ 
+               elsif ($var eq "hostdepend")
+               {
+                   $args =~ s/SELF:/$watchgroup:/g;
+               }
+ 
                elsif ($var eq "exclude_hosts")
                {
                    my $ex = {};
***************
*** 1594,1600 ****
      }
  
      $procs = 0;
! 
      syslog ('info', "resetting, and re-reading configuration $CF{CF}");
  
      if ((my $err = read_cf ($CF{"CF"}, 1)) ne "") {
--- 1624,1630 ----
      }
  
      $procs = 0;
!     save_state ("all") if ($keepstate);
      syslog ('info', "resetting, and re-reading configuration $CF{CF}");
  
      if ((my $err = read_cf ($CF{"CF"}, 1)) ne "") {
***************
*** 1608,1614 ****
      $fdset_rbits = $fdset_ebits = '';
      set_last_test ();
      randomize_startdelay() if ($CF{"RANDSTART"});
!     load_state ("disabled") if ($keepstate);
      if ($CF{"DTLOGGING"}) {
        init_dtlog();
      }
--- 1638,1644 ----
      $fdset_rbits = $fdset_ebits = '';
      set_last_test ();
      randomize_startdelay() if ($CF{"RANDSTART"});
!     load_state ("all") if ($keepstate);
      if ($CF{"DTLOGGING"}) {
        init_dtlog();
      }
***************
*** 1678,1684 ****
      my ($fl);
  
      $fl = '';
!     fcntl ($fh, F_GETFL, $fl)          || return;
      $fl |= O_NONBLOCK;
      fcntl ($fh, F_SETFL, $fl)          || return;
  
--- 1708,1714 ----
      my ($fl);
  
      $fl = '';
!     $fl = fcntl ($fh, F_GETFL, $fl)          || return;
      $fl |= O_NONBLOCK;
      fcntl ($fh, F_SETFL, $fl)          || return;
  
***************
*** 2193,2199 ****
        # list status of all services
        #
        } elsif ($cmd eq "opstatus") {
!           if ($args eq "")
            {
                foreach $group (keys %watch) {
                    foreach $service (keys %{$watch{$group}}) {
--- 2223,2229 ----
        # list status of all services
        #
        } elsif ($cmd eq "opstatus") {
!           if (!defined $args || $args eq "")
            {
                foreach $group (keys %watch) {
                    foreach $service (keys %{$watch{$group}}) {
***************
*** 2243,2253 ****
                }
            }
            foreach $group (keys %watch) {
!               if ($watch_disabled{$group} == 1) {
                    sock_write ($fh,  "watch $group\n");
                }
                foreach $service (keys %{$watch{$group}}) {
!                   if ($watch{$group}->{$service}->{'disable'} == 1) {
                        sock_write ($fh,  "watch $group service " .
                            "$service\n");
                    }
--- 2273,2284 ----
                }
            }
            foreach $group (keys %watch) {
!               if (exists $watch_disabled{$group} && $watch_disabled{$group} == 1) {
                    sock_write ($fh,  "watch $group\n");
                }
                foreach $service (keys %{$watch{$group}}) {
!                   if (defined $watch{$group}->{$service}->{'disable'} 
!                       && $watch{$group}->{$service}->{'disable'} == 1) {
                        sock_write ($fh,  "watch $group service " .
                            "$service\n");
                    }
***************
*** 2583,2589 ****
      # check auth
      #
      } elsif ($cmd eq "checkauth") {
!       split(' ',$args);
        $cmd = $_[0];
        $user = $clients{$cl}->{"user"};
        #  Note that we call check_auth without syslogging here.
--- 2614,2620 ----
      # check auth
      #
      } elsif ($cmd eq "checkauth") {
!       @_ = split(' ',$args);
        $cmd = $_[0];
        $user = $clients{$cl}->{"user"};
        #  Note that we call check_auth without syslogging here.
***************
*** 2614,2619 ****
--- 2645,2653 ----
      my $summary       = esc_str ($sref->{"_last_summary"}, 1);
      my $detail        = esc_str ($sref->{"_last_detail"}, 1);
      my $depend        = esc_str ($sref->{"depend"}, 1);
+     my $hostdepend    = esc_str ($sref->{"hostdepend"}, 1);
+     my $monitordepend = esc_str ($sref->{"monitordepend"}, 1);
+     my $alertdepend   = esc_str ($sref->{"alertdepend"}, 1);
      my $monitor       = esc_str ($sref->{"monitor"}, 1);
  
      my $comment;
***************
*** 2629,2652 ****
        $alerts_sent += $sref->{"periods"}->{$period}->{"_alert_sent"};
      }
  
!     my $buf =
!       "group=$group" . 
!       " service=$service" .
!       " opstatus=$sref->{_op_status}" .
!       " last_opstatus=$sref->{_last_op_status}" .
!       " exitval=$sref->{_exitval}" .
!       " timer=$sref->{_timer}" .
!       " last_success=$sref->{_last_success}" .
!       " last_trap=$sref->{_last_trap}" .
!       " last_check=$sref->{_last_check}" .
!       " ack=$sref->{_ack}" .
!       " ackcomment='$comment'" .
!       " alerts_sent=$alerts_sent" .
!       " depstatus=" . int ($sref->{"_depend_status"}) .
!       " depend='$depend'" .
!       " monitor='$monitor'" .
!       " last_summary='$summary'" .
!       " last_detail='$detail'";
  
      $buf .= " last_failure=$sref->{_last_failure}"
        if ($sref->{"_last_failure"});
--- 2663,2687 ----
        $alerts_sent += $sref->{"periods"}->{$period}->{"_alert_sent"};
      }
  
!     my $buf = "group=$group service=$service opstatus=$sref->{_op_status}";
!     $buf .= " last_opstatus=" . (defined $sref->{_last_op_status} ? $sref->{_last_op_status} : "");
!     $buf .= " exitval=" . (defined $sref->{_exitval} ? $sref->{_exitval} : "");
!     $buf .= " timer=" . (defined $sref->{_timer} ? $sref->{_timer} : "");
!     $buf .= " last_success=" . (defined $sref->{_last_success} ? $sref->{_last_success} : "");
!     $buf .= " last_trap=" . (defined $sref->{_last_trap} ? $sref->{_last_trap} : "");
!     $buf .= " last_traphost=" . (defined $sref->{_last_traphost} ? $sref->{_last_traphost} : "");
!     $buf .= " last_check=" . (defined $sref->{_last_check} ? $sref->{_last_check} : "");
!     $buf .= " ack=" . (defined $sref->{_ack} ? $sref->{_ack} : "");
!     $buf .= " ackcomment='$comment'";
!     $buf .= " alerts_sent=$alerts_sent";
!     $buf .= " depstatus=" . (defined $sref->{"_depend_status"} ? int ($sref->{"_depend_status"}) : "");
!     $buf .= " depend='$depend'";
!     $buf .= " hostdepend='$hostdepend'";
!     $buf .= " monitordepend='$monitordepend'";
!     $buf .= " alertdepend='$alertdepend'";
!     $buf .= " monitor='$monitor'";
!     $buf .= " last_summary='$summary'";
!     $buf .= " last_detail='$detail'";
  
      $buf .= " last_failure=$sref->{_last_failure}"
        if ($sref->{"_last_failure"});
***************
*** 2763,2768 ****
--- 2798,2804 ----
        exit (1);
      }
  
+     print N "Mon starting at ".localtime(time)."\n";
      if (!open(STDOUT, ">&N") ||
          !open (STDIN, "<&N") ||
        !open (STDERR, ">&N")) {
***************
*** 2779,2785 ****
  sub debug {
      my ($level, @l) = @_;
  
!     return if ($level > $opt{"d"});
  
      if ($opt{"d"} && !$opt{"f"}) {
        print STDERR @l;
--- 2815,2821 ----
  sub debug {
      my ($level, @l) = @_;
  
!     return if (!defined $opt{"d"} || $level > $opt{"d"});
  
      if ($opt{"d"} && !$opt{"f"}) {
        print STDERR @l;
***************
*** 2832,2841 ****
  
        $sref->{"_last_checked"} = $tmnow;
  
!       if ($sref->{"depend"} ne "" &&
!               $sref->{"dep_behavior"} eq "a")
        {
!           dep_ok ($sref);
        }
  
        #
--- 2868,2878 ----
  
        $sref->{"_last_checked"} = $tmnow;
  
!       if ((defined $sref->{"depend"} && $sref->{"depend"} ne "" &&
!            $sref->{"dep_behavior"} eq "a") 
!           || (defined $sref->{"alertdepend"} && $sref->{"alertdepend"} ne ""))
        {
!           dep_ok ($sref, 'a');
        }
  
        #
***************
*** 2874,2880 ****
            # change interval if needed
            #
            if (defined ($sref->{"failure_interval"}) &&
!                       $sref->{"_old_interval"} == undef)
            {
                $sref->{"_old_interval"} = $sref->{"interval"};
                $sref->{"interval"} = $sref->{"failure_interval"};
--- 2911,2917 ----
            # change interval if needed
            #
            if (defined ($sref->{"failure_interval"}) &&
!                       !defined $sref->{"_old_interval"})
            {
                $sref->{"_old_interval"} = $sref->{"interval"};
                $sref->{"interval"} = $sref->{"failure_interval"};
***************
*** 2935,2941 ****
            # change interval back to original
            #
            if (defined ($sref->{"failure_interval"}) &&
!                       $sref->{"_old_interval"} != undef)
            {
                $sref->{"interval"} = $sref->{"_old_interval"};
                $sref->{"_old_interval"} = undef;
--- 2972,2978 ----
            # change interval back to original
            #
            if (defined ($sref->{"failure_interval"}) &&
!                       defined $sref->{"_old_interval"})
            {
                $sref->{"interval"} = $sref->{"_old_interval"};
                $sref->{"_old_interval"} = undef;
***************
*** 3069,3074 ****
--- 3106,3129 ----
            @ghosts = @g;
        }
  
+       #
+       # per-host dependencies
+       #
+       if ((defined $sref->{"depend"} && $sref->{"depend"} ne "" && $sref->{"dep_behavior"} eq 'hm')
+           || (defined $sref->{"hostdepend"} && $sref->{"hostdepend"} ne ""))
+       {
+           my @g = ();
+           my $sum = dep_summary($sref);
+ 
+           for (my $i=0; $i<@ghosts; $i++)
+           {
+               push (@g, $ghosts[$i])
+                   if (! grep /\Q$ghosts[$i]\E/, @$sum);
+           }
+ 
+           @ghosts = @g;
+       }
+ 
        @args = (quotewords ('\s+', 0, $monitor), @ghosts);
      }
  
***************
*** 3094,3105 ****
            foreach $v (keys %{$sref->{"ENV"}}) {
                $ENV{$v} = $sref->{"ENV"}->{$v};
            }
!           $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"};
!           $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"};
!           $ENV{"MON_LAST_FAILURE"} = $sref->{"_last_failure"};
!           $ENV{"MON_FIRST_FAILURE"} = $sref->{"_first_failure"};
!           $ENV{"MON_DEPEND_STATUS"} = $sref->{"_depend_status"};
!           $ENV{"MON_LAST_SUCCESS"} = $sref->{"_last_success"};
            $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"};
            $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"};
            exec @args or syslog ('err', "could not exec '@args': $!")
--- 3149,3160 ----
            foreach $v (keys %{$sref->{"ENV"}}) {
                $ENV{$v} = $sref->{"ENV"}->{$v};
            }
!           $ENV{"MON_LAST_SUMMARY"} = $sref->{"_last_summary"} if (defined $sref->{"_last_summary"});
!           $ENV{"MON_LAST_OUTPUT"} = $sref->{"_last_output"} if (defined $sref->{"_last_output"});
!           $ENV{"MON_LAST_FAILURE"} = $sref->{"_last_failure"} if (defined $sref->{"_last_failure"});
!           $ENV{"MON_FIRST_FAILURE"} = $sref->{"_first_failure"} if (defined $sref->{"_first_failure"});
!           $ENV{"MON_DEPEND_STATUS"} = $sref->{"_depend_status"} if (defined $sref->{"_depend_status"});
!           $ENV{"MON_LAST_SUCCESS"} = $sref->{"_last_success"} if (defined $sref->{"_last_success"});
            $ENV{"MON_STATEDIR"} = $CF{"STATEDIR"};
            $ENV{"MON_LOGDIR"} = $CF{"LOGDIR"};
            exec @args or syslog ('err', "could not exec '@args': $!")
***************
*** 3248,3254 ****
      my $found = undef;
  
      foreach my $g (keys %groups) {
!       if ($cmd == 0) {
            if (grep (s/^$h$/*$h/, @{$groups{$g}}))
            {
                $found = 1;
--- 3303,3309 ----
      my $found = undef;
  
      foreach my $g (keys %groups) {
!       if ((!defined $cmd) || $cmd == 0) {
            if (grep (s/^$h$/*$h/, @{$groups{$g}}))
            {
                $found = 1;
***************
*** 3306,3316 ****
                }
            }
            foreach $group (keys %watch) {
!               if ($watch_disabled{$group} == 1) {
                    print STATE "disable watch $group\n";
                }
                foreach $service (keys %{$watch{$group}}) {
!                   if ($watch{$group}->{$service}->{'disable'} == 1) {
                        print STATE "disable service $group $service\n";
                    }
                }
--- 3361,3372 ----
                }
            }
            foreach $group (keys %watch) {
!               if (exists $watch_disabled{$group} && $watch_disabled{$group} == 1) {
                    print STATE "disable watch $group\n";
                }
                foreach $service (keys %{$watch{$group}}) {
!                   if (defined $watch{$group}->{$service}->{'disable'} 
!                       && $watch{$group}->{$service}->{'disable'} == 1) {
                        print STATE "disable service $group $service\n";
                    }
                }
***************
*** 3324,3333 ****
            }
            foreach $group (keys %watch) {
                foreach $service (keys %{$watch{$group}}) {
!                   print STATE "group=$group service=$service" .
!                       " op_status=$watch{$group}->{$service}->{_op_status}" .
!                       " failure_count=$watch{$group}->{$service}->{_failure_count}" .
!                       " alert_count=\n";
                }
            }
            close (STATE);
--- 3380,3398 ----
            }
            foreach $group (keys %watch) {
                foreach $service (keys %{$watch{$group}}) {
!                   print STATE "group=$group\tservice=$service";
!                   foreach my $var (qw(op_status failure_count alert_count last_success
!                                       consec_failures last_failure first_failure last_summary
!                                       last_detail ack ack_comment last_trap last_traphost exitval
!                                       last_check last_op_status)) {
!                       print STATE "\t$var=" . esc_str($watch{$group}->{$service}->{"_$var"});
!                   }
!                   foreach my $periodlabel (keys %{$watch{$group}->{$service}->{periods}}) {
!                       foreach my $var (qw(last_alert alert_sent 1stfailtime failcount)) {
!                           print STATE "\t$periodlabel:$var=" . esc_str($watch{$group}->{$service}{periods}{$periodlabel}{"_$var"});
!                       }
!                   }
!                   print STATE "\n";
                }
            }
            close (STATE);
***************
*** 3344,3350 ****
      my ($l, $cmd, $args, $group, $service, $what, $state);
  
      foreach $state (@states) {
!       if ($state eq "disabled") {
            if (!open (STATE, "$CF{STATEDIR}/disabled")) {
                syslog ("err", "could not read state file: $!");
                next;
--- 3409,3415 ----
      my ($l, $cmd, $args, $group, $service, $what, $state);
  
      foreach $state (@states) {
!       if ($state eq "disabled" || $state eq "all") {
            if (!open (STATE, "$CF{STATEDIR}/disabled")) {
                syslog ("err", "could not read state file: $!");
                next;
***************
*** 3372,3377 ****
--- 3437,3468 ----
            syslog ("info", "state '$state' loaded");
            close (STATE);
        }
+ 
+       if ($state eq "opstatus" || $state eq "all") {
+           if (!open (STATE, "$CF{STATEDIR}/opstatus")) {
+               syslog ("err", "could not read state file: $!");
+               next;
+           }
+ 
+           while (defined ($l = <STATE>)) {
+               chomp $l;
+               my %opstatus = map{ /^(.*)=(.*)$/; $1 => $2} split (/\t/, $l,);
+               next unless (exists $opstatus{group} && exists $watch{$opstatus{group}}
+                            && exists $opstatus{service} && exists $watch{$opstatus{group}}->{$opstatus{service}});
+ 
+               foreach my $op (keys %opstatus) {
+                   next if ($op eq 'group' || $op eq 'service');
+                   if ($op =~ /^(.*):(.*)$/) {
+                       next unless exists $watch{$opstatus{group}}->{$opstatus{service}}{periods}{$1};
+                       $watch{$opstatus{group}}->{$opstatus{service}}{periods}{$1}{"_$2"} = un_esc_str($opstatus{$op});
+                   } else {
+                       $watch{$opstatus{group}}->{$opstatus{service}}{"_$op"} = un_esc_str($opstatus{$op});
+                   }
+               }
+           }
+           syslog ("info", "state '$state' loaded");
+           close (STATE);
+       }
      }
  }
  
***************
*** 3519,3531 ****
                # allow traps from all hosts
                #
  
!           } elsif ($host =~ /^[a-z]/) {
!               if (($host = inet_aton ($host)) eq "") {
                    syslog ('err', "invalid host in $CF{AUTHFILE}, line $.");
                    next;
                }
!           } elsif ($host =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/) {
!               if (($host = inet_aton ($host)) eq "") {
                    syslog ('err', "invalid host in $CF{AUTHFILE}, line $.");
                    next;
                }
--- 3610,3622 ----
                # allow traps from all hosts
                #
  
!           } elsif ($host =~ /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/) {
!               if (($host = inet_aton ($host)) eq "") {
                    syslog ('err', "invalid host in $CF{AUTHFILE}, line $.");
                    next;
                }
!           } elsif ($host =~ /^[A-Z\d][A-Z\.\d\-]*[A-Z\d]+$/i) {
!               if (($host = inet_aton ($host)) eq "") {
                    syslog ('err', "invalid host in $CF{AUTHFILE}, line $.");
                    next;
                }
***************
*** 3539,3544 ****
--- 3630,3636 ----
                $host = inet_ntoa ($host);              
            }
  
+           syslog ('notice', "Adding trap auth of: $host $user $password");
            $AUTHTRAPS{$host}{$user} = $password;
  
        } elsif ($sect eq "snmptrap") {
***************
*** 3793,3801 ****
      
      else
      {
!       $traphost = $addr;
      }
  
      if (defined ($AUTHTRAPS{$traphost}{"*"}))
      {
        $trapuser = "*";
--- 3885,3899 ----
      
      else
      {
!       $traphost = $fromip;
      }
  
+     if (!defined ($AUTHTRAPS{$traphost}))
+     {
+       syslog ('err', "received trap from unauthorized host: $fromip");
+       return undef;
+     }
+ 
      if (defined ($AUTHTRAPS{$traphost}{"*"}))
      {
        $trapuser = "*";
***************
*** 3808,3825 ****
        $trappass = $trap{"pas"};
      }
  
!     if (!defined ($AUTHTRAPS{$traphost}))
!     {
!       syslog ('err', "received trap from unauthorized host: $fromip");
!       return undef;
!     }
! 
!     if ($trapuser ne "*" &&
            crypt ($trappass, $AUTHTRAPS{$traphost}{$trapuser}) ne
!           $AUTHTRAPS{$traphost}{$trapuser})
!     {
!       syslog ('err', "received trap from unauthorized user $trapuser, host $traphost");
!       return undef;
      }
  
      #
--- 3906,3919 ----
        $trappass = $trap{"pas"};
      }
  
!     if ($trapuser ne "*") {
!       if (!defined $AUTHTRAPS{$traphost}{$trapuser} ||
            crypt ($trappass, $AUTHTRAPS{$traphost}{$trapuser}) ne
!           $AUTHTRAPS{$traphost}{$trapuser}) 
!         {
!             syslog ('err', "received trap from unauthorized user $trapuser, host $traphost");
!             return undef;
!         }
      }
  
      #
***************
*** 3876,3881 ****
--- 3970,3976 ----
      $sref->{"_last_trap"} = $time;
      $sref->{"_last_detail"} = $trap{"dtl"};
      $sref->{"_last_summary"} = $trap{"sum"};
+     $sref->{"_last_traphost"} = $fromip;
  
      if ($intended)
      {
***************
*** 3884,3890 ****
  
      my $old_status = $sref->{"_op_status"};
  
!     syslog ('info', "trap $trap{typ} $trap{spc} from " .
        "$fromip for $trap{grp} $trap{svc}, status $trap{sta}");
  
      my $group = $trap{"grp"};
--- 3979,3985 ----
  
      my $old_status = $sref->{"_op_status"};
  
!     syslog ('debug', "trap $trap{typ} $trap{spc} from " .
        "$fromip for $trap{grp} $trap{svc}, status $trap{sta}");
  
      my $group = $trap{"grp"};
***************
*** 3969,3978 ****
      push @last_failures, "$trap{grp} $trap{svc}" .
        " $tm $trap{typ} $trap{spc} $trap{sum}";
  
!     if ($sref->{"depend"} ne "" &&
!           $sref->{"dep_behavior"} eq "a")
      {
!       dep_ok ($sref);
      }
  
      #
--- 4064,4074 ----
      push @last_failures, "$trap{grp} $trap{svc}" .
        " $tm $trap{typ} $trap{spc} $trap{sum}";
  
!     if ((defined $sref->{"depend"} && $sref->{"depend"} ne "" &&
!        $sref->{"dep_behavior"} eq "a")
!       || (defined $sref->{"alertdepend"} && $sref->{"alertdepend"} ne ""))
      {
!       dep_ok ($sref, 'a');
      }
  
      #
***************
*** 4026,4031 ****
--- 4122,4128 ----
        {
            $sref->{"periods"}->{$period}->{"_last_alert"} = 0;
            $sref->{"periods"}->{$period}->{"_alert_sent"} = 0;
+           $sref->{"periods"}->{$period}->{"_1stfailtime"} = 0;
        }
      } else {
          $sref->{"_failure_output"} = $trap{"sum"} . $trap{"dtl"};
***************
*** 4050,4056 ****
--- 4147,4155 ----
      $tmnow = time;
  
      my $sref = \%{$watch{$group}->{$service}};
+     dep_ok ($sref, 'a');
      $sref->{"_failure_count"}++;
+     $sref->{"_consec_failures"}++;
      $sref->{"_last_failure"} = $tmnow;
      $sref->{"_first_failure"} = $tmnow if ($sref->{"_op_status"} != $STAT_FAIL);
      set_op_status ($group, $service, $STAT_FAIL);
***************
*** 4060,4066 ****
      push @last_failures, "$group $service $tm $sref->{_last_summary}";
      syslog ('crit', "failure for $last_failures[-1]");
  
!     do_alert ($group, $service, undef, undef, $FL_TRAPTIMEOUT);
  }
  
  
--- 4159,4165 ----
      push @last_failures, "$group $service $tm $sref->{_last_summary}";
      syslog ('crit', "failure for $last_failures[-1]");
  
!     do_alert ($group, $service, "trap timeout\n", -1, $FL_TRAPTIMEOUT);
  }
  
  
***************
*** 4521,4535 ****
      $sref->{"_timer"} = $sref->{"interval"}
        if ($sref->{"interval"});
  
      foreach my $period (keys %{$sref->{"periods"}}) {
        my $pref = \%{$sref->{"periods"}->{$period}};
  
        $pref->{"_last_alert"} = 0
            if ($pref->{"alertevery"});
        
-       $pref->{"_consec_failures"} = 0
-           if ($pref->{"alertafter_consec"});
-       
        $pref->{'_1stfailtime'} = 0
            if ($pref->{"alertafterival"});
      }
--- 4620,4634 ----
      $sref->{"_timer"} = $sref->{"interval"}
        if ($sref->{"interval"});
  
+     $sref->{"_consec_failures"} = 0
+       if ($sref->{"_consec_failures"});
+       
      foreach my $period (keys %{$sref->{"periods"}}) {
        my $pref = \%{$sref->{"periods"}->{$period}};
  
        $pref->{"_last_alert"} = 0
            if ($pref->{"alertevery"});
        
        $pref->{'_1stfailtime'} = 0
            if ($pref->{"alertafterival"});
      }
***************
*** 4597,4603 ****
  
      my $tmnow = time;
      my ($summary) = split("\n", $args{"output"});
!     $summary = "(NO SUMMARY)" if ($summary =~ /^\s*$/m);
  
      my $sref = \%{$watch{$args{"group"}}->{$args{"service"}}};
      my $pref;
--- 4696,4702 ----
  
      my $tmnow = time;
      my ($summary) = split("\n", $args{"output"});
!     $summary = "(NO SUMMARY)" if (!defined $summary || $summary =~ /^\s*$/m);
  
      my $sref = \%{$watch{$args{"group"}}->{$args{"service"}}};
      my $pref;
***************
*** 4606,4611 ****
--- 4705,4714 ----
        $pref = $args{"pref"};
      }
  
+     if (! defined $args{"args"}) {
+       $args{"args"} = '';
+     }
+ 
      my $alert = "";
      if (!defined $ALERTHASH{$args{"alert"}} ||
            ! -f $ALERTHASH{$args{"alert"}}) {
***************
*** 4661,4676 ****
            $ENV{$v} = $sref->{"ENV"}->{$v};
        }
  
!       $ENV{"MON_LAST_SUMMARY"}        = $sref->{"_last_summary"};
!       $ENV{"MON_LAST_OUTPUT"}         = $sref->{"_last_output"};
!       $ENV{"MON_LAST_FAILURE"}        = $sref->{"_last_failure"};
!       $ENV{"MON_FIRST_FAILURE"}       = $sref->{"_first_failure"};
!       $ENV{"MON_LAST_SUCCESS"}        = $sref->{"_last_success"};
!       $ENV{"MON_DESCRIPTION"}         = $sref->{"description"};
!       $ENV{"MON_GROUP"}               = $args{"group"};
!       $ENV{"MON_SERVICE"}             = $args{"service"};
!       $ENV{"MON_RETVAL"}              = $args{"retval"};
!       $ENV{"MON_OPSTATUS"}            = $sref->{"_op_status"};
        $ENV{"MON_ALERTTYPE"}           = $alert_type;
        $ENV{"MON_STATEDIR"}            = $CF{"STATEDIR"};
        $ENV{"MON_LOGDIR"}              = $CF{"LOGDIR"};
--- 4764,4779 ----
            $ENV{$v} = $sref->{"ENV"}->{$v};
        }
  
!       $ENV{"MON_LAST_SUMMARY"}        = $sref->{"_last_summary"} if (defined $sref->{"_last_summary"});
!       $ENV{"MON_LAST_OUTPUT"}         = $sref->{"_last_output"} if (defined $sref->{"_last_output"});
!       $ENV{"MON_LAST_FAILURE"}        = $sref->{"_last_failure"} if (defined $sref->{"_last_failure"});
!       $ENV{"MON_FIRST_FAILURE"}       = $sref->{"_first_failure"} if (defined $sref->{"_first_failure"});
!       $ENV{"MON_LAST_SUCCESS"}        = $sref->{"_last_success"} if (defined $sref->{"_last_success"});
!       $ENV{"MON_DESCRIPTION"}         = $sref->{"description"} if (defined $sref->{"description"});
!       $ENV{"MON_GROUP"}               = $args{"group"} if (defined $args{"group"});
!       $ENV{"MON_SERVICE"}             = $args{"service"} if (defined $args{"service"});
!       $ENV{"MON_RETVAL"}              = $args{"retval"} if (defined $args{"retval"});
!       $ENV{"MON_OPSTATUS"}            = $sref->{"_op_status"} if (defined $sref->{"_op_status"});
        $ENV{"MON_ALERTTYPE"}           = $alert_type;
        $ENV{"MON_STATEDIR"}            = $CF{"STATEDIR"};
        $ENV{"MON_LOGDIR"}              = $CF{"LOGDIR"};
***************
*** 4774,4780 ****
  # }
  #
  sub depend {
!     my ($depend, $depth) = @_;
      debug (1, "checking DEP [$depend]\n");
  
      if ($depth > $CF{"DEP_RECUR_LIMIT"}) {
--- 4877,4883 ----
  # }
  #
  sub depend {
!     my ($depend, $depth, $deptype) = @_;
      debug (1, "checking DEP [$depend]\n");
  
      if ($depth > $CF{"DEP_RECUR_LIMIT"}) {
***************
*** 4791,4801 ****
  
        my $sref = \%{$watch{$group}->{$service}};
        my $depval = undef;
  
        #
        # disabled watches and services are counted as "passing"
        #
!       if ($watch_disabled{$group} || $sref->{"disable"} == 1)
        {
            $depval = 1;
  
--- 4894,4912 ----
  
        my $sref = \%{$watch{$group}->{$service}};
        my $depval = undef;
+       my $subdepend = "";
+       if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) {
+           $subdepend = $sref->{"depend"};
+       } elsif ($deptype eq 'a' && defined $sref->{"alertdepend"}) {
+           $subdepend = $sref->{"alertdepend"};
+       } elsif ($deptype eq 'm' && defined $sref->{"monitordepend"}) {
+           $subdepend = $sref->{"monitordepend"};
+       } 
  
        #
        # disabled watches and services are counted as "passing"
        #
!       if ((exists $watch_disabled{$group} && $watch_disabled{$group}) || (defined $sref->{"disable"} && $sref->{"disable"} == 1))
        {
            $depval = 1;
  
***************
*** 4803,4809 ****
        # root dependency found
        #
        }
!       elsif ($sref->{"depend"} eq "")
        {
            debug (1, "  found root dep $group,$service\n");
  
--- 4914,4920 ----
        # root dependency found
        #
        }
!       elsif ($subdepend eq "")
        {
            debug (1, "  found root dep $group,$service\n");
  
***************
*** 4818,4824 ****
            #
            # do it recursively
            #
!           my $dstatus = depend ($sref->{"depend"}, $depth + 1);
            debug (1,
                "recur depth $depth returned $dstatus->{status},$dstatus->{depend}\n");
  
--- 4929,4935 ----
            #
            # do it recursively
            #
!           my $dstatus = depend ($subdepend, $depth + 1, $deptype);
            debug (1,
                "recur depth $depth returned $dstatus->{status},$dstatus->{depend}\n");
  
***************
*** 4874,4881 ****
  sub dep_ok
  {
      my $sref = shift;
  
!     my $s = depend ($sref->{"depend"}, 0);
  
      if ($s->{"status"} eq "D")
      {
--- 4985,5003 ----
  sub dep_ok
  {
      my $sref = shift;
+     my $deptype = shift;
+     my $depend = "";
+     if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq $deptype) {
+       $depend = $sref->{"depend"};
+     } elsif ($deptype eq 'a' && defined $sref->{"alertdepend"}) {
+       $depend = $sref->{"alertdepend"};
+     } elsif ($deptype eq 'm' && defined $sref->{"monitordepend"}) {
+       $depend = $sref->{"monitordepend"};
+     }
+ 
+     return 1 unless ($depend ne "");
  
!     my $s = depend ($depend, 0, $deptype);
  
      if ($s->{"status"} eq "D")
      {
***************
*** 4901,4906 ****
--- 5023,5060 ----
  
  
  #
+ # returns undef on error
+ #         otherwise a reference to a list of summaries from all 
+ #            DIRECT dependencies currently failing
+ sub dep_summary 
+ {
+     my $sref = shift;
+     my @sum;
+     my @deps = ();
+     
+     if (defined $sref->{"depend"} && $sref->{"dep_behavior"} eq "hm") {
+       @deps = ($sref->{"depend"} =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g);
+     } elsif (defined $sref->{"hostdepend"}) {
+       @deps = ($sref->{"hostdepend"} =~ /[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+/g);
+     }
+     
+     return [] if (! @deps);
+ 
+     foreach (@deps) {
+       my ($group, $service) = split /:/;
+       if (!(exists $watch{$group} && exists $watch{$group}->{$service})) {
+           return undef;
+       }
+       
+       if ($watch{$group}->{$service}{"_op_status"} == $STAT_FAIL) {
+           push @sum, $watch{$group}->{$service}{"_last_summary"};
+       }
+     }
+ 
+     return \@sum;
+ }
+     
+ #
  # convert a string to a hex-escaped string, returning
  # the escaped string.
  #
***************
*** 4915,4921 ****
      my $inquotes = shift;
  
      my $escstr = "";
! 
      for (my $i = 0; $i < length ($str); $i++)
      {
        my $c = substr ($str, $i, 1);
--- 5069,5075 ----
      my $inquotes = shift;
  
      my $escstr = "";
!     return $escstr if (!defined $str);
      for (my $i = 0; $i < length ($str); $i++)
      {
        my $c = substr ($str, $i, 1);
I'm going to use the same basic format for these comments as 
in my last set.  First the changes list, then the detailed mapping of 
patch sections to changes.  

The changes are:
1. Added support for saving/loading full opstatus information.

2. Added support for specifying which type(s) of state to load
when mon is started with the -l switch.

3. Added new dependency behavior type 'hm', for per-host monitor suppression.

4. Added the ability to have multiple dependency expressions associated with
a single watch/service.  This added three new mon.cfg keywords 'alertdepend', 
'monitordepend', and 'hostdepend'.

5. Fixed some bugs with trap authentication checking where traps from
any host were being allowed.

6. Fixed a couple of bugs that were preventing trap timeouts from sending
alerts when a dependency or an alertafter statement was involved.

7. *Lots* of little changes to make 'perl -w' happy with mon.  As a
side effect of this, the memory leak problems I was having seem to have
gone away.

8. Added code to track which host a trap comes from (stored in _last_traphost).

9. Fixed a couple of bugs where state (_1stfailtime, _consec_failures) wasn't
getting reset after an up trap.
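
To illustrate change 4, here is a minimal standalone sketch of how the
dependency expression for a given type could be selected. It mirrors the
if/elsif chain the patch adds to depend() and dep_ok(), but the helper name
select_depend and the sample service hash are made up for illustration, and
the extra defined check on dep_behavior is an addition to keep 'perl -w'
quiet; it is not in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the patch under this name): pick the
# dependency expression that applies for a given dependency type.
# $deptype is 'a' (alert suppression) or 'm' (monitor suppression).
sub select_depend {
    my ($sref, $deptype) = @_;

    # A plain "depend" applies only when dep_behavior matches the type.
    if (defined $sref->{depend}
        && defined $sref->{dep_behavior}
        && $sref->{dep_behavior} eq $deptype) {
        return $sref->{depend};
    }
    return $sref->{alertdepend}
        if $deptype eq 'a' && defined $sref->{alertdepend};
    return $sref->{monitordepend}
        if $deptype eq 'm' && defined $sref->{monitordepend};

    return "";    # no dependency of this type
}

# Made-up service with two independent dependency expressions.
my %service = (
    alertdepend   => "routers:ping",
    monitordepend => "switches:ping",
);

print "alert:   ", select_depend(\%service, 'a'), "\n";
print "monitor: ", select_depend(\%service, 'm'), "\n";
```

With the sample hash, the 'a' lookup yields "routers:ping" and the 'm' lookup
yields "switches:ping", so one service can carry both kinds of dependency at
once.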

And here's the per-section annotation:

65: 3
198: 2
343: 2
369: 7
384: 6
411: 4 & 7
530: 7
540: 4 & 7
562: 7
1002: 3
1208: 8
1472: 3
1484: 4
1594: 1
1608: 1
1678: 7
2193: 7
2243: 7
2583: 7 (Implicit assigning to @_ with split is deprecated)
2614: 4
2629: 4 & 7
2763: Added log entry for mon restarts
2779: 7
2832: 4 & 7
2874: 7
2935: 7
3069: 3 & 4
3094: 7
3248: 7
3306: 7
3324: 1
3344: 2
3372: 1 & 2
3519: 5
3539: 5
3793: 5
3808: 5
3876: 8
3884: Reduced the syslog logging level of the trap logging
3969: 4
4026: 9
4050: 6
4060: 6
4521: 9
4597: 7
4606: 7
4661: 7
4774: 4
4791: 4
4803: 4
4818: 4
4874: 4
4901: 3 & 4
4915: 7
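
A note on the 3793 and 3808 sections (change 5): the patch moves the
unauthorized-host check ahead of the $AUTHTRAPS{$traphost}{"*"} lookup. One
plausible reason this matters, inferred from the reordering rather than
stated above, is Perl autovivification: testing a nested key creates the
outer hash entry, so a later defined check on the host always passes. The
sketch below is standalone; the function names and addresses are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Old order (hypothetical reconstruction): nested lookup first, then the
# host-level check. The nested lookup autovivifies $auth->{$host}.
sub old_order_accepts {
    my ($auth, $host) = @_;
    my $wildcard = defined $auth->{$host}{"*"};    # creates $auth->{$host}
    return defined $auth->{$host} ? 1 : 0;
}

# New order: check the host level before touching any nested key.
sub new_order_accepts {
    my ($auth, $host) = @_;
    return defined $auth->{$host} ? 1 : 0;
}

# Made-up auth tables: only 10.0.0.1 is authorized in each.
my %auth_old = ("10.0.0.1" => { "*" => "" });
my %auth_new = ("10.0.0.1" => { "*" => "" });
my $stranger = "192.168.1.99";

print "old order accepts stranger: ", old_order_accepts(\%auth_old, $stranger), "\n";
print "new order accepts stranger: ", new_order_accepts(\%auth_new, $stranger), "\n";
```

The old order reports 1 (accepted) for the unknown host and the new order
reports 0, which matches the symptom of traps from any host being allowed.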
