Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread Jim Trocki

On Wed, 19 Apr 2006, Jon Meek wrote:


> That blank line after "monitor http.monitor" is probably not a good thing.



yeah, that is the problem. a blank line signifies the end of a watch record.

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread Brendan Mullen

Jim Trocki wrote:

> On Wed, 19 Apr 2006, Jon Meek wrote:
>> That blank line after "monitor http.monitor" is probably not a good thing.
> yeah, that is the problem. a blank line signifies the end of a watch record.




I was migrating our instance of Mon to a new machine running a newer 
version.


The problem was traced to a locally modified version of the qpage alert.
If I had used the qpage.alert that shipped with the version of Mon I
was using, I would have been fine.


The locally modified qpage.alert worked on an older version of Mon, but
not on 1.0pre5. The page would be sent but never show up in the alert
history, and then it would be sent again, and again...


It was explained to me that you can use the 1.1 daemon with mon-client
1.0. I mistakenly thought that if I was running the 1.0 client, I
needed to be running a 1.0 daemon.


Mon is great and helps us keep track of 54 watch groups monitoring
hundreds of servers and services. Thanks to Jim for creating it and to
everyone else in the Mon community who contributes to this very
valuable free software.


Good luck to you,

Brendan Mullen
Enterprise Information Technology Services - Core Services Support
University of Georgia




Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread David Nolan



--On Thursday, April 20, 2006 09:10:53 -0400 Brendan Mullen 
<[EMAIL PROTECTED]> wrote:



> I was migrating our instance of Mon to a new machine running a newer
> version.
>
> The problem was traced to a locally modified version of the qpage alert.
> If I had used the qpage.alert that shipped with the version of Mon I was
> using, I would have been fine.
>
> The locally modified qpage.alert worked on an older version of Mon, but
> not on 1.0pre5. The page would be sent but never show up in the alert
> history, and then it would be sent again, and again...


Interesting. Can you explain the cause of the failure? Was qpage.alert
exiting with an error code that made Mon think it needed to retry the alert?



-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University



Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread Jim Trocki

On Thu, 20 Apr 2006, Brendan Mullen wrote:

> The locally modified qpage.alert worked on an older version of Mon, but not
> on 1.0pre5. The page would be sent but never show up in the alert history,
> and then it would be sent again, and again...


Hmm.

Check your logs. Mon syslogs when an alert exits with a nonzero
status:

    if ($exitval)
    {
        syslog ("err", "child alert for " .
            "$args{group}/$args{service} " .
            "failed, exited with $exitval");
        return undef;
    }

If this is happening, it can screw up the alert management stuff (e.g.
_last_alert will not get updated).
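
As a rough illustration of what "exited with a nonzero status" means, here is a minimal Perl sketch of how a parent process can decode a child's wait status. It is not mon's actual code: describe_status is a hypothetical helper, and the running perl ($^X) is used as a stand-in for an alert script so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mon): turn a wait status, as found
# in $? after system(), into a short human-readable description.
sub describe_status {
    my ($status) = @_;
    return "could not run" if $status == -1;       # the command never started
    return "killed by signal " . ($status & 127)   # low 7 bits hold the signal
        if $status & 127;
    return "exited with " . ($status >> 8);        # high byte holds the exit code
}

# Stand-in for an alert script that fails: a child perl that exits 255.
system($^X, '-e', 'exit 255');
my $desc = describe_status($?);
print "$desc\n";
```

An alert that ends in die produces a nonzero exit code of this kind, which is what the syslog line above would report.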



Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread David Nolan



--On Thursday, April 20, 2006 10:35:05 -0400 Brendan Mullen 
<[EMAIL PROTECTED]> wrote:



> The cause of the failure was the ! in front of the system command calling
> qpage below. The mon daemon this qpage.alert worked on was 1.27 from
> Sat Sep 8 2001, with a moncmd version output of 9745. I think this is
> the .99.2 stable version.


Wow, now that's an old Mon. :)

I think at some point in the past Mon wasn't paying attention to alert
script exit codes. That version of qpage.alert (with the !) is using the
wrong logic and was always exiting with an error code. Now that Mon pays
attention to exit codes, it became an issue.



-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University



Re: pager alert continues to page multiple times after single failure

2006-04-20 Thread Brendan Mullen

David Nolan wrote:



> --On Thursday, April 20, 2006 09:10:53 -0400 Brendan Mullen
> <[EMAIL PROTECTED]> wrote:
>
>> The problem was traced to a locally modified version of the qpage alert.
>> If I had used the qpage.alert that shipped with the version of Mon I was
>> using, I would have been fine.
>>
>> The locally modified qpage.alert worked on an older version of Mon, but
>> not on 1.0pre5. The page would be sent but never show up in the alert
>> history, and then it would be sent again, and again...
>
> Interesting. Can you explain the cause of the failure? Was qpage.alert
> exiting with an error code that made Mon think it needed to retry the alert?


The cause of the failure was the ! in front of the system command calling
qpage below. The mon daemon this qpage.alert worked on was 1.27 from
Sat Sep 8 2001, with a moncmd version output of 9745. I think this is
the .99.2 stable version.


Here is the section from our qpage.alert which may have been modified 
from the shipped version.


    else
    {
        if (!system ("/usr/local/bin/qpage -p $pagedest " .
                     "'$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)'" .
                     "2>/dev/null"))
        {
            die "could not open pipe to qpage: $?\n";
        }
    }
}

The qpage.alert in the distributed version of mon I was using does not
have this ! in front of the system command. Otherwise the two qpage.alert
scripts are identical (except that ours uses an explicit path for our
local environment).


The error code was (sorry this may not be exact):
  Child alert failed http:WATCHNAME error code 255.
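
To spell out why the ! matters: Perl's system returns the command's wait status, which is 0 on success, so !system(...) is true exactly when the command succeeded. The sketch below compares the inverted test with a conventional one. It is an illustration only, not the shipped qpage.alert: fake_pager, inverted_check_fails and plain_check_fails are hypothetical names, and the running perl ($^X) stands in for /usr/local/bin/qpage.

```perl
use strict;
use warnings;

# Stand-in for qpage (an assumption so the sketch runs anywhere):
# the running perl, exiting with the requested code.
sub fake_pager {
    my ($code) = @_;
    return ($^X, '-e', "exit $code");
}

# Inverted test, as in the locally modified alert:
# reports a failure when the command SUCCEEDS (status 0).
sub inverted_check_fails {
    return !system(@_) ? 1 : 0;
}

# Conventional test: reports a failure when the status is nonzero.
sub plain_check_fails {
    return system(@_) != 0 ? 1 : 0;
}

for my $code (0, 1) {
    printf "pager exit %d: inverted says failed=%d, plain says failed=%d\n",
        $code,
        inverted_check_fails(fake_pager($code)),
        plain_check_fails(fake_pager($code));
}
```

With the inverted test, a page that was delivered successfully takes the die branch, so the alert script itself exits nonzero even though the page went out. That would be consistent with a page being sent and Mon still logging the child alert as failed.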


-Brendan
