Hi,
I'm new to this list, so if this has been asked before, please excuse me and point me
in the right direction :)
I have just started using mon, and the initial tests look really good, so
I'm now looking at using it to monitor all my servers (approx. 20).
I have a problem at the moment :(
I tried adapting alert.template to create a new alert, but it won't run :(
I use Webmin to set up mon, and it won't even see the new script :(
I then edited the mon.cf file by hand and got nowhere either :(
When the alert is triggered, nothing shows that the new program was called :(
My mon.cf is as follows:
cat /etc/mon/mon.cf
#
# Extremely basic mon.cf file
#
#
# global options
#
cfbasedir = /etc/mon
pidfile = /var/run/mon.pid
statedir = /var/run/mon/state.d
logdir = /var/run/mon/log.d
dtlogfile = /var/run/mon/log.d/downtime.log
alertdir = /usr/lib/mon/alert.d
mondir = /usr/lib/mon/mon.d
maxprocs = 20
histlength = 100
randstart = 60s
authtype = userfile
userfile = /etc/mon/userfile
#
# group definitions (hostnames or IP addresses)
#
hostgroup servers localhost

watch servers
    service ping
        interval 5m
        monitor fping.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert [EMAIL PROTECTED]
            alertevery 1h
        period wd {Sat-Sun}
            alert mail.alert [EMAIL PROTECTED]
    service http
        interval 4m
        monitor http.monitor
        allow_empty_group
        period wd {Sun-Sat}
            upalert mail.alert -S "web server is back up" [EMAIL PROTECTED]
            alertevery 45m
    service smtp
        interval 10m
        monitor smtp.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alertevery 1h
            alertafter 2 30m
            alert qpage.alert [EMAIL PROTECTED]
    service pop3
        interval 30s
        monitor pop3.monitor
        period
            alert alert.reboot
# See /usr/doc for the original example...
The script I have written is in:
pwd
/usr/lib/mon/alert.d
-rwxr-xr-x 1 root root 1911 Jul 22 23:32 alert.reboot
and the script is as follows :-
#!/usr/bin/perl
#
# Reboot alert system
# Matt Lowe [EMAIL PROTECTED]
# Created from template by
#
# Jim Trocki, [EMAIL PROTECTED]
#
# $Id: alert.template 1.1 Sat, 26 Aug 2000 15:22:34 -0400 trockij $
#
# Copyright (C) 1998, Jim Trocki
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
use Getopt::Std;
getopts ("s:g:h:t:l:u");
#entered by [EMAIL PROTECTED]
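# append all output to the log file; stop if it cannot be opened
open LOG_FILE, ">>/var/log/mon"
    or die "cannot append to /var/log/mon: $!";
select LOG_FILE;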
#
# the first line is summary information, adequate to send to a pager
# or email subject line
#
#
# the following lines normally contain more detailed information,
# but this is monitor-dependent
#
# see the "Alert Programs" section in mon(1) for an explanation
# of the options that are passed to the monitor script.
#
$summary=<STDIN>;
chomp $summary;
$t = localtime($opt_t);
($wday,$mon,$day,$tm) = split (/\s+/, $t);
print <<EOF;
Alert for group $opt_g, service $opt_s
EOF
print "This alert was sent because service was restored\n"
if ($opt_u);
print <<EOF;
This happened on $wday $mon $day $tm
Summary information: $summary
Arguments passed to this script: @ARGV
Detailed information follows:
EOF
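while (<STDIN>) {
    print;
}
# the alert type is an environment variable, so read it from %ENV
# and compare strings with eq (== compares numbers)
if (($ENV{MON_ALERTTYPE} || '') eq 'failure') {
    system("/sbin/shutdown -r 1");
}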
(sorry for the large post)
The program is meant to write its standard output to a log file and reboot the system.
I know this reboot might seem extreme, but the server it is designed for is very
unstable (it only has to last another 2 weeks or so before replacement).
The routine would also be helpful on a couple of servers I have as a 'worst case'
problem solver :)
Also, I'd like to ask whether anyone has written an alert that shuts down a service and
starts it back up again?
Something like:
Alert type: service_restart
Extra params: "service_shutdown", "service_startup", pause
It would execute something like:
system ('service',$service_shutdown);
sleep $pause;
system ('service',$service_startup);
This could then very easily be made to handle restarting any number of services on a
Linux box, without having to write an individual alert for each service.
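A rough sketch of what I have in mind, not a tested alert. It assumes a Red Hat style "service NAME stop|start" command, and it assumes extra words on the alert line reach the script the same way the -S option and address reach mail.alert. The file name service_restart.alert and the -p (pause in seconds) and -n (dry run) options are made up for illustration:

```perl
#!/usr/bin/perl
# service_restart.alert (hypothetical name): stop a service, pause, start it again.
use strict;
use warnings;
use Getopt::Std;

# Return the stop and start command lines for one service name.
# Assumes a "service NAME ACTION" wrapper is installed on the box.
sub restart_commands {
    my ($name) = @_;
    return (['service', $name, 'stop'], ['service', $name, 'start']);
}

# Run one command as a list (no shell involved) and report failures.
# With $dry_run set, only print what would be executed.
sub run_command {
    my ($dry_run, @cmd) = @_;
    if ($dry_run) {
        print "would run: @cmd\n";
        return 1;
    }
    system(@cmd) == 0 and return 1;
    warn "@cmd failed (status $?)\n";
    return 0;
}

# Same options as alert.template, plus two hypothetical extras:
# -p SECONDS (pause between stop and start) and -n (dry run).
my %opt;
getopts('s:g:h:t:l:up:n', \%opt)
    or die "usage: $0 [-p seconds] [-n] service...\n";
my $pause = defined $opt{p} ? $opt{p} : 5;

# Remaining arguments are service names; skip the restart on upalerts (-u).
unless ($opt{u}) {
    for my $name (@ARGV) {
        my ($stop, $start) = restart_commands($name);
        run_command($opt{n}, @$stop);
        sleep $pause unless $opt{n};
        run_command($opt{n}, @$start);
    }
}
```

In mon.cf this would be used as something like "alert service_restart.alert -p 10 httpd", so one script could cover any number of services. Running it by hand with -n first would show the commands without touching anything.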
Thanks for any help,
Matt Lowe
mon <at> mlsis.org