I've had a disk failure or two. The only problem I had was SCSI timeouts
with an AIC7890: the system would lock up waiting on SCSI I/O, but a reboot
always recovered it, and the array came back up with 2 of its 3 disks.
Here's a script I use for watching status (I forget where I got it from).
Just run "checkmd -i" once to save the current configuration, then add a
crontab entry like:
15 * * * * /usr/local/bin/checkmd -v
which runs the check at 15 minutes past every hour. If there's any output
(i.e. errors), root will get an email.
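Under the hood, the cron job needs only one comparison: the saved snapshot against the live /proc/mdstat. Here is a minimal sketch of that check as a standalone bash function. The name md_changed is hypothetical, and the optional second argument is an assumption added so the check can be tried on ordinary files instead of the real /proc/mdstat.

```shell
#!/bin/bash
# Hypothetical helper: md_changed SNAPSHOT [CURRENT]
# Returns 0 (true) if CURRENT differs from SNAPSHOT, 1 if they match.
# CURRENT defaults to /proc/mdstat.
md_changed() {
    local snapshot=$1
    local current=${2:-/proc/mdstat}
    # Feed the live file through a pipe: /proc files report a size of 0,
    # so let cmp compare the bytes it actually reads. -s keeps cmp quiet.
    if cat "$current" | cmp -s "$snapshot" -; then
        return 1   # identical: nothing to report
    fi
    return 0       # different (or unreadable): raise the alarm
}
```

In a cron job this might read: if md_changed /etc/md.conf; then echo "md configuration changed" >&2; fi. Nothing is printed while the snapshot still matches, so cron only mails root after a change.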
#####################################################
#!/bin/bash
#
# This script checks that the md configuration is the same as the one
# read at configuration time. When called with the -i option, it reads
# /proc/mdstat and learns the configuration. When called without -i, it
# returns non-zero status if the configuration differs from the one
# learned, and prints a message if the -v flag is present.
#
# usage: checkmd [-iv]

init=""
verbose=""
CONF=/etc/md.conf

while getopts "iv" opt; do
    case $opt in
        i) init=true ;;
        v) verbose=true ;;
        *)
            cat >&2 <<EOF
usage: $0 [-iv]
  -i  init the file $CONF with the current md configuration
  -v  display a message in case of configuration mismatch
EOF
            exit 1
            ;;
    esac
done

if [ ! -r "$CONF" ] || [ "$init" = true ]; then
    # Learn the current configuration and make the snapshot read-only.
    cat /proc/mdstat > "$CONF"
    chmod 444 "$CONF"
    echo "Current configuration saved in $CONF:" >&2
    cat "$CONF" >&2
elif ! cat /proc/mdstat | cmp -s "$CONF" -; then
    # /proc/mdstat is read through a pipe because /proc files report a
    # size of 0; -s keeps cmp silent so only our own message is printed.
    if [ "$verbose" = true ]; then
        echo >&2
        echo "ALARM! md configuration problem" >&2
        echo >&2
        echo "Current configuration is:" >&2
        cat /proc/mdstat >&2
        echo >&2
        echo "should be:" >&2
        cat "$CONF" >&2
    fi
    exit 1
fi
#####################################################
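A different approach, not what checkmd does: instead of flagging any change to /proc/mdstat (which also fires on harmless changes such as resync progress), look only at each array's member-status field. This sketch assumes the /proc/mdstat layout of the newer RAID driver, where a healthy 3-disk RAID-5 shows "[3/3] [UUU]" and the 2-out-of-3 case shows an underscore for the missing disk, e.g. "[3/2] [UU_]". The function name md_degraded is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of checkmd: md_degraded [MDSTAT]
# ASSUMPTION: each array is listed as "mdN : ..." and a following line
# contains a member-status field such as "[UUU]", where "_" marks a
# missing or failed member. Prints the name of every degraded array and
# returns 0 if at least one was found, 1 otherwise.
md_degraded() {
    local mdstat=${1:-/proc/mdstat}
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ *:/ { dev = $1; next }
        # The first [U_...] run after the array header is its status field.
        dev != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) {
                print dev
                found = 1
            }
            dev = ""   # one status field per array
        }
        END { exit (found ? 0 : 1) }
    ' "$mdstat"
}
```

Because the function prints nothing while every array is complete, running it from cron would mail root only once an array has lost a member.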
________________________________________
Michael D. Black Principal Engineer
[EMAIL PROTECTED] 407-676-2923,x203
http://www.csi.cc Computer Science Innovations
http://www.csi.cc/~mike My home page
FAX 407-676-2355
----- Original Message -----
From: Wayne Buttles <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 07, 1999 7:12 PM
Subject: Notify scripts?
I have been playing with RAID on a stock Red Hat 6.0 install for a couple
of days now. I think I have finally figured everything out. I have RAID-5
working with 3 drives, auto-detected by the kernel from partitions of type fd.
I powered off a drive and then added it back with raidhotadd on a
subsequent boot with no problem. All seems swell.
I was wondering, are there scripts to add an audible beep and/or email the
admin on failure? After all, if RAID is working properly you won't even
notice a failure unless you are logged in (right?).
Also, has anyone had a drive fail for real? I'm curious how a SCSI driver
copes with a failed drive in real life.
Thanks,
Wayne.