Re: [SLUG] monitoring stuff

2000-10-12 Thread Howard Lowndes

Damn it, sorry about that.  Please scrub it.

-- 
Howard.
__
LANNet Computing Associates 

On Fri, 13 Oct 2000, Howard Lowndes wrote:

> David,  off list reply.
> 
> Have a look at this script.  I use it on my border router to monitor the
> health of my clients' sites.  You might get some clues from it for adapting
> it to your needs.  Run swatch on the /var/log/messages file, or use the
> script's output directly to trigger your qpage server.  It runs from root's
> crontab every 6 minutes.
> 
> Please do not publish it.
> 
> 



--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug



Re: [SLUG] monitoring stuff

2000-10-12 Thread Howard Lowndes

David,  off list reply.

Have a look at this script.  I use it on my border router to monitor the
health of my clients' sites.  You might get some clues from it for adapting
it to your needs.  Run swatch on the /var/log/messages file, or use the
script's output directly to trigger your qpage server.  It runs from root's
crontab every 6 minutes.

Please do not publish it.

-- 
Howard.
__
LANNet Computing Associates 

#!/bin/sh

LINKS=" gw.alb2.Albury.telstra.net"
LINKS=${LINKS}" janus.lannet.com.au"
LINKS=${LINKS}" keep.lannet.com.au"
LINKS=${LINKS}" hero.lannet.com.au"
LINKS=${LINKS}" scribe.lannet.com.au"
# LINKS=${LINKS}" guru.lannet.com.au"
LINKS=${LINKS}" acay.aircentre.com"
LINKS=${LINKS}" ac.aircentre.com"
LINKS=${LINKS}" acbth.aircentre.com"
# LINKS=${LINKS}" saxon.af.com.au"
# LINKS=${LINKS}" murray.af.com.au"
LINKS=${LINKS}" scout.auf.asn.au"
LINKS=${LINKS}" stratos.auf.asn.au"
LINKS=${LINKS}" skydart.auf.asn.au"
# LINKS=${LINKS}" gw.skills.asn.au"
# LINKS=${LINKS}" rsi.skills.asn.au"
# LINKS=${LINKS}" sit.skills.asn.au"
LINKS=${LINKS}" cwsvr.caterworld.com.au"
LINKS=${LINKS}" cway.caterworld.com.au"
LINKS=${LINKS}" cwbdg.caterworld.com.au"
LINKS=${LINKS}" cwglg.caterworld.com.au"
LINKS=${LINKS}" cwml.caterworld.com.au"
LINKS=${LINKS}" cwsht.caterworld.com.au"
LINKS=${LINKS}" cwwg.caterworld.com.au"
# LINKS=${LINKS}" atelal.atel.com.au"
# LINKS=${LINKS}" atelwd.atel.com.au"
# LINKS=${LINKS}" atelwg.atel.com.au"
# LINKS=${LINKS}" atelwn.atel.com.au"

# Add additional sites above here (note leading space in string) ##
###
# Nothing to change below here 

# Mods register
# 25 Feb 00 - change crontab cycle from 5 to 6 minutes
# 25 Feb 00 - change log count before blowing whistle from 3 to 2
# 25 Feb 00 - the link down alarm response time is now reduced
# from >10<15 mins to >6<12 mins
#

PATH=/bin:/usr/bin:/sbin:/usr/sbin

docheck() {
  # try twice more before we log it
  ping -qnc1 $i >/dev/null
  PINGERROR=$?
  if [ ${PINGERROR}x != 0x ]; then
    # first retry
    ping -qnc1 $i >/dev/null
    PINGERROR=$?
    if [ ${PINGERROR}x != 0x ]; then
      # time to take some action like logging it
      date -u +"%Y-%m-%d %H:%M:%S %Z" >> /tmp/$i
      if [ `wc -l /tmp/$i | awk '{print $1}'`x = 2x ]; then
        # this is the second logging (6 failures) so blow the whistle
        MSG="LINK DOWN ALARM $i"
        logger -i -t `basename $0` ${MSG}
        mail -s "** ${MSG}" [EMAIL PROTECTED] <. >/dev/null
        sendmail -q
      fi
    fi
  fi
}

for i in ${LINKS}; do
  ping -qnc1 $i >/dev/null
  PINGERROR=$?
  if [ ${PINGERROR}x = 0x ]; then
    # the machine is talking to us
    if [ -e /tmp/$i ]; then
      # we had a problem but all is now well
      if [ `wc -l /tmp/$i | awk '{print $1}'`x = 2x ]; then
        # if we sent an alarm we must cancel the alarm
        MSG="CANCEL ALARM $i"
        logger -i -t `basename $0` ${MSG}
        mail -s "** ${MSG}" [EMAIL PROTECTED] <. >/dev/null
        sendmail -q
      fi
      # get rid of the logging file
      rm -f /tmp/$i
    fi
  else
    if [ -e /tmp/$i ]; then
      # we have already had some failures
      if [ `wc -l /tmp/$i | awk '{print $1}'`x != 2x ]; then
        # but not yet enough to blow the whistle
        docheck
      fi
    else
      docheck
    fi
  fi
done
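
The script above writes "LINK DOWN ALARM host" and "CANCEL ALARM host" lines to syslog through logger. As a rough stand-in for swatch, something like the following could pick those lines out of a log file. This is only a sketch: the function name scan_alarms is made up for illustration, it assumes the host name is the last word on the logged line, and it rescans the whole file on every run rather than following it the way swatch does.

```sh
#!/bin/sh
# Sketch only: scan_alarms is a hypothetical helper, not part of the script above.

# Print "DOWN host" or "UP host" for each alarm line found in the given log file.
# Assumes the host name is the last field of the line, as logged by the script.
scan_alarms() {
  awk '
    /LINK DOWN ALARM/ { print "DOWN", $NF }
    /CANCEL ALARM/    { print "UP", $NF }
  ' "$1"
}

# When called with a log file argument, report every alarm line in it.
if [ -n "${1:-}" ]; then
  scan_alarms "$1"
fi
```

Run against /var/log/messages, the output can be read line by line and handed to whatever pages you; the pager command itself is deliberately left out here.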






Re: [SLUG] monitoring stuff

2000-10-12 Thread Howard Lowndes

You can add to that list swatch.

For what you seem to want, a bit of cron and awk work might do the job.
You need to decide:
  what the CPU usage threshold is
  how frequently you are going to sample it
  how long you will allow it to stay over the limit before it triggers
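
Those three decisions map fairly directly onto a small script run from cron. The following is a minimal sketch, not a tested monitor: THRESHOLD, LIMIT, STATE and both function names are assumptions made up for illustration, it relies on the Linux /proc/stat "cpu" line, and it treats everything except the idle column as busy. The sampling frequency is whatever interval cron runs it at, and LIMIT consecutive over-threshold samples give the "how long" part.

```sh
#!/bin/sh
# Sketch only: THRESHOLD, LIMIT, STATE and the function names are assumptions.

THRESHOLD=90                        # percent CPU busy that counts as "over"
LIMIT=3                             # consecutive over-threshold samples before alerting
STATE=${STATE:-/tmp/cpuwatch.count} # holds the current run of over-threshold samples

# Print the busy percentage between two "cpu ..." lines taken from /proc/stat.
# Fields after the label are user nice system idle ...; idle is field 5.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0; for (f = 2; f <= NF; f++) total += $f; t[NR] = total; idle[NR] = $5 }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print 0; exit }
      printf "%d\n", 100 * (dt - (idle[2] - idle[1])) / dt
    }'
}

# Record one sample; print an ALERT line when the run of samples reaches LIMIT.
record_sample() {
  pct=$1
  if [ "$pct" -gt "$THRESHOLD" ]; then
    count=$(( $(cat "$STATE" 2>/dev/null || echo 0) + 1 ))
    echo "$count" > "$STATE"
    if [ "$count" -eq "$LIMIT" ]; then
      echo "ALERT cpu ${pct}% over ${THRESHOLD}% for ${LIMIT} samples"
    fi
  else
    # back under the limit, so start counting from scratch
    rm -f "$STATE"
  fi
}

# When invoked with "run" (for example from cron), take one 5-second sample.
if [ "${1:-}" = "run" ]; then
  a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
  record_sample "$(cpu_busy_pct "$a" "$b")"
fi
```

Called with "run" once a minute from cron, this prints a single ALERT line on the third consecutive busy sample, and that line can be used to trigger the pager.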


-- 
Howard.
__
LANNet Computing Associates 

On Fri, 13 Oct 2000, Dave Kempe wrote:

> Hey sluggers,
> Wondering if anyone has decent recommendations on monitoring software.
> I'm trialing mon, spong, bandmin and looking at mrtg and big brother atm.
> 
> I know they all have different uses, and I have a few different needs, so
> I guess I'll end up with more than one of them. One thing I need to
> monitor right now is a CPU usage threshold. I have qpage set up to
> page/SMS stuff OK, I think; I just need to trigger it. I have dug into the
> above packages, but I'm not sure which one makes it easiest to set a rule
> that says: if CPU exceeds x% for y time, then send an alert.
> The modules that come with some of the best candidates above don't seem to
> do that.
> Is there anything out there I can use for this sort of rule? Is it easy to
> write a custom daemon that does it? Or is it an inherently flawed approach,
> i.e. too intensive itself?
> 
> Thanks,
> 
> Dave
> 
> __
> solutionsFirst.net Consulting
> http://solutionsfirst.net
> Ph: (02) 9413 9604
> Fax: (02) 9413 9719
> Mob: 0413 022 143
> 
> 
> 






Re: [SLUG] monitoring stuff

2000-10-12 Thread John Ferlito

On Fri, Oct 13, 2000 at 12:16:37AM +1000, Dave Kempe wrote:
> Hey sluggers,
> Wondering if anyone has decent recommendations on monitoring software.
> I'm trialing mon, spong, bandmin and looking at mrtg and big brother atm.
> 

Check out http://www.netsaint.org. It is very configurable, supports SNMP,
and interacts with a few other decent tools such as mrtg.

-- 
John

The difference between a good man and a bad one is the 
choice of cause - William James





[SLUG] monitoring stuff

2000-10-12 Thread Dave Kempe

Hey sluggers,
Wondering if anyone has decent recommendations on monitoring software.
I'm trialing mon, spong, bandmin and looking at mrtg and big brother atm.

I know they all have different uses, and I have a few different needs, so
I guess I'll end up with more than one of them. One thing I need to
monitor right now is a CPU usage threshold. I have qpage set up to
page/SMS stuff OK, I think; I just need to trigger it. I have dug into the
above packages, but I'm not sure which one makes it easiest to set a rule
that says: if CPU exceeds x% for y time, then send an alert.
The modules that come with some of the best candidates above don't seem to
do that.
Is there anything out there I can use for this sort of rule? Is it easy to
write a custom daemon that does it? Or is it an inherently flawed approach,
i.e. too intensive itself?

Thanks,

Dave

__
solutionsFirst.net Consulting
http://solutionsfirst.net
Ph: (02) 9413 9604
Fax: (02) 9413 9719
Mob: 0413 022 143


