Re: [Nagios-users] NSCA and long output

2008-05-27 Thread Mike Hamrick
Hi Aaron,

You wrote:
> I've been trying to figure out if this is possible for a while. I'm
> using NRPE and $LONGHOSTOUTPUT$ for a number of tests, which is great,
> except for passive monitoring. We have several data centers that run
> their own Nagios boxes and then ship the data back to the master Nagios
> server via NSCA. The problem is that I can't get NSCA to utilize the
> $LONGHOSTOUTPUT$ - this is kind of critical for things like log file
> checks, etc. With NSCA this data doesn't get passed.

Looking at the NSCA sources, common.h has:

#define MAX_PLUGINOUTPUT_LENGTH   512

I'm guessing that's the issue right there.  The first thing I'd try is  
to bump that up to 4096, and recompile send_nsca and nsca.  I haven't  
looked very carefully at the source or tried this myself, but it seems  
like a good place to start.

Mike




-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Antwort: Re: Fwd: plugin for iostat readings?

2008-03-02 Thread Mike Hamrick
On Mon, 25 Feb 2008 [EMAIL PROTECTED] wrote:

> You need to call iostat with multiple checks - the longer the better -
> but then it means you would have to run iostat for like 30 seconds or
> so -> plugin runtime is 30 seconds too then! That means that check
> would have a high delay/latency, which is overall a bad idea. My
> solution so far is to run the plugin via cron and report output via
> nsca, this gave me the best results.

One thing you might want to consider is using sadc/sar for this job.
The sadc(8) program collects all kinds of stats on OS resource usage
and if you explicitly ask it, it will capture some interesting
disk i/o stats, including read/write requests per second.

One thing that would be handy to be able to alert on is the 
i/o wait statistic available from sar(1).  The man page defines
it as:

"Percentage of time that the CPU or CPUs were idle during which the
system had an outstanding disk I/O request."

So basically the plugin could just read the most recent data from
sadc by using sar and alert if the i/o wait figure reached a certain
threshold.  You would want to adjust the sadc cron entry to run
about as often as your check interval.
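
A rough sketch of such a plugin in Python is below.  It assumes that
"sar -u" prints a header row containing %iowait followed by one row per
sample, which may differ between sysstat versions.  The function names
and thresholds are made up for illustration, so check the output of
your own sar before relying on it.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_iowait(sar_output):
    """Return %iowait from the most recent sample in `sar -u` style text.

    Assumes a header row that contains "%iowait". The column is located
    by counting from the end of the line, so it does not matter whether
    the timestamp is one field (24h clock) or two ("12:10:01 AM").
    """
    offset = None   # position of %iowait, counted from the end of the row
    latest = None
    for line in sar_output.splitlines():
        fields = line.split()
        if "%iowait" in fields:
            offset = fields.index("%iowait") - len(fields)
            continue
        if offset is None or not fields or fields[0].startswith("Average"):
            continue
        if len(fields) < -offset:
            continue
        try:
            latest = float(fields[offset])
        except ValueError:
            continue   # e.g. a "LINUX RESTART" marker line
    return latest


def classify(iowait, warn, crit):
    """Map an %iowait figure to a (exit_code, message) pair."""
    if iowait is None:
        return UNKNOWN, "UNKNOWN - no %iowait sample found"
    if iowait >= crit:
        return CRITICAL, "CRITICAL - iowait %.1f%%" % iowait
    if iowait >= warn:
        return WARNING, "WARNING - iowait %.1f%%" % iowait
    return OK, "OK - iowait %.1f%%" % iowait
```

A wrapper would capture the text printed by "sar -u", pass it to
latest_iowait(), print the message from classify() and exit with the
returned code.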

Mike




Re: [Nagios-users] State Stalking and notifications

2008-02-21 Thread Mike Hamrick
On Feb 20, 2008, at 7:46 AM, Frost, Mark {PBG} wrote:
> I had thought about writing a custom check for each line
> of output that this command generates, but that seems needlessly
> painful.

You could write one active check that parses the output, figures out  
what's gone wrong, and then submits passive results for the specific  
services that have errors.
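
As a minimal sketch of the passive half, the Python below formats
PROCESS_SERVICE_CHECK_RESULT external commands and appends them to the
Nagios command file.  The command file path, the host name and the
service names are assumptions for illustration; how you parse your own
command's output into (service, code, message) tuples is left out.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(results, host,
           command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one passive result per (service, code, message) tuple.

    The default command_file path is an assumption; use the value of
    command_file from your own nagios.cfg.
    """
    with open(command_file, "a") as pipe:
        for service, code, message in results:
            pipe.write(passive_service_result(host, service, code, message))
```

The active check itself would then return OK as long as it was able to
run the command and parse its output, and leave the per-service states
to the passive results.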

Mike




[Nagios-users] Announce: Monocle Oracle Monitoring Package

2008-02-14 Thread Mike Hamrick
Hello,

Blue Gecko, Inc. is proud to announce the first release of Monocle, our
open source (GPLv2) Oracle 10g database monitoring package!  Monocle
mostly consists of a body of PL/SQL code that runs inside the database
as scheduled Oracle jobs.  When events occur inside the database that
are significant enough to alert on, the PL/SQL monitoring code notes
the problems in a monitoring events table.  The check_monocle script,
which is used as a connector between Monocle and Nagios, reads the
data from the events table and reports the information to Nagios using
the command pipe.

Requirements:

 * Oracle 10g or greater
 * Nagios 2.x
 * Oracle Instant Client (sqlplus)

Features:

 * Alert Log Monitor - Scans the alert log and reports on exceptions
 * Backup Monitor - Monitors the health of Oracle RMAN backups
 * Job Monitor - Monitors Oracle DBA and Scheduler Jobs
 * Lock Monitor - Monitors and records info about blocking locks
 * Resource Monitor - Monitors resource consumption (cursors/processes)
 * Space Monitor - Monitors tablespace/archive space consumption
 * Standby Monitor - Monitors physical/logical standby databases

To find out more about Monocle, visit our open source development site at:

 http://code.bluegecko.net

Direct download link:

 http://files.bluegecko.net/code/Monocle-1.0.tar.gz

About Blue Gecko:

What Blue Gecko does is simple: We provide database administration
support services for Oracle and MySQL.  The Blue Gecko team
proactively monitors, administers, and tunes Oracle and MySQL Server
database environments, either on our own Database Hosting Services
platform, or at our client's site with Remote DBA Services.

To find out more about Blue Gecko, visit our services site at:

 http://www.bluegecko.net




Re: [Nagios-users] linux kernel instrumentation + Nagios?

2007-10-14 Thread Mike Hamrick

Roger wrote:
> I'm looking for tools that will give Nagios some visibility inside the
> Linux kernel.

What are you trying to learn from the kernel?  I think it'd be handy
to have a monitor that would alert if a process started doing more
than a certain amount of block i/o operations.  Or perhaps a monitor
that alerted on disk i/o bandwidth utilization (iostat -x %util) could
use SystemTap to show you which processes would be likely culprits.

Mike


Re: [Nagios-users] Passive host results & soft states?

2007-04-25 Thread Mike Hamrick

Marco wrote:
> What I did was send the passive host check through NSCA only if it's
> in a HARD state; SOFT states are ignored. What script do you use to
> call send_nsca?

Just a simple script that pipes "$HOST\t$RESULT\t$OUTPUT\n" into
send_nsca.  I'll need to also pass in $HOSTSTATETYPE$ and exit
if I don't see a HARD state. 
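
Something along these lines, sketched in Python rather than shell.  The
master server name and the send_nsca config path are placeholders, and
the function names are made up for this example; the only parts taken
from above are the tab-separated line format and the HARD-state test.

```python
import subprocess


def nsca_host_line(host, result, output):
    """Build the tab-separated host check line that send_nsca reads."""
    return "%s\t%s\t%s\n" % (host, result, output)


def send_via_nsca(line,
                  master="nagios-master.example.com",    # placeholder
                  config="/etc/nagios/send_nsca.cfg"):   # placeholder
    """Pipe one result line into send_nsca on standard input."""
    subprocess.run(["send_nsca", "-H", master, "-c", config],
                   input=line.encode(), check=True)


def forward_host_result(host, state_type, result, output, send=send_via_nsca):
    """Forward a host result only when Nagios reports a HARD state.

    state_type is the value of $HOSTSTATETYPE$ ("SOFT" or "HARD").
    Returns True if the result was sent, False if it was skipped.
    """
    if state_type != "HARD":
        return False
    send(nsca_host_line(host, result, output))
    return True
```

The command definition would pass $HOSTNAME$, $HOSTSTATETYPE$, the
state ID and $HOSTOUTPUT$ as arguments to a small wrapper that calls
forward_host_result().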

Thanks for the advice, this is a much nicer way of dealing with
this problem.

Mike



Re: [Nagios-users] Passive host results & soft states?

2007-04-24 Thread Mike Hamrick

Thanks Marc, I found this answer from Ethan Galstad in the thread you posted:

> Nagios 2 doesn't support a max_attempts directive for hosts and all 
> passive host check results will immediately force the host into a HARD 
> state.  This has changed a bit in Nagios 3 - hosts do have a 
> max_attempts directive, but passive results still put the host into a 
> HARD state.

I think the solution for me is to change the check-host-alive command
to send more pings, that way one dropped packet on a remotely
monitored box won't cause me to get woken up at some ungodly hour.

The command_line I had for check-host-alive (not sure where I got it) 
seems somewhat silly to me:

Original:
$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

My Change:
$USER1$/check_ping -H $HOSTADDRESS$ -w 1000.0,40% -c 5000.0,100% -p 5
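
To spell out the arithmetic behind the change, here is a small Python
model of the packet-loss half of the thresholds.  It is only a sketch:
it ignores the round-trip-time thresholds and assumes the percentage
thresholds are inclusive, so treat it as an illustration rather than a
description of check_ping's internals.

```python
def packet_loss_state(sent, dropped, warn_pct, crit_pct):
    """Classify a ping run by packet loss alone (RTA is ignored)."""
    loss = 100.0 * dropped / sent
    if loss >= crit_pct:
        return "CRITICAL"
    if loss >= warn_pct:
        return "WARNING"
    return "OK"


# Original command: -p 1 with 80%/100% thresholds.
# A single dropped packet is 100% loss.
original = packet_loss_state(sent=1, dropped=1, warn_pct=80, crit_pct=100)

# Changed command: -p 5 with 40%/100% thresholds.
# A single dropped packet is only 20% loss.
changed = packet_loss_state(sent=5, dropped=1, warn_pct=40, crit_pct=100)
```

With one packet per check, loss can only ever be 0% or 100%, so the
warning threshold never comes into play.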

Mike



[Nagios-users] Passive host results & soft states?

2007-04-24 Thread Mike Hamrick

Howdy,

Host A is a server which sends passive host/service results to host B
via NSCA. When a single host check fails on host C (a machine
monitored by host A), host A considers host C to be in a SOFT state,
whereas host B (the one that actually sends notifications) considers
host C to be in a HARD state.  This causes me a lot of problems because
often just one host check will fail, yet I still get a notification.

Here are some log entries that illustrate this:

Log entry from host A:

Wed Mar 21 02:29:02 2007 HOST ALERT:
prod-mysql-1a;DOWN;SOFT;1;CRITICAL - Host Unreachable (prod-mysql-1a)

Log entry from host B:

Wed Mar 21 02:29:04 2007 EXTERNAL COMMAND:
PROCESS_HOST_CHECK_RESULT;prod-mysql-1a;1;CRITICAL - Host Unreachable 
(prod-mysql-1a)
Wed Mar 21 02:29:04 2007 HOST ALERT:
prod-mysql-1a;DOWN;HARD;1;CRITICAL - Host Unreachable (prod-mysql-1a)

Host A defines its hosts using this template:

define host {
    name                        server
    check_command               check-host-alive
    failure_prediction_enabled  1
    max_check_attempts          4
    notification_period         24x7
    freshness_threshold         250s
    contact_groups              admins
    notification_period         24x7
    notification_interval       0
    notification_options        d,u,r
    check_interval              4
    register                    0
}

Host B defines its hosts using this template:

define host {
    name                        remote-host-template
    active_checks_enabled       0
    check_period                24x7
    max_check_attempts          4
    notification_period         24x7
    notification_interval       5
    notification_options        d,r
    failure_prediction_enabled  1
    check_command               service_is_stale
    register                    0
}

Thanks for any help!

Mike



Re: [Nagios-users] Snmptrap with Nagios

2007-04-17 Thread Mike Hamrick

> I have to monitor a "thing" that works with snmptraps, but I don't know 
> what I have to do.

You need to have a machine that listens for SNMP traps.  The program
snmptrapd does this; it comes with the net-snmp package.  This daemon
writes the trap info to the system log, or alternatively runs a
program and passes it the trap information.  That program can then
submit a check result to Nagios using NSCA or by writing to the
nagios.cmd pipe.
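
As a rough sketch of the handler program, the Python below parses the
text that snmptrapd hands to a traphandle program on standard input
(host name on the first line, transport address on the second, then
one "OID value" pair per line) and turns it into a service result line
for send_nsca.  The service name "SNMP Traps" and the choice to treat
every trap as CRITICAL are assumptions made for the example.

```python
def parse_trap(text):
    """Split traphandle input into (hostname, address, varbinds)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a hostname and an address line")
    varbinds = []
    for line in lines[2:]:
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value.strip()))
    return lines[0], lines[1], varbinds


def trap_to_nsca_line(text, service="SNMP Traps", return_code=2):
    """Build a tab-separated service result line for send_nsca.

    The service name and the fixed CRITICAL return code are
    placeholders; a real handler would map trap OIDs to states.
    """
    host, _address, varbinds = parse_trap(text)
    # Use the trap OID as the plugin output when it is present.
    trap_oid = next((value for oid, value in varbinds
                     if oid.endswith("snmpTrapOID.0")), "unknown trap")
    output = "Trap %s from %s" % (trap_oid, host)
    return "\t".join([host, service, str(return_code), output]) + "\n"
```

The handler would read sys.stdin, call trap_to_nsca_line() and pipe the
result into send_nsca, or write the equivalent external command to the
nagios.cmd pipe.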

I found this article to be very helpful:

http://www.samag.com/documents/s=9559/sam0503g/

Mike





Re: [Nagios-users] Which tool is best for me: Nagios, OpenNMS, or something else?

2007-03-31 Thread Mike Hamrick

I can't speak for OpenNMS, but I think for Nagios the answer for a lot
of your questions is going to be:

"There isn't a way of doing this with the standard nagios plugin
package, but someone has probably written a plugin that does this,
check the Nagios Exchange site."

> % Confirm each machine is up/pingable/reachable [obviously!]

Obviously.

> % nmap each machine to make sure correct ports (varies by machine) and
> no others are open

This isn't a standard Nagios plugin; however, somebody has written one
that does this.  A quick Google search found:

http://ubermonkey.wordpress.com/2006/09/28/nagios-nmap-plugin/

> % Not all tests all the time: some tests should run less frequently
> (reduce the load);

You can define the check_interval on a service by service basis.

> % For machines running httpd, download several pages, diff to last
> copies of these pages, report "big" differences...

I'm guessing you'll have to write this plugin yourself.

> % For machines running sendmail, send a test email to one of the other
> machines running sendmail, which then confirms receipt; alert if not
> received. Also do other mail routing/delivery tests.

This is becoming a frequently asked question on this list.  Various
people have written plugins to do this, but it's been my experience
that most people who need this end up writing their own.

> % For machines running popd/imapd, simulate login to confirm
> authentication is working (popd/imapd auth isn't always local for us)

See default answer.  A quick google search found this page, which
confirms authentication on pop/imap.

http://www.jhweiss.de/software/nagios.html

> % Monitor files in /etc (eg, passwd, shadow, crontab) for changes.

You could do this with tripwire and then write a plugin that reads
the snmp trap, or trap logfile.

> % Ideally, the "something bad has happened" reporting can be
> configured-- it may be OK for "mailq -v" to be large for 10-15
> minutes, but not for 30 minutes (for example).

You can do this with nagios.  You can check every five minutes and
not go to a hard failure state until the check has failed six times.
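
To see how that lines up with the 30 minute figure, here is the
arithmetic as a small Python sketch.  It assumes the retry interval is
also five minutes and that the first failed check counts as attempt
one; the function name is made up for the example.

```python
def hard_state_window(max_check_attempts, check_interval, retry_interval):
    """Return (shortest, longest) minutes from problem start to HARD state.

    After the first failed check, Nagios rechecks every retry_interval
    until max_check_attempts failures have been seen. The problem may
    have started anywhere between two regular checks, which adds up to
    one check_interval to the longest case.
    """
    after_first_failure = (max_check_attempts - 1) * retry_interval
    return after_first_failure, after_first_failure + check_interval


shortest, longest = hard_state_window(max_check_attempts=6,
                                      check_interval=5,
                                      retry_interval=5)
```

So a queue that has been large for 10-15 minutes would still be in a
soft state, and the hard failure would arrive somewhere between 25 and
30 minutes after the problem began.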

Mike


Re: [Nagios-users] Monitoring disk bandwidth utilization?

2007-01-28 Thread Mike Hamrick

Hugo van der Kooij wrote:

> I'm puzzled by this term of 'disk bandwidth'. I am not quite sure we are on 
> the same wavelength here. But I could imagine digging up the absolute 
> counters and using rrd to build the usual graphs out of them.

Sorry for not making this clear.  The iostat and sar utilities (part
of the RH sysstat package) will show you a statistic called %util
for a given block device, which the manpage defines as:

"Percentage of CPU time during which I/O requests were issued to the
device (bandwidth utilization for the device). Device saturation
occurs when this value is close to 100%."

> The issue I think is getting very frequent measurements and normalizing 
> them in some sort so you can obtain average and maximum values out of 
> them.

By default the sar utility will show you 10 minute averages, and
iostat can show you current utilizations.  
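
A plugin wrapped around iostat could start from something like the
Python sketch below.  It assumes "iostat -x" prints a header row that
starts with "Device" and contains a %util column, and it finds the
column by name because the layout varies between sysstat versions.
The function names are made up for the example.

```python
def device_utilization(iostat_output):
    """Return {device: %util} from `iostat -x` style text.

    If the text holds several reports (for example from
    `iostat -x 5 2`), later reports overwrite earlier ones, so the
    result reflects the last interval rather than the since-boot
    averages of the first report.
    """
    util = {}
    column = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            column = None          # a blank line ends a device table
            continue
        if fields[0].startswith("Device") and "%util" in fields:
            column = fields.index("%util")
            continue
        if column is not None and len(fields) > column:
            try:
                util[fields[0]] = float(fields[column])
            except ValueError:
                pass
    return util


def busiest_device(util):
    """Return (device, %util) for the most utilized device, or None."""
    if not util:
        return None
    device = max(util, key=util.get)
    return device, util[device]
```

The plugin would compare the figure from busiest_device() against its
warning and critical thresholds and exit with the matching Nagios
status code.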

Mike



[Nagios-users] Monitoring disk bandwidth utilization?

2007-01-28 Thread Mike Hamrick

Hi!

I've been unable to find a nagios plugin that monitors disk bandwidth
utilization, does anybody know of one?  It seems like it would be
relatively straightforward to wrap a nagios plugin around a utility
like iostat or sar, but I thought I'd ask if anyone had done this
before I dive in.

Thanks!

mikeh




[Nagios-users] Separate notification_interval for warnings?

2007-01-07 Thread Mike Hamrick

Howdy,

I have nagios set up to send notifications every five minutes.  This
makes sense when a service is CRITICAL, but makes less sense when it
is simply WARNING.  Warnings go to a separate email alias... every
five minutes.  Normally during the day I acknowledge them, but during
the evening they can generate quite a lot of spam.  I couldn't figure
out a good way to solve this problem, so I ended up adding a new
variable to nagios 2.6 called warn_notification_interval which only
gets applied to services/hosts in the WARNING state.

My question is, is this useful or is there an easier way to solve this
problem I just couldn't think of?  I'd be willing to create a patch
for my changes if anyone is interested.

Mike