Re: Nagios check_bioctl available

2006-07-31 Thread andrew fresh
On Fri, Jul 28, 2006 at 09:17:28PM -0500, Marco Peereboom wrote:
> andrew fresh wrote:
> > I have written a perl script that parses the output from bioctl and
> > returns it in a format that Nagios can use.
>
> Sweet :-)

Thanks!

> > One thing I ran into is that bioctl needs to run as root to get access
> > to /dev/bio, even for read only access.  Is there a way to query bioctl
> > without needing root?
>
> No!

Dang! Oh well, sudo is a good enough solution then.

> > Also, in biovar.h, both a raid volume and a disk can be Offline.
> > However, I am not sure what that means.  Currently it is a WARNING, but
> > I don't know what status it should be set to.
>
> If 2 or more physical disks of a RAID 5 are offline, a volume will be
> marked offline as well.  An offline RAID 5 is obviously a critical
> event.  Hope this makes sense since I am not exactly sure what you are
> asking.

I will change Offline to be a CRITICAL error.  

and here is the new version:
http://openbsd.somedomain.net/nagios/check_bioctl-1.4.tar.gz

However, I guess my question is what would cause a disk to be Offline?

There is a separate status for Failed, and I could see the RAID being
Offline if too many disks had Failed.
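To make Marco's explanation concrete, here is a rough Perl sketch of how a RAID 5 volume's state follows from its member disks. It assumes a plain RAID 5 with no rebuild in progress and treats Failed and Offline disks the same; raid5_volume_status() is a hypothetical helper, not something in bioctl or check_bioctl.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a RAID 5 volume status from the number of
# member disks that are Failed or Offline. Simplified: ignores rebuilds
# and hot spares.
sub raid5_volume_status {
    my ($bad_disks) = @_;
    return 'Online'   if $bad_disks == 0;
    return 'Degraded' if $bad_disks == 1;  # parity still covers one lost disk
    return 'Offline';                      # 2 or more lost: volume is unusable
}

for my $n (0 .. 3) {
    printf "%d bad disk(s): %s\n", $n, raid5_volume_status($n);
}
```

So a single Failed disk leaves the volume Degraded, and only the second one takes it Offline.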


Are there any other statuses that should be mapped differently?  They
seemed fairly straightforward, but there may be good arguments for
changing some of them.

my %Status_Map = (
    # bioctl status => Nagios state
    'Online'    => 'OK',
    'Offline'   => 'CRITICAL',
    'Degraded'  => 'CRITICAL',
    'Failed'    => 'CRITICAL',
    'Building'  => 'WARNING',
    'Rebuild'   => 'WARNING',
    'Hot spare' => 'OK',
    'Unused'    => 'OK',
    'Scrubbing' => 'WARNING',
    'Invalid'   => 'CRITICAL',
);
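One way a map like this might be used is to collapse all the per-device states into the single state a Nagios plugin reports through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch assuming a hypothetical worst_state() helper. How an unmapped status ranks against the others is a judgment call made here, not something taken from check_bioctl.

```perl
use strict;
use warnings;

# bioctl status => Nagios state (the map from this message)
my %Status_Map = (
    'Online'    => 'OK',       'Offline'   => 'CRITICAL',
    'Degraded'  => 'CRITICAL', 'Failed'    => 'CRITICAL',
    'Building'  => 'WARNING',  'Rebuild'   => 'WARNING',
    'Hot spare' => 'OK',       'Unused'    => 'OK',
    'Scrubbing' => 'WARNING',  'Invalid'   => 'CRITICAL',
);

# Standard Nagios plugin exit codes.
my %Exit_Code = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Ranking used to pick the "worst" state. Putting UNKNOWN between
# WARNING and CRITICAL is an assumption of this sketch.
my %Severity = (OK => 0, WARNING => 1, UNKNOWN => 2, CRITICAL => 3);

# Hypothetical helper: most severe Nagios state for a list of bioctl
# statuses. Statuses missing from the map count as UNKNOWN.
sub worst_state {
    my $worst = 'OK';
    for my $status (@_) {
        my $state = exists $Status_Map{$status} ? $Status_Map{$status} : 'UNKNOWN';
        $worst = $state if $Severity{$state} > $Severity{$worst};
    }
    return $worst;
}

my $state = worst_state('Online', 'Rebuild', 'Hot spare');
print "$state, exit code $Exit_Code{$state}\n";
```

The plugin would then finish with something like `exit $Exit_Code{$state};`.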

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: Windows 95 undocumented feature



Re: Nagios check_bioctl available

2006-07-31 Thread andrew fresh
On Sun, Jul 30, 2006 at 03:03:26AM +0200, Wijnand Wiersma wrote:
> 2006/7/29, andrew fresh [EMAIL PROTECTED]:
> > One thing I ran into is that bioctl needs to run as root to get access
> > to /dev/bio, even for read only access.  Is there a way to query bioctl
> > without needing root?
>
> Well, I think you only need the status of the drives and that is
> available using sysctl hw.sensors in current (you already mentioned
> sysctl). A monitoring system should not use the capabilities of
> bioctl; it just needs to know the status and report that.

If that is the case, then this check will become obsolete.  That would
be nice!  I will have to go put -current on my test box and try it out.  


As it is, on my 3.9-stable box, the sysctl output, when it is
available, does not seem very reliable:

hw.sensors.29=esm0, Drive 0, drive, online
hw.sensors.30=esm0, Drive 1, drive, online
hw.sensors.31=esm0, Drive 2, drive, unknown
hw.sensors.32=esm0, Drive 3, drive, unknown
hw.sensors.33=esm0, Drive 4, drive, online
hw.sensors.34=esm0, Drive 5, drive, online
hw.sensors.35=esm0, Drive 6, drive, unknown
hw.sensors.36=esm0, Drive 7, drive, unknown

$ sudo bioctl ami0
Password:
Volume  Status               Size Device
 ami0 0 Online         8984199168 sd0     RAID1
      0 Online         8984199168 0:0.0   safte0 IBM DRVS09D 0140
      1 Online         8984199168 0:1.0   safte0 IBM DRVS09D 0140
 ami0 1 Online        36234592256 sd1     RAID10
      0 Online        18117296128 0:3.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      1 Online        18117296128 0:4.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      2 Online        18117296128 0:5.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      3 Online        18117296128 0:8.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
 ami0 2 Hot spare      8984199168 0:2.0   safte0 IBM DMVS09M 0220
 ami0 3 Hot spare     18117296128 0:9.0   safte0 QUANTUM ATLAS 10K 18SCA UCHD


The rest of the sensors seem mostly correct though, and there are sure
enough of them!

$ sysctl hw.sensors | tail -1
hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF


Also, on another box that has an external disk box connected with ses,
I don't get any status for those disks in sysctl.  The disks that are
actually in the server use safte, and those do show up in sysctl.  I
don't know why, so now I have this check :-)


> Now that I think of it, I should add support to the upwatch monitoring
> system too, but I am not that lucky to have hardware to actually test
> it :-)

If the information is available in sysctl in 4.0, that would be the
check to integrate.
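For what a sysctl-based check could look like, here is a rough sketch that picks the drive entries out of the 3.9-stable hw.sensors lines quoted earlier in this message. drive_sensors() is hypothetical, and the comma-separated layout it expects is an assumption based only on that output; -current may format hw.sensors differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: pick the drive entries out of OpenBSD 3.9-style
# "sysctl hw.sensors" lines, e.g.
#   hw.sensors.29=esm0, Drive 0, drive, online
# The field layout is assumed from the output in this thread only.
sub drive_sensors {
    my @drives;
    for my $line (@_) {
        (my $l = $line) =~ s/\s+\z//;            # drop trailing newline
        my (undef, $value) = split /=/, $l, 2;
        next unless defined $value;
        my @f = split /,\s*/, $value;
        # drive entries: device, description, "drive", status
        next unless @f == 4 && $f[2] eq 'drive';
        push @drives, { dev => $f[0], desc => $f[1], status => $f[3] };
    }
    return @drives;
}

my @lines = (
    'hw.sensors.29=esm0, Drive 0, drive, online',
    'hw.sensors.31=esm0, Drive 2, drive, unknown',
    'hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF',
);
for my $d (drive_sensors(@lines)) {
    print "$d->{dev} $d->{desc}: $d->{status}\n";
}
```

Fed the esm0 output above (e.g. from `my @lines = `sysctl hw.sensors`;`), it would pass the four "unknown" drives through as-is; deciding whether "unknown" should be a WARNING or just an unsupported sensor is still up to the check.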

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: dynamic software linking table corrupted



Re: Nagios check_bioctl available

2006-07-29 Thread Wijnand Wiersma

2006/7/29, andrew fresh [EMAIL PROTECTED]:

> One thing I ran into is that bioctl needs to run as root to get access
> to /dev/bio, even for read only access.  Is there a way to query bioctl
> without needing root?

Well, I think you only need the status of the drives and that is
available using sysctl hw.sensors in current (you already mentioned
sysctl). A monitoring system should not use the capabilities of
bioctl; it just needs to know the status and report that.

Now that I think of it, I should add support to the upwatch monitoring
system too, but I am not that lucky to have hardware to actually test
it :-)

Wijnand



Nagios check_bioctl available

2006-07-28 Thread andrew fresh
I have written a perl script that parses the output from bioctl and
returns it in a format that Nagios can use.  

check_bioctl is available here:
http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz

It is useful to me, and so I thought it might be useful to someone else.  

I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
using the ami driver. It should work just fine on other versions of
OpenBSD as well as with other cards and drivers. If you do run into
trouble, send me the output from bioctl on the system you are having
trouble with and I can try to help. Patches to fix problems would be
even better.
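For anyone sending patches, the heart of a check like this is turning bioctl's table into per-device records. This is a rough sketch of one way to do it, not the actual check_bioctl code: parse_bioctl() is hypothetical, and it assumes the Volume/Status/Size/Device column layout that bioctl prints for ami on 3.9 (as in the ami0 output elsewhere in this thread). Other drivers or releases may lay it out differently.

```perl
use strict;
use warnings;

# Status words from the %Status_Map discussed in this thread.
my $STATUS = qr/Online|Offline|Degraded|Failed|Building|Rebuild|Hot spare|Unused|Scrubbing|Invalid/;

# Hypothetical parser: turn bioctl's Volume/Status/Size/Device table into a
# list of hashes. Volume and hot spare lines start with the controller name
# (e.g. "ami0 1 Online 36234592256 sd1 RAID10"); member disk lines start with
# the disk index and belong to the last controller seen.
sub parse_bioctl {
    my ($ctrl, @items);
    for my $line (@_) {
        if ($line =~ /^\s*([a-z]+\d+)\s+\d+\s+($STATUS)\s*(\d+)\s+(\S+)\s*(.*?)\s*$/) {
            my ($c, $status, $size, $id, $info) = ($1, $2, $3, $4, $5);
            $ctrl = $c;
            push @items, { ctrl => $c, id => $id, status => $status,
                           size => $size, info => $info };
        }
        elsif (defined $ctrl
            && $line =~ /^\s*\d+\s+($STATUS)\s*(\d+)\s+(\S+)\s*(.*?)\s*$/) {
            my ($status, $size, $id, $info) = ($1, $2, $3, $4);
            push @items, { ctrl => $ctrl, id => $id, status => $status,
                           size => $size, info => $info };
        }
    }
    return @items;
}

my @sample = (
    'Volume  Status               Size Device',
    ' ami0 1 Degraded      36234592256 sd1     RAID10',
    '      3 Rebuild       18117296128 0:8.0   safte0 QUANTUM ATLAS10K2-TY184JDA40',
    ' ami0 2 Hot spare      8984199168 0:2.0   safte0 IBM DMVS09M 0220',
);
for my $item (parse_bioctl(@sample)) {
    print "$item->{ctrl} $item->{id} $item->{status} $item->{info}\n";
}
```

In a real plugin the lines would come from something like `open(my $fh, '-|', 'bioctl', $device)`, run as root.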


One thing I ran into is that bioctl needs to run as root to get access
to /dev/bio, even for read only access.  Is there a way to query bioctl
without needing root?
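Since bioctl can't open /dev/bio without root, a plugin could check its effective UID up front and return UNKNOWN with a clear message instead of passing along bioctl's own error. A minimal sketch; root_check() and the message text are hypothetical.

```perl
use strict;
use warnings;

use constant UNKNOWN => 3;    # Nagios plugin exit code

# Hypothetical guard: bioctl needs root to open /dev/bio, so report UNKNOWN
# up front instead of handing Nagios bioctl's own error. Takes an effective
# UID (Perl's $> in real use) and returns (exit code, message).
sub root_check {
    my ($euid) = @_;
    return (0, '') if $euid == 0;
    return (UNKNOWN, 'UNKNOWN: bioctl needs root to read /dev/bio (run this check via sudo)');
}

my ($code, $msg) = root_check($>);
print "$msg\n" if $code;
```

In the plugin this would be followed by `exit $code if $code;` before bioctl is run.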


Also, in biovar.h, both a raid volume and a disk can be Offline.
However, I am not sure what that means.  Currently it is a WARNING, but
I don't know what status it should be set to.

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/biovar.h?rev=1.25&content-type=text/x-cvsweb-markup

If anyone knows what the Offline status means, I would sure like to
know.


An additional useful feature is that you can check multiple devices in
a single invocation by repeating -d:

/usr/local/libexec/nagios/check_bioctl -d ami0 -d ami1
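Accepting a repeated -d like that is straightforward with Perl's core Getopt::Long module, which pushes each value onto an array when the option is bound to an array reference. A minimal sketch; parse_devices() is a hypothetical wrapper, and check_bioctl may parse its options differently.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical wrapper: parse a repeatable -d option so that
# "check_bioctl -d ami0 -d ami1" yields ('ami0', 'ami1').
# Binding the option to an array reference makes Getopt::Long push
# every value it sees instead of keeping only the last one.
sub parse_devices {
    local @ARGV = @_;
    my @devices;
    GetOptions('d=s' => \@devices) or return;
    return @devices;
}

my @devices = parse_devices('-d', 'ami0', '-d', 'ami1');
print "@devices\n";
```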


Output is similar to the example below, except that with NAGIOS_OUTPUT
set to 1 in the source (as it usually is), all output is on a single
line separated with <br>, and any devices that are OK are hidden,
because Nagios limits the length of a response.

CRITICAL (1):
   ami0 sd1 Degraded
WARNING (1):
   ami0 0:8.0 Rebuild QUANTUM ATLAS10K2-TY184JDA40
OK (7):
   ami0 sd0 Online
   ami0 0:0.0 Online IBM DMVS09M 0220
   ami0 0:1.0 Online IBM DRVS09D 0140
   ami0 0:3.0 Online QUANTUM ATLAS10K2-TY184JDA40
   ami0 0:4.0 Online QUANTUM ATLAS10K2-TY184JDA40
   ami0 0:5.0 Online QUANTUM ATLAS10K2-TY184JDA40
   ami0 0:2.0 Hot spare IBM DRVS09D 0140
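The two output styles can be sketched as one formatter that groups results by state, drops the OK group, and joins with <br> in single-line mode. format_results() and its [state, text] input are hypothetical, chosen for illustration; they are not check_bioctl's internals.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two output styles. Each result is a
# [state, text] pair. With $nagios_output set, OK results are dropped and
# everything goes on one line joined with <br>; otherwise one item per line.
sub format_results {
    my ($nagios_output, @results) = @_;
    my @out;
    for my $state (qw(CRITICAL WARNING UNKNOWN OK)) {
        next if $nagios_output && $state eq 'OK';
        my @items = map { $_->[1] } grep { $_->[0] eq $state } @results;
        next unless @items;
        push @out, "$state (" . scalar(@items) . '):',
                   map { sprintf '   %s', $_ } @items;
    }
    return join($nagios_output ? '<br>' : "\n", @out);
}

my @results = (
    [ CRITICAL => 'ami0 sd1 Degraded' ],
    [ WARNING  => 'ami0 0:8.0 Rebuild QUANTUM ATLAS10K2-TY184JDA40' ],
    [ OK       => 'ami0 sd0 Online' ],
);
print format_results(0, @results), "\n";
print format_results(1, @results), "\n";
```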


I currently configure it something like this:

$ grep check_bioctl /etc/sudoers /etc/nrpe.cfg
/etc/sudoers:_nrpe   ALL = NOPASSWD:/usr/local/libexec/nagios/check_bioctl -d ami0
/etc/nrpe.cfg:command[check_bioctl]=/usr/bin/sudo /usr/local/libexec/nagios/check_bioctl -d ami0


Also available is check_hw_sensors, for checking sysctl hw.sensors
from Nagios.

http://openbsd.somedomain.net/nagios/

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: YOU HAVE AN I/O ERROR - Incompetent Operator error



Re: Nagios check_bioctl available

2006-07-28 Thread Marco Peereboom

andrew fresh wrote:

> I have written a perl script that parses the output from bioctl and
> returns it in a format that Nagios can use.

Sweet :-)

> check_bioctl is available here:
> http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz
>
> It is useful to me, and so I thought it might be useful to someone else.
>
> I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
> using the ami driver. It should work just fine on other versions of
> OpenBSD as well as with other cards and drivers. If you do run into
> trouble, send me the output from bioctl on the system you are having
> trouble with and I can try to help. Patches to fix problems would be
> even better.
>
> One thing I ran into is that bioctl needs to run as root to get access
> to /dev/bio, even for read only access.  Is there a way to query bioctl
> without needing root?

No!

> Also, in biovar.h, both a raid volume and a disk can be Offline.
> However, I am not sure what that means.  Currently it is a WARNING, but
> I don't know what status it should be set to.

If 2 or more physical disks of a RAID 5 are offline, a volume will be
marked offline as well.  An offline RAID 5 is obviously a critical
event.  Hope this makes sense since I am not exactly sure what you are
asking.



