Re: Nagios check_bioctl available
On Fri, Jul 28, 2006 at 09:17:28PM -0500, Marco Peereboom wrote:
> andrew fresh wrote:
> > I have written a perl script that parses the output from bioctl and
> > returns it in a format that Nagios can use.
>
> Sweet :-)

Thanks!

> > One thing I ran into is that bioctl needs to run as root to get
> > access to /dev/bio, even for read only access. Is there a way to
> > query bioctl without needing root?
>
> No!

dang! oh well, sudo is a good enough solution then.

> > Also, in biovar.h, both a raid volume and a disk can be Offline.
> > However, I am not sure what that means. Currently it is a WARNING,
> > but I don't know what status it should be set to.
>
> If 2 or more physical disks of a RAID 5 are offline a volume will be
> marked offline as well. An offline RAID 5 is obviously a critical
> event. Hope this makes sense since I am not exactly sure what you are
> asking.

I will change Offline to be a CRITICAL error, and here is the new
version:

http://openbsd.somedomain.net/nagios/check_bioctl-1.4.tar.gz

However, I guess my question is: what would cause a disk to be Offline?
There is a separate status for Failed, and I could see the RAID being
Offline if too many disks had Failed.

Are there any other statuses that should be mapped differently? They
seemed fairly straightforward, but there may be good arguments for
changing them.

    my %Status_Map = (
        Online      => 'OK',
        Offline     => 'CRITICAL',
        Degraded    => 'CRITICAL',
        Failed      => 'CRITICAL',
        Building    => 'WARNING',
        Rebuild     => 'WARNING',
        'Hot spare' => 'OK',
        Unused      => 'OK',
        Scrubbing   => 'WARNING',
        Invalid     => 'CRITICAL',
    );

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: Windows 95 undocumented feature
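For anyone who wants to see how a table like the one above turns into a single check result, here is a rough sketch. It is in Python rather than the script's Perl, and it is not taken from check_bioctl itself. The function names, the fallback to UNKNOWN for unlisted statuses, and the severity ordering are my own assumptions; only the status table and the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are given.

```python
# Status table copied from the message above (check_bioctl 1.4 mapping).
STATUS_MAP = {
    "Online": "OK",
    "Offline": "CRITICAL",
    "Degraded": "CRITICAL",
    "Failed": "CRITICAL",
    "Building": "WARNING",
    "Rebuild": "WARNING",
    "Hot spare": "OK",
    "Unused": "OK",
    "Scrubbing": "WARNING",
    "Invalid": "CRITICAL",
}

# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Assumed severity ordering, least to most severe (hypothetical choice).
SEVERITY = ["OK", "UNKNOWN", "WARNING", "CRITICAL"]


def nagios_state(bioctl_status):
    """Map one bioctl status string to a Nagios state name.

    Statuses missing from the table become UNKNOWN (an assumption).
    """
    return STATUS_MAP.get(bioctl_status, "UNKNOWN")


def worst_state(bioctl_statuses):
    """Return the most severe Nagios state among the given statuses.

    An empty list yields UNKNOWN, since nothing was actually checked.
    """
    states = [nagios_state(s) for s in bioctl_statuses]
    if not states:
        return "UNKNOWN"
    return max(states, key=SEVERITY.index)


if __name__ == "__main__":
    import sys

    state = worst_state(["Online", "Rebuild", "Hot spare"])
    print(state)
    sys.exit(EXIT_CODES[state])
```

With the sample statuses in the last few lines, the overall result is WARNING and the process exits with code 1, which is what Nagios reads as the check state.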
Re: Nagios check_bioctl available
On Sun, Jul 30, 2006 at 03:03:26AM +0200, Wijnand Wiersma wrote:
> 2006/7/29, andrew fresh [EMAIL PROTECTED]:
> > One thing I ran into is that bioctl needs to run as root to get
> > access to /dev/bio, even for read only access. Is there a way to
> > query bioctl without needing root?
>
> Well, I think you only need the status of the drives and that is
> available using sysctl hw.sensors in current (you already mentioned
> sysctl). A monitoring system should not use the capabilities of
> bioctl, it just needs to know the status and report that.

If that is the case, then this check will become obsolete. That would
be nice! I will have to go put -current on my test box and try it out.

As it is, on my 3.9-stable box, the output from sysctl, where it is
available, does not seem very reliable:

hw.sensors.29=esm0, Drive 0, drive, online
hw.sensors.30=esm0, Drive 1, drive, online
hw.sensors.31=esm0, Drive 2, drive, unknown
hw.sensors.32=esm0, Drive 3, drive, unknown
hw.sensors.33=esm0, Drive 4, drive, online
hw.sensors.34=esm0, Drive 5, drive, online
hw.sensors.35=esm0, Drive 6, drive, unknown
hw.sensors.36=esm0, Drive 7, drive, unknown

$ sudo bioctl ami0
Password:
Volume  Status      Size           Device
 ami0 0 Online          8984199168 sd0     RAID1
      0 Online          8984199168 0:0.0   safte0 IBM DRVS09D 0140
      1 Online          8984199168 0:1.0   safte0 IBM DRVS09D 0140
 ami0 1 Online         36234592256 sd1     RAID10
      0 Online         18117296128 0:3.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      1 Online         18117296128 0:4.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      2 Online         18117296128 0:5.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
      3 Online         18117296128 0:8.0   safte0 QUANTUM ATLAS10K2-TY184JDA40
 ami0 2 Hot spare       8984199168 0:2.0   safte0 IBM DMVS09M 0220
 ami0 3 Hot spare      18117296128 0:9.0   safte0 QUANTUM ATLAS 10K 18SCA UCHD

The rest of the sensors seem mostly correct though, and there are sure
enough of them!

$ sysctl hw.sensors | tail -1
hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF

Also, on another box that has an external disk box connected with ses,
I don't get any status for those disks in sysctl. The disks that are
actually in the server are using safte and those show up in sysctl. I
don't know why, so now I have this check :-)

> Now that I think of it, I should add support to the upwatch
> monitoring system too, but I am not that lucky to have hardware to
> actually test it :-)

If the information is available in sysctl in 4.0, that would be the
check to integrate.

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: dynamic software linking table corrupted
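To make the comparison above a bit more concrete, here is a small sketch of pulling the drive entries out of hw.sensors output like the 3.9-stable sample and listing the ones that are not reported as online. It is in Python rather than Perl and is not part of check_hw_sensors. It assumes the four-field "device, description, drive, status" layout shown in the sample; other releases may format these lines differently, and the function names are made up for illustration.

```python
def parse_drive_sensors(sysctl_output):
    """Extract (device, description, status) tuples for drive sensors.

    Assumes lines shaped like the sample above:
        hw.sensors.29=esm0, Drive 0, drive, online
    Lines with a different layout (for example temperature sensors,
    which carry five fields) are skipped.
    """
    drives = []
    for line in sysctl_output.splitlines():
        _key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        fields = [field.strip() for field in value.split(",")]
        if len(fields) == 4 and fields[2] == "drive":
            drives.append((fields[0], fields[1], fields[3]))
    return drives


def not_online(drives):
    """Return the drives whose reported status is anything but 'online'."""
    return [d for d in drives if d[2] != "online"]


if __name__ == "__main__":
    sample = (
        "hw.sensors.29=esm0, Drive 0, drive, online\n"
        "hw.sensors.31=esm0, Drive 2, drive, unknown\n"
        "hw.sensors.99=safte0, temp1, OK, temp, 27.78 degC / 82.00 degF\n"
    )
    for device, description, status in not_online(parse_drive_sensors(sample)):
        print("%s %s: %s" % (device, description, status))
```

On the three sample lines this reports only "esm0 Drive 2: unknown". That is the kind of entry that disagrees with what bioctl shows on the same box.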
Re: Nagios check_bioctl available
2006/7/29, andrew fresh [EMAIL PROTECTED]:
> One thing I ran into is that bioctl needs to run as root to get
> access to /dev/bio, even for read only access. Is there a way to
> query bioctl without needing root?

Well, I think you only need the status of the drives, and that is
available using sysctl hw.sensors in current (you already mentioned
sysctl). A monitoring system should not use the capabilities of bioctl;
it just needs to know the status and report that.

Now that I think of it, I should add support to the upwatch monitoring
system too, but I am not lucky enough to have hardware to actually test
it :-)

Wijnand
Nagios check_bioctl available
I have written a Perl script that parses the output from bioctl and
returns it in a format that Nagios can use.

check_bioctl is available here:
http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz

It is useful to me, and so I thought it might be useful to someone
else.

I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
using the ami driver. It should work just fine on other versions of
OpenBSD as well as with other cards and drivers. If you do run into
trouble, send me the output from bioctl on the system you are having
trouble with and I can try to help. Patches to fix problems would be
even better.

One thing I ran into is that bioctl needs to run as root to get access
to /dev/bio, even for read-only access. Is there a way to query bioctl
without needing root?

Also, in biovar.h, both a RAID volume and a disk can be Offline.
However, I am not sure what that means. Currently it is a WARNING, but
I don't know what status it should be set to.

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/biovar.h?rev=1.25&content-type=text/x-cvsweb-markup

If anyone knows what the Offline status means, I would sure like to
know.

An additional useful feature is that you can specify multiple devices
to check in a single check:

/usr/local/libexec/nagios/check_bioctl -d ami0 -d ami1

Output is similar to below, except that with NAGIOS_OUTPUT set to 1 in
the source (as it usually is) all output is on a single line separated
with <br>, and it hides any devices that are OK because Nagios has a
limit on the length of a response.

CRITICAL (1):
    ami0 sd1 Degraded
WARNING (1):
    ami0 0:8.0 Rebuild QUANTUM ATLAS10K2-TY184JDA40
OK (7):
    ami0 sd0 Online
    ami0 0:0.0 Online IBM DMVS09M 0220
    ami0 0:1.0 Online IBM DRVS09D 0140
    ami0 0:3.0 Online QUANTUM ATLAS10K2-TY184JDA40
    ami0 0:4.0 Online QUANTUM ATLAS10K2-TY184JDA40
    ami0 0:5.0 Online QUANTUM ATLAS10K2-TY184JDA40
    ami0 0:2.0 Hot spare IBM DRVS09D 0140

I currently configure it something like this:

$ grep check_bioctl /etc/sudoers /etc/nrpe.cfg
/etc/sudoers:_nrpe ALL = NOPASSWD:/usr/local/libexec/nagios/check_bioctl -d ami0
/etc/nrpe.cfg:command[check_bioctl]=/usr/bin/sudo /usr/local/libexec/nagios/check_bioctl -d ami0

Also available is check_hw_sensors for checking sysctl hw.sensors from
Nagios.

http://openbsd.somedomain.net/nagios/

l8rZ,
-- 
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

BOFH excuse of the day: YOU HAVE AN I/O ERROR - Incompetent Operator error
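The two output modes described above could be sketched roughly like this. It is in Python rather than the script's Perl and is not the actual check_bioctl code. It assumes that the "br" separator is an HTML <br> tag and that in Nagios mode the OK group is reduced to a bare count. The function name, the input shape, and the exact layout are hypothetical.

```python
def format_report(items, nagios_output=True):
    """Group (state, description) pairs by state and render them.

    With nagios_output=True everything goes on one line joined by
    "<br>" (assumed separator) and OK devices are not listed, only
    counted. Otherwise each group is printed with its devices.
    """
    order = ["CRITICAL", "WARNING", "UNKNOWN", "OK"]  # most severe first
    groups = {}
    for state, description in items:
        groups.setdefault(state, []).append(description)

    parts = []
    for state in order:
        descriptions = groups.get(state)
        if not descriptions:
            continue
        header = "%s (%d)" % (state, len(descriptions))
        if nagios_output and state == "OK":
            parts.append(header)  # hide OK devices to keep the line short
        elif nagios_output:
            parts.append(header + ": " + ", ".join(descriptions))
        else:
            lines = [header + ":"] + ["    " + d for d in descriptions]
            parts.append("\n".join(lines))

    return ("<br>" if nagios_output else "\n").join(parts)


if __name__ == "__main__":
    sample = [
        ("OK", "ami0 sd0 Online"),
        ("CRITICAL", "ami0 sd1 Degraded"),
        ("WARNING", "ami0 0:8.0 Rebuild"),
        ("OK", "ami0 0:2.0 Hot spare"),
    ]
    print(format_report(sample))
    print(format_report(sample, nagios_output=False))
```

Keeping the OK count while dropping the OK device names is one way to stay under the response length limit without losing the summary. The real script may well do something different.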
Re: Nagios check_bioctl available
andrew fresh wrote:
> I have written a perl script that parses the output from bioctl and
> returns it in a format that Nagios can use.

Sweet :-)

> check_bioctl is available here:
> http://openbsd.somedomain.net/nagios/check_bioctl-1.3.tar.gz
>
> It is useful to me, and so I thought it might be useful to someone
> else.
>
> I wrote this on OpenBSD 3.9 and tested on Dell PERC 3/DC controllers
> using the ami driver. It should work just fine on other versions of
> OpenBSD as well as with other cards and drivers. If you do run into
> trouble, send me the output from bioctl on the system you are having
> trouble with and I can try to help. Patches to fix problems would be
> even better.
>
> One thing I ran into is that bioctl needs to run as root to get
> access to /dev/bio, even for read only access. Is there a way to
> query bioctl without needing root?

No!

> Also, in biovar.h, both a raid volume and a disk can be Offline.
> However, I am not sure what that means. Currently it is a WARNING,
> but I don't know what status it should be set to.

If 2 or more physical disks of a RAID 5 are offline, the volume will be
marked offline as well. An offline RAID 5 is obviously a critical
event. Hope this makes sense, since I am not exactly sure what you are
asking.

> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/biovar.h?rev=1.25&content-type=text/x-cvsweb-markup
>
> If anyone knows what the Offline status means, I would sure like to
> know.
>
> An additional useful feature is that you can specify multiple devices
> to check in a single check
>
> /usr/local/libexec/nagios/check_bioctl -d ami0 -d ami1
>
> Output is similar to below, except with NAGIOS_OUTPUT set to 1 in the
> source (as it usually is) all output is on a single line separated
> with <br> and it hides any devices that are OK because Nagios has a
> limit on the length of a response.
>
> CRITICAL (1):
>     ami0 sd1 Degraded
> WARNING (1):
>     ami0 0:8.0 Rebuild QUANTUM ATLAS10K2-TY184JDA40
> OK (7):
>     ami0 sd0 Online
>     ami0 0:0.0 Online IBM DMVS09M 0220
>     ami0 0:1.0 Online IBM DRVS09D 0140
>     ami0 0:3.0 Online QUANTUM ATLAS10K2-TY184JDA40
>     ami0 0:4.0 Online QUANTUM ATLAS10K2-TY184JDA40
>     ami0 0:5.0 Online QUANTUM ATLAS10K2-TY184JDA40
>     ami0 0:2.0 Hot spare IBM DRVS09D 0140
>
> I currently configure it something like this:
>
> $ grep check_bioctl /etc/sudoers /etc/nrpe.cfg
> /etc/sudoers:_nrpe ALL = NOPASSWD:/usr/local/libexec/nagios/check_bioctl -d ami0
> /etc/nrpe.cfg:command[check_bioctl]=/usr/bin/sudo /usr/local/libexec/nagios/check_bioctl -d ami0
>
> Also available is check_hw_sensors for checking of sysctl hw.sensors
> from Nagios.
>
> http://openbsd.somedomain.net/nagios/
>
> l8rZ,