Re: [Nagios-users] Problem with check_openmanage plugin and storage
Nic Bernstein writes:
> We've recently been experimenting with Trond Hasle Amundsen's check_openmanage
> on a large network with about a hundred Dell servers of various ages,
> capabilities, etc. Mostly PE-2950, R210, R410 and R720. Much thanks to Trond
> for all his great work on Nagios plugins and other projects, by the way.
>
> We've hit a wall, however, with the storage monitoring aspects of this plugin.
>
> For example, here's a quite specific case. This is a new PE R720, in debug:
> [...]
[Nagios-users] Problem with check_openmanage plugin and storage
We've recently been experimenting with Trond Hasle Amundsen's check_openmanage on a large network with about a hundred Dell servers of various ages, capabilities, etc., mostly PE-2950, R210, R410 and R720. Many thanks to Trond for all his great work on Nagios plugins and other projects, by the way.

We've hit a wall, however, with the storage monitoring aspects of this plugin.

For example, here's quite a specific case. This is a new PE R720, in debug mode:

    onlight@monitor:~$ check_openmanage -H host -C secret -d
    System:      PowerEdge R720          OMSA version:    7.1.0
    ServiceTag:  ###                     Plugin version:  3.7.9
    BIOS/date:   1.2.6 05/10/2012        Checking mode:   SNMPv2c UDP/IPv4
    ------------------------------------------------------------------------
    Storage Components
    ========================================================================
       STATE |       ID | MESSAGE TEXT
    ---------+----------+---------------------------------------------------
          OK |        0 | Controller 0 [PERC H310 Mini] is Ready
     WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
     WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
          OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
          OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
          OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
          OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
    ------------------------------------------------------------------------
    Chassis Components
    ========================================================================
       STATE |       ID | MESSAGE TEXT
    ---------+----------+---------------------------------------------------
          OK |        0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
          OK |        1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
          OK |        2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
          OK |        3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
          OK |        0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
          OK |        1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
          OK |        2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
          OK |        3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
          OK |        4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
          OK |        5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
          OK |        0 | Power Supply 0 [AC]: Presence detected
          OK |        0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
          OK |        1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
          OK |        2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
          OK |        0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
          OK |        0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
          OK |        1 | Voltage sensor 1 [System Board 3.3V PG] is Good
          OK |        2 | Voltage sensor 2 [System Board 5V PG] is Good
          OK |        3 | Voltage sensor 3 [CPU1 PLL PG] is Good
          OK |        4 | Voltage sensor 4 [System Board 1.1V PG] is Good
          OK |        5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
          OK |        6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
          OK |        7 | Voltage sensor 7 [System Board FETDRV PG] is Good
          OK |        8 | Voltage sensor 8 [CPU1 VSA PG] is Good
          OK |        9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
          OK |       10 | Voltage sensor 10 [System Board NDC PG] is Good
          OK |       11 | Voltage sensor 11 [CPU1 VTT PG] is Good
          OK |       12 | Voltage sensor 12 [System Board 1.5V PG] is Good
          OK |       13 | Voltage sensor 13 [PS2 PG Fail] is Good
          OK |       14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
          OK |       15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
          OK |       16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
          OK |       17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
          OK |        0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
          OK |        0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
          OK |        1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
          OK |        0 | Chassis intrusion 0 detection: Ok (Not Breached)
          OK |        0 | SD Card 0 [vFlash] is Absent
    ------------------------------------------------------------------------
    Other messages
    ========================================================================
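Since the storage problem shows up in only a couple of rows of that long debug listing, a small filter can help pull out the non-OK components when comparing many hosts. The following is a minimal Python sketch, assuming each table row sits on a single line in the "STATE | ID | MESSAGE TEXT" layout shown in the R720 run; `Row`, `parse_rows` and `problems` are hypothetical names, and the sketch neither uses nor replaces any check_openmanage option.

```python
import re
from typing import List, NamedTuple


class Row(NamedTuple):
    """One row of a 'STATE | ID | MESSAGE TEXT' table."""
    state: str
    ident: str
    message: str


# Assumed row layout, based only on the sample output: a Nagios state,
# a '|', a component ID, another '|', then the message text on one line.
ROW_RE = re.compile(
    r"^\s*(OK|WARNING|CRITICAL|UNKNOWN)\s*\|\s*([^|]*?)\s*\|\s*(.*\S)\s*$"
)


def parse_rows(text: str) -> List[Row]:
    """Return every table row found in check_openmanage -d output."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append(Row(*match.groups()))
    return rows


def problems(text: str) -> List[Row]:
    """Return only the rows whose state is not OK."""
    return [row for row in parse_rows(text) if row.state != "OK"]
```

For the R720 run, the only non-OK rows are the two physical disks reported as "Online, Not Certified", so those are what `problems()` would return once the output is split one row per line.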
Re: [Nagios-users] check_ntp_time offset unknown
On 14/06/2013 15:10, Bennett, Jan wrote:
> I am getting returns of 'Offset Unknown' on all clients. It appears to only
> happen for a short period of time (30 min or so) and then it will clear itself
> up for a bit but the issue will always return.
> [...]
> It has been suggested that the stratum=0 may be the culprit, but I'm not sure
> of my options here.

I get this shortly after an NTP client has booted up. Once NTP has been running for a while, it goes away.
--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net

--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
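Giles's observation fits a detail of the NTP protocol: RFC 5905 describes the convention of mapping an internal stratum of 16 (unsynchronized) to 0 in transmitted packets, so an ntpd that has only just started and has not yet synchronized may answer queries with stratum 0. Below is a minimal Python sketch of an SNTP query that shows which stratum and leap indicator the local daemon is advertising. It assumes UDP port 123 on localhost is reachable; `build_request`, `parse_header`, `query` and `looks_unsynchronized` are hypothetical names, not part of nagios-plugins.

```python
import socket
import struct
from typing import Dict

NTP_PORT = 123


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version 4, mode 3 (client)."""
    li_vn_mode = (0 << 6) | (4 << 3) | 3
    return struct.pack("!B47x", li_vn_mode)


def parse_header(packet: bytes) -> Dict[str, int]:
    """Decode leap indicator, version, mode and stratum from a reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet shorter than 48 bytes")
    li_vn_mode, stratum = struct.unpack("!BB", packet[:2])
    return {
        "leap": li_vn_mode >> 6,             # 3 = clock not synchronized
        "version": (li_vn_mode >> 3) & 0x7,
        "mode": li_vn_mode & 0x7,            # 4 = server reply
        "stratum": stratum,                  # 0 = unspecified/invalid
    }


def query(host: str = "localhost", timeout: float = 2.0) -> Dict[str, int]:
    """Send one request to host:123 over UDP and decode the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, NTP_PORT))
        reply, _ = sock.recvfrom(512)
    return parse_header(reply)


def looks_unsynchronized(header: Dict[str, int]) -> bool:
    """True for stratum 0 (what check_ntp_time discarded in this thread)
    or leap indicator 3 (alarm: clock not synchronized)."""
    return header["stratum"] == 0 or header["leap"] == 3
```

Calling `query()` shortly after boot and again once ntpd has settled should show whether the stratum 0 replies line up with the "Offset unknown" window.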
Re: [Nagios-users] check_ntp_time offset unknown
* Bennett, Jan [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|

Could you please run "ntpq -c rv" when this happens and post the output?

> It has been suggested that the stratum=0 may be the culprit, but I'm not sure
> of my options here.

Yes, stratum=0 is the culprit. An NTP server wouldn't usually report such a stratum value.

Holger

--
Holger Weiß | Freie Universität Berlin
hol...@zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Phone: +49 30 838-55949 | Fabeckstraße 32, 14195 Berlin (Germany)
Fax: +49 30 838455949 | https://www.zedat.fu-berlin.de/
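Holger's request can be scripted so the output is captured while the check is actually failing. The Python sketch below runs `ntpq -c rv` and picks out the `leap` and `stratum` system variables, which indicate whether the local ntpd considers itself synchronized. It assumes the `key=value, ...` layout printed by the reference ntpq (value formats such as `leap=00` may differ between ntp versions); `parse_rv`, `read_rv` and `sync_summary` are hypothetical names, and treating missing values as unsynchronized is an assumption.

```python
import re
import subprocess
from typing import Dict

# key=value pairs; values are either "quoted strings" or unquoted tokens
# that run up to the next comma or whitespace.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]*)')


def parse_rv(text: str) -> Dict[str, str]:
    """Turn `ntpq -c rv` output into a dict of system variables."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(text)}


def read_rv() -> Dict[str, str]:
    """Run ntpq on the local host and parse its system variables."""
    result = subprocess.run(["ntpq", "-c", "rv"], capture_output=True,
                            text=True, check=True)
    return parse_rv(result.stdout)


def sync_summary(variables: Dict[str, str]) -> str:
    """Classify the daemon as synchronized or not from leap and stratum."""
    # Missing values are treated as unsynchronized (assumption).
    leap = variables.get("leap", "11")
    stratum = int(variables.get("stratum", "16"))
    if leap == "11" or stratum == 0 or stratum >= 16:
        return f"unsynchronized (leap={leap}, stratum={stratum})"
    return f"synchronized (leap={leap}, stratum={stratum})"
```

Running `print(sync_summary(read_rv()))` alongside the Nagios check (for example from cron) would record the daemon's own view at the moment the plugin reports "Offset unknown".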
[Nagios-users] check_ntp_time offset unknown
We have implemented an NTP sync check in all of the NRDS checks that we are rolling out right now, but I've run into a bit of a snag. I am getting returns of 'Offset Unknown' on all clients. It appears to happen only for a short period of time (30 min or so); then it clears itself up for a bit, but the issue always returns.

From the client that is reporting the unknown offset, I can run the following:

    # ./check_ntp_time -H localhost
    NTP CRITICAL: Offset unknown|

    # ./check_ntp_time -V
    check_ntp_time v1.4.16 (nagios-plugins 1.4.16)

    # ntpdc -p
         remote           local      st poll reach  delay   offset    disp
    =======================================================================
    =LOCAL(0)        127.0.0.1       10   64   17 0.0      0.00     0.96858
    *timeserver1     xxx.xxx.xxx.xxx  2   64   17 0.00098  4.956048 0.00580

    # /usr/local/nagios/libexec/check_ntp_time -v -H localhost
    sending request to peer 0
    response from peer 0: offset -2.777669579e-07
    sending request to peer 0
    response from peer 0: offset -2.161832526e-07
    sending request to peer 0
    response from peer 0: offset -4.009343684e-07
    sending request to peer 0
    response from peer 0: offset -1.987209544e-07
    discarding peer 0: stratum=0
    overall average offset: 0
    NTP CRITICAL: Offset unknown|

In my searches, I noticed a number of people reporting the same issue, with the supposed solution being to update your Nagios plugins to 1.4.13. I have done so and am now running 1.4.16, without any change in the service check. Also, I am unable to check a remote NTP server from these clients, as they do not have access to the outside world.

It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here. Any help would be greatly appreciated.

Jan
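The verbose run explains the message: peer 0 answered four times with offsets well under a microsecond, but it was then discarded because it reported stratum=0, which left no usable offset and produced "NTP CRITICAL: Offset unknown". The Python sketch below models that behaviour as inferred from the log, not from the plugin's C source. `Peer`, `overall_offset`, `status` and the 60/120 s thresholds are illustrative assumptions, and the real plugin may choose among several peers differently than the simple average used here.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Peer:
    """Replies collected from one NTP peer (hypothetical model type)."""
    stratum: int
    offsets: List[float] = field(default_factory=list)


def overall_offset(peers: List[Peer]) -> Optional[float]:
    """Average offset of the usable peers, or None if all were discarded."""
    # Mirrors 'discarding peer 0: stratum=0' in the verbose log.
    usable = [p for p in peers if p.stratum != 0 and p.offsets]
    if not usable:
        return None
    # Simplification: average of per-peer means.
    return mean(mean(p.offsets) for p in usable)


def status(peers: List[Peer], warn: float = 60.0, crit: float = 120.0) -> str:
    """Map the offset to a Nagios-style state; thresholds are illustrative."""
    offset = overall_offset(peers)
    if offset is None:
        return "CRITICAL: offset unknown"
    if abs(offset) > crit:
        return f"CRITICAL: offset {offset} s"
    if abs(offset) > warn:
        return f"WARNING: offset {offset} s"
    return f"OK: offset {offset} s"
```

In this model, the four tiny offsets from peer 0 never count, because the stratum check happens before any averaging; only a peer reporting a non-zero stratum can turn the result back into a real offset.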