Re: [Nagios-users] Problem with check_openmanage plugin and storage

2013-06-18 Thread Trond Hasle Amundsen
Nic Bernstein  writes:

> We've recently been experimenting with Trond Hasle Amundsen's check_openmanage
> on a large network with about a hundred Dell servers of various ages,
> capabilities, etc.  Mostly PE-2950, R210, R410 and R720.  Much thanks to Trond
> for all his great work on Nagios plugins and other projects, by the way.
>
> We've hit a wall, however, with the storage monitoring aspects of this plugin.
>
> For example, here's a quite specific case.  This is a new PE R720, in debug:
>
> onlight@monitor:~$ check_openmanage -H host -C secret -d
>    System:      PowerEdge R720           OMSA version:    7.1.0
>    ServiceTag:  ###                      Plugin version:  3.7.9
>    BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
> -----------------------------------------------------------------------------
>    Storage Components
> =============================================================================
>   STATE  |    ID    |  MESSAGE TEXT
> ---------+----------+--------------------------------------------------------
>       OK |        0 | Controller 0 [PERC H310 Mini] is Ready
>  WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>  WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>       OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
>       OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
>       OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
>       OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
> -----------------------------------------------------------------------------
>    Chassis Components
> =============================================================================
>   STATE  |  ID  |  MESSAGE TEXT
> ---------+------+------------------------------------------------------------
>       OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
>       OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
>       OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
>       OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
>       OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
>       OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
>       OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
>       OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
>       OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
>       OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
>       OK |    0 | Power Supply 0 [AC]: Presence detected
>       OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
>       OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
>       OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
>       OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
>       OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
>       OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
>       OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
>       OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
>       OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
>       OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
>       OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
>       OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
>       OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
>       OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
>       OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
>       OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
>       OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
>       OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
>       OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
>       OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
>       OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
>       OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
>       OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
>       OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
>       OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
>       OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
>       OK |    0 | SD Card 0 [vFlash] is Absent
> -----------------------------------------------------------------------------
>    Other messages
> =============================================================================

[Nagios-users] Problem with check_openmanage plugin and storage

2013-06-18 Thread Nic Bernstein
We've recently been experimenting with Trond Hasle Amundsen's
check_openmanage on a large network with about a hundred Dell servers of
various ages, capabilities, etc.  Mostly PE-2950, R210, R410 and R720. 
Much thanks to Trond for all his great work on Nagios plugins and other
projects, by the way.

We've hit a wall, however, with the storage monitoring aspects of this
plugin.

For example, here's a quite specific case.  This is a new PE R720, in debug:

onlight@monitor:~$ check_openmanage -H host -C secret -d
   System:      PowerEdge R720           OMSA version:    7.1.0
   ServiceTag:  ###                      Plugin version:  3.7.9
   BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Storage Components
=============================================================================
  STATE  |    ID    |  MESSAGE TEXT
---------+----------+--------------------------------------------------------
      OK |        0 | Controller 0 [PERC H310 Mini] is Ready
 WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
 WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
      OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
      OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
      OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
      OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
      OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
      OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
      OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
      OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
      OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
      OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
      OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
      OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
      OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
      OK |    0 | Power Supply 0 [AC]: Presence detected
      OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
      OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
      OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
      OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
      OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
      OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
      OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
      OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
      OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
      OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
      OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
      OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
      OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
      OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
      OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
      OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
      OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
      OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
      OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
      OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
      OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
      OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
      OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
      OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
      OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
      OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
      OK |    0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
   Other messages

Re: [Nagios-users] check_ntp_time offset unknown

2013-06-18 Thread Giles Coochey

On 14/06/2013 15:10, Bennett, Jan wrote:

> We have implemented an NTP sync check in all of the NRDS checks that we
> are rolling out right now but I've run into a bit of a snag.
>
> I am getting returns of 'Offset Unknown' on all clients.  It appears
> to only happen for a short period of time (30 min or so) and then it
> will clear itself up for a bit but the issue will always return.
>
> From the client that is reporting the unknown offset, I can run the
> following:
>
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|
> # ./check_ntp_time -V
> check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
> # ntpdc -p
>      remote           local       st poll reach   delay    offset    disp
> =========================================================================
> =LOCAL(0)        127.0.0.1        10   64   17     0.0      0.00 0.96858
> *timeserver1     xxx.xxx.xxx.xxx   2   64   17 0.00098  4.956048 0.00580
>
> # /usr/local/nagios/libexec/check_ntp_time -v -H localhost
> sending request to peer 0
> response from peer 0: offset -2.777669579e-07
> sending request to peer 0
> response from peer 0: offset -2.161832526e-07
> sending request to peer 0
> response from peer 0: offset -4.009343684e-07
> sending request to peer 0
> response from peer 0: offset -1.987209544e-07
> discarding peer 0: stratum=0
> overall average offset: 0
> NTP CRITICAL: Offset unknown|
>
> In my searches, I noticed a number of people reporting the same issue
> with the supposed solution being to update your Nagios plugins to
> 1.4.13.  I have done so and am now running 1.4.16 without any change
> in the service check.
>
> Also, I am unable to check a remote NTP server from these clients as
> they do not have access to the outside world.
>
> It has been suggested that the stratum=0 may be the culprit, but I'm
> not sure of my options here.
>
> Any help would be greatly appreciated.

I get this shortly after an NTP client has booted up. Once NTP has been 
running for a while it goes away.
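
The `reach` value of 17 in the quoted `ntpdc -p` output is consistent with a recently started daemon. `reach` is an 8-bit shift register printed in octal, with the newest poll in the lowest bit, so 17 means only the last four polls have been answered, while a long-running, healthy association shows 377. A minimal sketch of that decoding follows; the function names are made up for illustration and are not part of any NTP tool.

```python
from typing import List


def decode_reach(reach_octal: str) -> List[bool]:
    """Decode the octal 'reach' column into eight poll results, newest first."""
    value = int(reach_octal, 8)  # the column is printed in octal
    return [bool((value >> bit) & 1) for bit in range(8)]


def successful_polls(reach_octal: str) -> int:
    """Count how many of the last eight polls were answered."""
    return sum(decode_reach(reach_octal))


if __name__ == "__main__":
    for reach in ("17", "377"):
        print(reach, decode_reach(reach), successful_polls(reach))
```

With the 64-second poll interval shown in the output, four answered polls correspond to only a few minutes of operation.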


--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
gi...@coochey.net



--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] check_ntp_time offset unknown

2013-06-18 Thread Holger Weiß
* Bennett, Jan  [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|

Could you please run "ntpq -c rv" when this happens and post the output?

> It has been suggested that the stratum=0 may be the culprit, but I'm not sure 
> of my options here.

Yes, stratum=0 is the culprit.  An NTP server wouldn't usually report
such a stratum value.
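
The verbose output in the original post shows the sequence: four offsets are collected, the peer is then discarded because of stratum=0, and with no usable peer left the check ends in "Offset unknown". The following is a simplified sketch of that decision, not the plugin's actual C code. The `PeerResult` type, the averaging over peers, the message wording for the non-critical cases, and the 60/120 second thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class PeerResult:
    stratum: int          # stratum reported in the peer's responses
    offsets: List[float]  # measured offsets in seconds, one per exchange


def evaluate(peers: List[PeerResult],
             warn: float = 60.0,   # assumed warning threshold (seconds)
             crit: float = 120.0   # assumed critical threshold (seconds)
             ) -> Tuple[int, str]:
    # Peers reporting stratum 0 are dropped, mirroring the
    # "discarding peer 0: stratum=0" line in the verbose output.
    usable = [p for p in peers if p.stratum != 0 and p.offsets]
    if not usable:
        # Nothing left to measure against, so the offset cannot be stated.
        return CRITICAL, "NTP CRITICAL: Offset unknown"

    # Simplification: average each peer's samples, then average the peers.
    offset = mean(mean(p.offsets) for p in usable)
    if abs(offset) >= crit:
        return CRITICAL, f"NTP CRITICAL: Offset {offset:.6g} secs"
    if abs(offset) >= warn:
        return WARNING, f"NTP WARNING: Offset {offset:.6g} secs"
    return OK, f"NTP OK: Offset {offset:.6g} secs"


if __name__ == "__main__":
    # The case from the original post: good samples, but stratum 0.
    jan = PeerResult(stratum=0, offsets=[-2.777669579e-07, -2.161832526e-07,
                                         -4.009343684e-07, -1.987209544e-07])
    print(evaluate([jan]))
```

The point of the sketch is that the measured offsets themselves are tiny; the CRITICAL state comes solely from the peer being rejected, which is why upgrading the plugin does not change the result.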

Holger

-- 
Holger Weiß   | Freie Universität Berlin
hol...@zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Telefon: +49 30 838-55949 | Fabeckstraße 32, 14195 Berlin (Germany)
Telefax: +49 30 838455949 | https://www.zedat.fu-berlin.de/



[Nagios-users] check_ntp_time offset unknown

2013-06-18 Thread Bennett, Jan
We have implemented an NTP sync check in all of the NRDS checks that we are 
rolling out right now but I've run into a bit of a snag.

I am getting returns of 'Offset Unknown' on all clients.  It appears to only 
happen for a short period of time (30 min or so) and then it will clear itself 
up for a bit but the issue will always return.

From the client that is reporting the unknown offset, I can run the following:

# ./check_ntp_time -H localhost
NTP CRITICAL: Offset unknown|
# ./check_ntp_time -V
check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
# ntpdc -p
     remote           local       st poll reach   delay    offset    disp
=========================================================================
=LOCAL(0)        127.0.0.1        10   64   17     0.0      0.00 0.96858
*timeserver1     xxx.xxx.xxx.xxx   2   64   17 0.00098  4.956048 0.00580
# /usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.777669579e-07
sending request to peer 0
response from peer 0: offset -2.161832526e-07
sending request to peer 0
response from peer 0: offset -4.009343684e-07
sending request to peer 0
response from peer 0: offset -1.987209544e-07
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|

In my searches, I noticed a number of people reporting the same issue with the 
supposed solution being to update your Nagios plugins to 1.4.13.  I have done 
so and am now running 1.4.16 without any change in the service check.

Also, I am unable to check a remote NTP server from these clients as they do 
not have access to the outside world.

It has been suggested that the stratum=0 may be the culprit, but I'm not sure 
of my options here.

Any help would be greatly appreciated.

Jan
