Re: [Nagios-users] check_Openmanage trouble

2013-08-15 Thread Trond Hasle Amundsen
"Weberskirch, Timo"  writes:

> the check_openmanage --no-storage option works (of course without any physical
> disks… :( ).
>
> I was on the phone with Dell Pro Support. They told me that the MD3
> only shows the RAID disk information (not the physical
> disk information) to external devices.
>
> They also told me that there is no way to filter out the SAS card in OMSA.
>
> I have to live with the '--no-storage' option…

Hmm.. Ok, so this particular server doesn't have any storage other than
the SAS card (connected to the MD3xxx), which OMSA can't manage? If so,
that is exactly what the '--no-storage' option is for :)

You should use the '--no-storage' option if

  1. The server has no storage, which is entirely possible; or
  2. The only storage present is something that OMSA doesn't recognize

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

--
Get 100% visibility into Java/.NET code with AppDynamics Lite!
It's a free troubleshooting tool designed for production.
Get down to code-level detail for bottlenecks, with <2% overhead. 
Download for free and get started troubleshooting in minutes. 
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] check_Openmanage trouble

2013-08-14 Thread Trond Hasle Amundsen
"Weberskirch, Timo"  writes:

> thank you all for your fast and helpful response.  Unfortunately the problem
> persists.
>
> Is there a way to filter out the  (in my opinion faulty) SAS card?

Storage components are tightly interconnected, so from the plugin side
your only option is to not check storage at all:

   check_openmanage --no-storage

But I still believe that this is a software problem, i.e. in OMSA.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_Openmanage trouble

2013-08-07 Thread Trond Hasle Amundsen
Rich  writes:

> Usually, when I've seen this, it's been after doing an upgrade of an
> existing OMSA install (<= 6.x to 7.x).
>
> In general, I haven't found a good way to resolve it other than
> automating a complete uninstall of OMSA prior to installing the newer
> version.

Yes, I think the logical next step in this case is to do a complete
uninstall, then reinstall of OMSA on the host. The problem is in OMSA
and must be fixed there. The plugin is simply complaining that OMSA
isn't responding as expected.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] check_Openmanage trouble

2013-08-06 Thread Trond Hasle Amundsen
"Weberskirch, Timo"  writes:

> maybe one of you has the same problem with the check_openmanage plugin…
>
> Last week we installed two new Dell PowerEdge R720 with OMSA v 7.3.0
> (check_openmanage version: 3.7.10).
>
> Every time I try to check my server I get this error message:
>
> “SNMP ERROR [storage / pdisk]: Requested entries are empty or do not exist.”

Hello Timo,

There seems to be some sort of issue with the Openmanage installation on
this server. First thing to do is double-check that everything is
installed properly. On a RHEL6 system, the following storage related RPM
packages should be installed:

  # rpm -qa|grep srvadmin-storage
  srvadmin-storageservices-7.3.0-4.4.1.el6.x86_64
  srvadmin-storage-7.3.0-4.93.2.el6.x86_64
  srvadmin-storage-cli-7.3.0-4.93.2.el6.x86_64
  srvadmin-storageservices-snmp-7.3.0-4.4.1.el6.x86_64
  srvadmin-storage-snmp-7.3.0-4.93.2.el6.x86_64
  srvadmin-storageservices-cli-7.3.0-4.4.1.el6.x86_64
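Checking a host against this list by eye is error-prone, so a small shell function could do the comparison. This is a sketch only: the function name `missing_storage_pkgs` is hypothetical, and it assumes the six package names above are the complete set expected for OMSA 7.3.0 on RHEL6 (other OMSA versions or distributions may differ).

```sh
#!/bin/sh
# Hypothetical helper: report which of the storage-related OMSA packages
# listed above are absent from "rpm -qa" output read on stdin.
# Assumption: these six names are the full set expected for OMSA 7.3.0 on RHEL6.
missing_storage_pkgs() {
    local installed pkg missing
    installed=$(cat)
    missing=0
    for pkg in srvadmin-storageservices srvadmin-storage srvadmin-storage-cli \
               srvadmin-storageservices-snmp srvadmin-storage-snmp \
               srvadmin-storageservices-cli; do
        # Match "<name>-<digit>" so that srvadmin-storage is not satisfied
        # by srvadmin-storage-cli or srvadmin-storage-snmp.
        if ! printf '%s\n' "$installed" | grep -q "^${pkg}-[0-9]"; then
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `rpm -qa | missing_storage_pkgs && echo "all storage packages present"`.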

Do you see any physical disks in the Openmanage Web Console? (point your
browser to https://:1311/ and log in as root)

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] rpmbuild nagios-3.5.0

2013-07-24 Thread Trond Hasle Amundsen
alexus  writes:

> I'm unable to build RPM w/ nagios 3.5.0, last one that worked for me was 
> 3.2.3.
> any ideas/suggestions?

I'd recommend using the prebuilt package for RHEL6, which is available
from EPEL[1]. Add the EPEL repo and you can simply do "yum install
nagios" and be done :)

[1] http://fedoraproject.org/wiki/EPEL

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] check_openmanage improvement request

2013-07-23 Thread Trond Hasle Amundsen
"John Skarbek"  writes:

> I've recently deployed the check_openmanage script and it works very well,
> except for hosts that run ESXi (unless I'm doing something wrong).

You're not doing anything wrong. Openmanage, when deployed on ESXi,
doesn't have the necessary capabilities for it to work.

> I've discovered that Openmanage doesn't expose its OIDs through ESXi like
> it would on a Linux or Windows host. However, I did find that the iDRAC7
> has similar SNMP responses that I'd like to capture. But when pointing
> check_openmanage to the DRAC interface, I get the message indicating that
> OMSA must not be installed correctly. Looking into the script, I found:
>
> my $chassisModelName = '1.3.6.1.4.1.674.10892.1.300.10.1.9.1';
>
> Which does indeed NOT exist.  However, a similar OID with the same information
> we are looking for is located here:
>
> $chassisModelName = '1.3.6.1.4.1.674.10892.5.1.3.12.0';

Actually, the OID is 1.3.6.1.4.1.674.10892.5.4.300.10.1.9.1. I've toyed
around with this a bit, and for the most part you can simply replace
"1.3.6.1.4.1.674.10892.1" with "1.3.6.1.4.1.674.10892.5.4". Same goes for
storage OIDs, to a degree.
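The prefix substitution described above can be sketched as a small shell function. The name `omsa_to_idrac_oid` is hypothetical, and since the mapping only holds "for the most part", any OID it produces should be checked against the iDRAC before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: rewrite an OMSA-style OID (prefix ...10892.1) to the
# iDRAC7 form (prefix ...10892.5.4). This holds "for the most part" only;
# some OIDs have no iDRAC equivalent at all.
omsa_to_idrac_oid() {
    local oid omsa_prefix idrac_prefix
    oid=$1
    omsa_prefix='1.3.6.1.4.1.674.10892.1'
    idrac_prefix='1.3.6.1.4.1.674.10892.5.4'
    case $oid in
        # Require a dot right after the prefix, so e.g. ...10892.10.x is untouched
        "$omsa_prefix".*) printf '%s\n' "${idrac_prefix}${oid#"$omsa_prefix"}" ;;
        *)                printf '%s\n' "$oid" ;;
    esac
}
```

For instance, `snmpget -v2c -c public idrac-host "$(omsa_to_idrac_oid 1.3.6.1.4.1.674.10892.1.300.10.1.9.1)"` would query the rewritten chassis model OID; the community string and host name are placeholders.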

> After modifying the script a little bit I was able to get past that, but now
> check_openmanage is complaining: "SNMP ERROR [memory]: The requested entries
> are empty or do not exist."
>
> I presume the entire set of OIDs is in a different spot when being checked
> through the DRAC versus the standard Windows SNMP service. I would love to
> assist in enhancing this script, but I'm not sure how I should start. Let me
> know who I should contact, or feel free to reach out to me to assist with this
> awesome plugin.

I have a modified pre-alpha version for testing, available in the test
branch in git:

  http://git.uio.no/git/?p=check_openmanage.git;a=shortlog;h=refs/heads/test

Note that it's NOT production ready, I have only done some very limited
testing.

I had to simplify some stuff:

  * Storage: The storage OIDs from the iDRAC7 are somewhat different
    compared to Openmanage. Some information that the plugin needs is
    not available, such as numbered identifiers for components (used in
    blacklisting). There are even some OIDs that aren't present in
    Openmanage. In short, it's a mess, and the storage bit is very
    simplistic. Perhaps the missing info will be added in a later
    firmware release; we can only hope.

  * ESM health OIDs are missing completely, so the ESM health check is
    omitted. The same goes for the SD card check.

To use the new feature you have to specify '--idrac', like this:

  check_openmanage --idrac -H 

Test it, break it and tell me what you think :)

I've noticed that neither the rollup-status nor the component-status for
controllers catches that the controller is actually degraded due to
out-of-date firmware. Hopefully it's an anomaly that doesn't apply to
other aspects of controllers, or to other components.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] Nagios openmanage ERROR: XML transformation failed

2013-06-19 Thread Trond Hasle Amundsen
"Lorenz, Stephan"  writes:

> since installing libxml2, libxml2-devel and curl, the Nagios installation on
> our Dell R720xd server reports XML errors.
>
> Problem running 'omreport storage controller': Error! XML Transformation failed
> Problem running 'omreport chassis memory': Error! XML Transformation failed
> Problem running 'omreport chassis fans': Error! XML Transformation failed
> Problem running 'omreport chassis pwrsupplies': Error! XML Transformation failed
> Problem running 'omreport chassis temps': Error! XML Transformation failed
> Problem running 'omreport chassis processors': Error! XML Transformation failed
> Problem running 'omreport chassis volts': Error! XML Transformation failed
> Problem running 'omreport chassis batteries': Error! XML Transformation failed
> Problem running 'omreport chassis pwrmonitoring': Error! XML Transformation failed
> Problem running 'omreport chassis intrusion': Error! XML Transformation failed
> Problem running 'omreport chassis removableflashmedia': Error! XML Transformation failed
> Chassis Service Tag is bogus: 'N/A'
>
> I am using Nagios 3.5.1, check_openmanage 3.7.9, Openmanage 7.2.0 on CentOS 6.4
> (kernel 2.6.32-358.11.1.el6.centos.plus.x86_64).
>
> When I run check_openmanage or omreport manually everything is fine. I tried to
> reinstall nagios-plugins-openmanage and php-xml for a start, but that did not
> help. I cannot remove libxml2 and the rest since it is needed elsewhere.
>
> Does anyone have a suggestion of how to fix this error?

Given that it works when you run the commands manually, I suspect
some sort of permission issue. Try running the commands as the NRPE
user, and also try running the check from Nagios with SELinux in
permissive mode (the test has to go through the NRPE daemon, so that
the plugin runs in the correct SELinux domain).

Check out this link about using check_openmanage with SELinux in
enforcing mode:

  http://folk.uio.no/trondham/software/check_openmanage.html#selinux-considerations

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] Problem with check_openmanage plugin and storage

2013-06-19 Thread Trond Hasle Amundsen
Nic Bernstein  writes:

> Regarding the non-certified disks problem... There is a special
> blacklisting keyword to suppress the message about non-certified disks:
>
>   check_openmanage -b pdisk_cert=all
>
> Please try this and see if it resolves your issue. Using blacklisting
> should also disable the global health check.
>
>
> Ah, that's just what we need.  Much appreciated...
>
> No, that doesn't seem to be in my version (3.7.9, downloaded yesterday)
>
> onlight@monitor:~$ perl check_openmanage -H host -C secret -b pdisk_cert=all
> Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
> Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
> onlight@monitor:~$ echo $?
> 1
>
> I guess I'll wait for a patch.

Are you sure you didn't test this with the 7.1.0 workaround manually
removed?

> Say Trond, I sent you some notes last week about enhancements we made to your
> check_linux_bonding plugin.  Would you prefer I re-post those to the list
> instead?

Sorry for being unresponsive lately. I've been swamped at work and have
built up something of an email backlog. No need to resend :)

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] Problem with check_openmanage plugin and storage

2013-06-18 Thread Trond Hasle Amundsen
Nic Bernstein  writes:

> We've recently been experimenting with Trond Hasle Amundsen's check_openmanage
> on a large network with about a hundred Dell servers of various ages,
> capabilities, etc.  Mostly PE-2950, R210, R410 and R720.  Much thanks to Trond
> for all his great work on Nagios plugins and other projects, by the way.
>
> We've hit a wall, however, with the storage monitoring aspects of this plugin.
>
> For example, here's a quite specific case.  This is a new PE R720, in debug:
>
> onlight@monitor:~$ check_openmanage -H host -C secret -d
>    System:      PowerEdge R720              OMSA version:    7.1.0
>    ServiceTag:  ###                         Plugin version:  3.7.9
>    BIOS/date:   1.2.6 05/10/2012            Checking mode:   SNMPv2c UDP/IPv4
> -----------------------------------------------------------------------------
>    Storage Components
> =============================================================================
>   STATE  |    ID    |  MESSAGE TEXT
> ---------+----------+--------------------------------------------------------
>       OK |        0 | Controller 0 [PERC H310 Mini] is Ready
>  WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>  WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>       OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
>       OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
>       OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
>       OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
> -----------------------------------------------------------------------------
>    Chassis Components
> =============================================================================
>   STATE  |  ID  |  MESSAGE TEXT
> ---------+------+------------------------------------------------------------
>       OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
>       OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
>       OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
>       OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
>       OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
>       OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
>       OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
>       OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
>       OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
>       OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
>       OK |    0 | Power Supply 0 [AC]: Presence detected
>       OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
>       OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
>       OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
>       OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
>       OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
>       OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
>       OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
>       OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
>       OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
>       OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
>       OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
>       OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
>       OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
>       OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
>       OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
>       OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
>       OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
>       OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
>       OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
>       OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
>       OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
>       OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
>       OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
>       OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
>       OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
>       OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
>       OK |    0 | SD Card 0 [vFlash] is Absent
> -----------------------------------------------------------------------------
>    Other messages
> =============================================================================

Re: [Nagios-users] Check_Openmanage not ignoring non-certified drives

2013-01-14 Thread Trond Hasle Amundsen
"Bob The Junkie"  writes:

> I'm using Nagios and check_openmanage to keep an eye on some Dell R710 servers
> we've recently acquired, and I'm having problems trying to stop warnings about
> non-Dell-certified drives from appearing in the alert log.
>
> I've separated out the different components on the servers into their own
> Nagios checks, so my config files look like this:
>
> In nagios:
>
> SERVICES.CFG
>
> 
>
> check_command check_dell_components!memory
>
> check_command check_dell_components!alertlog
>
> COMMANDS.CFG
>
> command_name check_dell_components
>
> command_line check_nrpe -H $HOSTADDRESS$ -p 5666 -t 30 -c Check_OpenManage -a --only $ARG1$
>
> On each Server in nsclient.ini:
>
> Check_OpenManage = scripts\\check_openmanage.exe $ARG1$ --perfdata
>
> The problem I'm having is that one of my checks, which checks the health of
> the alert log, gives a consistent warning message ("Alert log content: 0
> critical, 6 non-critical, 36 ok"). I've traced this down to the 6
> non-Dell-certified drives in the server, and I can indeed see within OMSA that
> the only 6 warnings all state "Controller event log: PD 04(e0x20/s4) is not a
> certified drive: Controller 0 (PERC 6/i Integrated)".
>
> So far, so good. Reading through the documentation I can see that
> check_openmanage includes a blacklisting option specifically for this event
> ("pdisk_cert - Suppress warning message about non-certified physical disk"),
> but no matter what I try, I can't seem to get check_openmanage to ignore these
> problems. An example of the command I'm running on the command line is:
>
> check_openmanage.exe -s -a -b pdisk_cert=all
>
> Which returns:
>
> WARNING: Alert log content: 0 critical, 6 non-critical, 36 ok
>
> Now I'm assuming the problem here is being caused by the alert log generating
> the errors, and not the physical disk directly causing the errors, which is why
> blacklisting the certificate problem on the physical disk isn't doing me any
> good.
>
> Which leads me to my question: is there anything I can do to ignore these
> errors (and thus stop Nagios from complaining) apart from excluding the alert
> log from my checks?

Hi,

Your analysis is correct. The check_openmanage plugin's check of the log
content is limited to counting the number of critical, warning and ok
messages; it doesn't do any log parsing. The log check is intended as a
precaution, in case you're concerned about missing some temporary
problem. After all, the plugin does active checking and will only
report the state of the hardware right now.

In your case I think that the easiest solution would be to stop using
the log checking with check_openmanage, and either use a fully fledged
log parsing plugin (such as check_logfiles) or write your own simple
plugin where you just filter out the certificate stuff.
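A minimal sketch of such a home-grown check follows. It assumes the log entries can be fed to it one per line as `SEVERITY|message` (a hypothetical format, with SEVERITY one of Critical, Non-Critical or Ok; how the log is exported from OMSA is left open). It drops entries containing "is not a certified drive" before counting and uses the standard Nagios plugin exit codes. The function name and output wording are illustrative, modelled on the plugin message quoted above.

```sh
#!/bin/sh
# Hypothetical filtered alert-log check (not part of check_openmanage).
# Input on stdin, one entry per line, in an assumed format:
#   SEVERITY|message    where SEVERITY is Critical, Non-Critical or Ok
check_alertlog_filtered() {
    local crit warn ok severity message summary
    crit=0 warn=0 ok=0
    while IFS='|' read -r severity message || [ -n "$severity" ]; do
        # Drop the non-certified drive events before counting anything
        case $message in
            *"is not a certified drive"*) continue ;;
        esac
        case $severity in
            Critical)     crit=$((crit + 1)) ;;
            Non-Critical) warn=$((warn + 1)) ;;
            Ok)           ok=$((ok + 1)) ;;
        esac
    done
    summary="Alert log content: $crit critical, $warn non-critical, $ok ok"
    # Standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL
    if [ "$crit" -gt 0 ]; then
        echo "CRITICAL: $summary"
        return 2
    elif [ "$warn" -gt 0 ]; then
        echo "WARNING: $summary"
        return 1
    fi
    echo "OK: $summary"
    return 0
}
```

Matching on the message text rather than on a disk ID keeps the filter working as disks are added, at the cost of depending on the exact wording of the OMSA event.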

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] New check_openmanage error after updating to OMSA 7.2.0-4

2013-01-10 Thread Trond Hasle Amundsen
Steve Jenkins  writes:

> And... to answer my own question, yes - 3.7.9 does indeed fix
> this. New version is probably already in the repos, waiting out the
> testing period.

Not sure which repos you're referring to, but I'll assume Fedora and/or
Fedora EPEL.

I didn't get around to submitting updates until today. They should
arrive in the testing repos in a couple of days. The updates need to
stay in testing for a week for Fedora and two weeks for EPEL before they
can be pushed to stable. If you can't wait, you can download the RPMs
via the Fedora build system; you'll find links here:

  https://admin.fedoraproject.org/updates/search/nagios-plugins-openmanage

When it has arrived in testing (and in your local mirror), you can
install it with (example for EPEL):

  yum --enablerepo=epel-testing update nagios-plugins-openmanage

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] check_openmanage: timeout vs. SNMP timeout

2012-12-11 Thread Trond Hasle Amundsen
Andrew Daugherity  writes:

>> Please try this version (named 3.7.8-beta2) and let me know if it works
>> around your problem. Usage:
>> 
>>   check_openmanage --snmp-timeout 
>
> I think I fixed my problem (for the time being at least) by restarting
> OMSA on that server.  Restarting snmpd didn't solve anything, nor did
> my timeout hack (which just gave me an UNKNOWN status - plugin timeout
> instead of SNMP CRITICAL when it randomly failed).  Whenever the check
> failed, it would hang indefinitely, so it was not a case of slow SNMP.
> Thanks for the added option, though; I think someone may find it
> useful.

Yes, I agree. I'll keep it.

> Regarding your fix:
> The timeout option does appear to get passed to SNMP; however, the
> actual timeout is twice what is specified. E.g. --snmp-timeout=1, get
> SNMP critical message after 2 seconds; --snmp-timeout=14, SNMP
> critical at 28 seconds; --snmp-timeout=15 or higher, get UNKNOWN:
> PLUGIN TIMEOUT message at 30 seconds. (I used a host without snmpd
> running for the timeout tests.) I can't see anything obviously wrong
> with your code, but it behaves this way on both SLES 11 SP1 (Perl
> 5.10, net-snmp 5.4.2.1, Net::SNMP 6.0.1) and OS X 10.8 (Perl 5.12.4,
> net-snmp 5.6, Net::SNMP 6.1 [from CPAN]).

Hmm.. kind of confusing. It is due to the fact that Net::SNMP does one
retry (with the same timeout) before it bails out. This is adjustable
with the '-retries' parameter to the SNMP object. The default is 1. If I
set it to 0, the plugin times out in the SNMP object at the specified
time as you would expect. Thanks for pointing this out; I should make a
note of it in the manual page.
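The arithmetic behind the doubled timeout can be sketched as follows: Net::SNMP waits the full timeout for the initial request and again for each retry, so the worst case is timeout * (retries + 1). The 30-second plugin timeout is the one seen in Andrew's tests above; the function names are hypothetical, and treating a tie as the plugin timeout winning simply matches the --snmp-timeout=15 observation.

```sh
#!/bin/sh
# Worst-case seconds Net::SNMP waits before failing: the initial request plus
# each retry, each waiting the full timeout. Net::SNMP defaults to 1 retry.
snmp_worst_case() {
    local timeout retries
    timeout=$1
    retries=${2:-1}
    echo $((timeout * (retries + 1)))
}

# Which failure would show up first, given the plugin's own timeout
# (30 s in the tests above). Hypothetical helper, for illustration only.
first_failure() {
    local snmp_timeout retries plugin_timeout
    snmp_timeout=$1
    retries=${2:-1}
    plugin_timeout=${3:-30}
    if [ "$(snmp_worst_case "$snmp_timeout" "$retries")" -lt "$plugin_timeout" ]; then
        echo "SNMP CRITICAL"
    else
        echo "PLUGIN TIMEOUT"
    fi
}
```

With the default of one retry, `snmp_worst_case 5` gives 10, which would also explain the 10 seconds Andrew measured against the documented 5-second default.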

> You probably also want to add this option to the help/usage message.

I won't add it to the help output, as that only covers the most popular
options, but I'll add it to the manual page.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] check_openmanage: timeout vs. SNMP timeout

2012-12-10 Thread Trond Hasle Amundsen
Trond Hasle Amundsen  writes:

> A new option to specify the SNMP object timeout would be easy to add,
> and is in my opinion a cleaner approach than just passing the plugin
> timeout.

Such an option is now implemented in the Git version:

  http://git.uio.no/git/?p=check_openmanage.git;a=commit;h=32564b44c2631eeac03a920f0c180fb12e4b29c8

Please try this version (named 3.7.8-beta2) and let me know if it works
around your problem. Usage:

  check_openmanage --snmp-timeout 

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] check_openmanage: timeout vs. SNMP timeout

2012-12-07 Thread Trond Hasle Amundsen
Andrew Daugherity  writes:

> I'm troubleshooting an issue where one server is occasionally not responding 
> (I think it's a firewall or snmpd issue, not this plugin), and I noticed that 
> changing the timeout option to check_openmanage did not affect how long it 
> took before receiving the
>   SNMP CRITICAL: No response from remote host A.B.C.D
>
> message.  Looking at the code I see the timeout option is _not_ passed to the 
> Net::SNMP session object, so the SNMP connection timeout uses the default 
> value (5 seconds according to the Net::SNMP man page, but 10 seconds in my 
> testing).
>
> If I pass the timeout option to the Net::SNMP->session object like so:
> 
> diff --git a/check_openmanage b/check_openmanage
> index b6abec5..3558ed4 100755
> --- a/check_openmanage
> +++ b/check_openmanage
> @@ -860,6 +860,7 @@ sub snmp_initialize {
>  '-port' => $opt{port},
>  '-hostname' => $opt{hostname},
>  '-version'  => $opt{protocol},
> +'-timeout'  => $opt{timeout},
> );
>  
>  # Setting the domain (IP version and transport protocol)
> 
> Then it does obey the timeout option and I instead get the
>   PLUGIN TIMEOUT: check_openmanage timed out after 30 seconds
>
> message.  This might be by design though, to have a shorter SNMP timeout and 
> different error messages, but it was perplexing to me why the timeout option 
> was seemingly not working.  Perhaps a different option for the SNMP timeout, 
> or a documentation clarification, is a better way?

Hello Andrew,

Your analysis of this problem is correct: you're hitting the Net::SNMP
timeout, which defaults to 5 seconds. There are two reasons why the
--timeout parameter isn't passed to the SNMP object:

  1. I never saw any reason to :) This is the first time I've heard of
 problems relating to it.

  2. The SNMP object timeout has limitations, it can only be between 1
 and 60 seconds. I don't know how Net::SNMP reacts if the specified
 value is outside of this range.
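One way to stay inside that 1 to 60 second window is to validate the value before it ever reaches Net::SNMP, which sidesteps the question of how Net::SNMP reacts to out-of-range values. A minimal sketch, assuming whole seconds only; the function name `valid_snmp_timeout` is hypothetical and not taken from check_openmanage.

```sh
#!/bin/sh
# Hypothetical pre-check: accept only whole seconds in the 1-60 range
# mentioned above. Returns 0 if the value is acceptable, 1 otherwise.
valid_snmp_timeout() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 60 ]
}
```

A wrapper could then bail out with the Nagios UNKNOWN exit code (3) on bad input: `valid_snmp_timeout "$t" || { echo "UNKNOWN: SNMP timeout must be 1-60 seconds"; exit 3; }`.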

The documentation is lacking on this, as you pointed out, and I'll fix
that. A new option to specify the SNMP object timeout would be easy to
add, and is in my opinion a cleaner approach than just passing the
plugin timeout.

PS. I'm going away for the weekend and I'm leaving in a few minutes, so
I'll get back to you on this early next week.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] Problem with check_openmanage

2012-10-12 Thread Trond Hasle Amundsen
"Jens Hyllegaard (Soft Design A/S)" 
writes:

> I am using version 3.7.6 of check_openmanage.
>
> I have disabled notifications for battery charge events in the call to
> check_openmanage, but I still get notifications from Nagios.
>
> This is the command line I use:
>
> $USER1$/check_openmanage -s -p -H $HOSTADDRESS$ -b ps=all -b bat_charge
>
> This is the current output from check_openmanage for one of the servers.
>
> WARNING: Cache Battery 0 in controller 0 is Charging (Ready) [probably
> harmless]

Hello Jens,

There is a slight typo in your command definition. Replace with:

  $USER1$/check_openmanage -s -p -H $HOSTADDRESS$ -b ps=all -b bat_charge=all

..and you should be fine :)
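To see why the missing '=all' matters, here is a hypothetical Python sketch of how 'component=ids' blacklist entries could be parsed and matched. It is not the plugin's actual parser (check_openmanage is written in Perl). In particular, this sketch rejects a bare keyword like 'bat_charge' outright, which is a design choice of the sketch; in the report above the plugin simply didn't suppress the warning.

```python
def parse_blacklist(entries: list[str]) -> dict[str, set[str]]:
    """Turn repeated '-b component=id[,id...]' values into {component: {ids}}."""
    blacklist: dict[str, set[str]] = {}
    for entry in entries:
        component, sep, ids = entry.partition("=")
        if not sep or not ids:
            # A bare keyword such as 'bat_charge' names no IDs, so reject it loudly.
            raise ValueError(f"blacklist entry {entry!r} is missing '=ids'")
        blacklist.setdefault(component, set()).update(ids.split(","))
    return blacklist


def is_blacklisted(blacklist: dict[str, set[str]], component: str, comp_id: str) -> bool:
    """'all' matches every ID of that component type."""
    ids = blacklist.get(component, set())
    return "all" in ids or comp_id in ids


bl = parse_blacklist(["ps=all", "bat_charge=all"])
print(is_blacklisted(bl, "bat_charge", "0"))  # True
```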

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: fix build on SUSE (docbook pkg name)

2012-08-06 Thread Trond Hasle Amundsen
Andrew Daugherity  writes:

> Simple fix -- the package is named 'docbook-xsl-stylesheets' instead
> of 'docbook-style-xsl'.  I added a variable for this to the global "if
> suse" section.

Thanks Andrew, applied and pushed to master.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Dell Openmanage

2012-07-01 Thread Trond Hasle Amundsen
"Sven Dohmen"  writes:

> For several months we have been using the Dell Openmanage plugin from
> http://folk.uio.no/trondham/software/check_openmanage.html. This has been
> working fine until the last couple of weeks.
>
> For some servers we are getting the following results back:
>
> W: Controller 0 [PERC 6/i Integrated]: Firmware '6.2.0-0013' is out of date
> -- SYSTEM: PowerEdge R710, SN:
> INTERNAL ERROR: Use of uninitialized value within %fw_type in string eq at
> (eval 1) line 4976.
> INTERNAL ERROR: Use of uninitialized value within %fw_type in pattern match 
> (m/
> /) at (eval 1) line 4980. 
>
> I noticed this only happens when 1 of the drivers is out of date. Is there a
> solution that doesn't require updating the firmware right away (the update is
> already planned over the next several weeks)?

In case anyone else has this issue.. Sven and I worked on this off-list,
and we identified this to be an error related to using the '-o' option
over SNMP, on servers equipped with iDRAC6 or iDRAC7 management
cards. The plugin check_openmanage has been fixed and a new release
(version 3.7.6) is available:

  http://folk.uio.no/trondham/software/check_openmanage.html#download

Notice for RHEL and Fedora users: The new release has been submitted
as an update for Fedora and Fedora EPEL. It is currently in testing, and
can be updated with:

  yum --enablerepo=\*testing update nagios-plugins-openmanage

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Warning alert isn't working

2012-04-13 Thread Trond Hasle Amundsen
Leonardo Bacha Abrantes  writes:

> Hi everybody!
>
> I'm using the check_openmanage plugin in Nagios to monitor the temperature of
> my Dell servers.
> It's working; however, the warning and critical alerts that I configure are
> not working.
>
> [root@monitor:/etc/openmanage]# /usr/lib/nagios/plugins/check_openmanage -w 25 -c 30 -H 10.11.12.1 -C Test --only temp
> TEMPERATURES OK - 1 temperature probes checked:Temperature Probe 0 [System
> Board Ambient Temp] reads 30 C (min=8/3, max=42/47)
>
> The temperature is 30 and the check should show WARNING because I used -w 25.

Hello Leonardo,

The syntax you're using with the '-w' and '-c' options is wrong. From
the manual page:

   -w, --warning STRING or FILE
   Override the machine-default temperature warning
   thresholds. Syntax is "id1=max[/min],id2=max[/min],...". The
   following example sets warning limits to max 50C for probe 0,
   and max 45C and min 10C for probe 1:

   check_openmanage -w 0=50,1=45/10

   The minimum limit can be omitted, if desired. Most often, you
   are only interested in setting the maximum thresholds.

   This parameter can be either a string with the limits, or a
   file containing the limits string. The option can be
   specified multiple times.

   NOTE: This option should only be used to narrow the field of
   OK temperatures wrt. the OMSA defaults. To expand the field
   of OK temperatures, increase the OMSA thresholds. See the
   plugin web page for more information.

   -c, --critical STRING or FILE
   Override the machine-default temperature critical
   thresholds. Syntax and behaviour is the same as for warning
   thresholds described above.

The reason that you need to specify the ID of the temperature probes is
that there may be more than one, each with its own thresholds. In your
case there is only one probe and its ID is 0, so replace your command
above with:

  check_openmanage -w 0=25 -c 0=30 -H 10.11.12.1 -C Test --only temp

That should do the trick.
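The 'id=max[/min]' syntax can be illustrated with a short Python sketch that parses a threshold string and classifies a reading. The function names are hypothetical. It only models the user-supplied overrides (not the OMSA machine defaults), and it assumes a reading strictly above max or strictly below min counts as a breach; whether the plugin compares strictly or inclusively is not stated here.

```python
from typing import Optional

# (max, min) limits for one probe; min may be omitted.
Limit = tuple[float, Optional[float]]


def parse_thresholds(spec: str) -> dict[str, Limit]:
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}."""
    limits: dict[str, Limit] = {}
    for item in spec.split(","):
        probe_id, _, limit = item.strip().partition("=")
        maximum, _, minimum = limit.partition("/")
        limits[probe_id] = (float(maximum), float(minimum) if minimum else None)
    return limits


def breaches(reading: float, limit: Optional[Limit]) -> bool:
    """True if the reading is strictly above max or strictly below min."""
    if limit is None:
        return False
    maximum, minimum = limit
    return reading > maximum or (minimum is not None and reading < minimum)


def classify(probe_id: str, reading: float,
             warning: dict[str, Limit], critical: dict[str, Limit]) -> str:
    """Critical limits are checked first, then warning limits."""
    if breaches(reading, critical.get(probe_id)):
        return "CRITICAL"
    if breaches(reading, warning.get(probe_id)):
        return "WARNING"
    return "OK"


warn = parse_thresholds("0=25")
crit = parse_thresholds("0=30")
print(classify("0", 30.0, warn, crit))  # WARNING
```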

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Physical Disk ... Undefined value 4096

2012-02-01 Thread Trond Hasle Amundsen
Helmut Wollmersdorfer  writes:

> Physical Disk 0:0:0 [Dell WDC WD1003FBYX-18Y7B0, 1.0TB] on ctrl 0  
> needs attention: Undefined value 4096

Hello Helmut,

The "state" value for physical disks via SNMP is an integer, which is
translated by the plugin. There are a few defined values, and 4096 is
not one of them.
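A minimal Python sketch of that kind of lookup, assuming a simple table with a fallback message for unknown integers; the plugin's own table and message handling may differ. The two entries below are placeholders for illustration, not the authoritative codes from Dell's MIB.

```python
# Placeholder mapping for illustration only: the real integer codes are
# defined in Dell's OMSA storage MIB and are NOT reproduced here.
EXAMPLE_PDISK_STATES = {
    1: "Ready",
    3: "Online",
}


def translate_state(code: int, table: dict[int, str] = EXAMPLE_PDISK_STATES) -> str:
    """Map an SNMP state integer to a name, falling back for unknown codes."""
    return table.get(code, f"Undefined value {code}")


print(translate_state(4096))  # Undefined value 4096
```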

> On the console of the server:
>
> # /opt/dell/srvadmin/bin/omreport storage pdisk controller=0 vdisk=0
> List of Physical Disks belonging to VD10A
>
> Controller PERC H700 Integrated (Slot 4)
>
> Span 0
> ID: 0:0:0
> Status: Unknown
> Name  : Physical Disk 0:0:0
> State : Unknown
> Power Status  : Spun Up
> Bus Protocol  : SATA
> Media : HDD
> Revision  : 01.01V02
> Failure Predicted : No
> Certified : Yes
> Encryption Capable: No
> Encrypted : Not Applicable
> Progress  : Not Applicable
> Mirror Set ID : 0
> Capacity  : 931.00 GB (999653638144 bytes)
> Used RAID Disk Space  : 931.00 GB (999653638144 bytes)
> Available RAID Disk Space : 0.00 GB (0 bytes)
> Hot Spare : No
> Vendor ID : DELL
> Product ID: WDC WD1003FBYX-18Y7B0
> Serial No.: WD-WCAW3145836558365
> Part Number   : TH0V8FCR1255213BC4RGA00
> Negotiated Speed  : 3.00 Gbps
> Capable Speed : 3.00 Gbps
> Manufacture Day   : Not Available
> Manufacture Week  : Not Available
> Manufacture Year  : Not Available
> SAS Address   : 443322110700
>
> [same for all 4 disks of the array]
>
> Thus it seems that check_openmanage works correctly. Also the disk-array
> seems to work correctly (no error messages in the logs).
>
> Is this some sort of wrong diagnostic from the firmware/controller?

No, this is not normal behaviour. I've seen this only on disks that were
so damaged that Openmanage failed miserably when attempting to get info
from them. Clearly this is not the case here, as you get the same error
on multiple disks and they otherwise work fine.

If you haven't already, you should try upgrading all BIOS and firmware
on the server, especially the controller firmware. You should also
upgrade Openmanage if you're not running the latest version (6.5.0).

If all else fails, I would contact Dell support and have them look at
it.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] SELinux and RHEL6.2 preventing disk checks via NRPE

2011-12-13 Thread Trond Hasle Amundsen
Dennis Kuhlmeier  writes:

> Geez, there are a lot more contexts set than I thought. I should
> probably remove duplicate entries, right?

The labels in

  /etc/selinux/targeted/contexts/files/file_contexts

are there by default and should not be touched. The file

  /etc/selinux/targeted/contexts/files/file_contexts.local

contains local additions or adjustments. If there are entries there that
you think ought to be removed, you should remove them with:

  semanage fcontext -d ''

Don't edit the file directly :)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage spec file fixes for SUSE

2011-12-12 Thread Trond Hasle Amundsen
"Daugherity, Andrew W"  writes:

> First of all, thanks for making this plugin.  It works well and is
> very handy.  As requested in the documentation, I am sending this to
> the nagios-users list rather than directly to the author.

Hello Andrew,

Excellent :) Usually a public forum is better, where everybody can
participate and share their insight.

> With some minor modifications, the package builds properly on SUSE.
> There are two main Nagios packaging differences from RedHat:
>
> 1) All Nagios plugins are installed to /usr/lib/nagios/plugins, even
> on 64-bit (there is no /usr/lib64/nagios directory).  This may not
> make the most sense, but it is what it is, and being consistent with
> other Nagios packages is good.
>
> 2) Non-binary plugin RPMs (e.g. Perl scripts only) use noarch, while
> binary plugins use the corresponding arch.  For examples of both,
> browse the build service repo at
> http://download.opensuse.org/repositories/server:/monitoring/SLE_11.1/
> Being a Perl script, check_openmanage falls under the former.
>
> This is easily solved with an %if block to make a universal RPM spec:
>  BEGIN PATCH 
> --- nagios-plugins-openmanage.spec.orig   2011-10-05 10:00:18.0 -0500
> +++ nagios-plugins-openmanage.spec   2011-12-01 15:02:10.0 -0600
> @@ -5,6 +5,16 @@
> # No binaries here, do not build a debuginfo package
> %global debug_package %{nil}
>
> +# SUSE installs Nagios plugins under /usr/lib, even on 64-bit
> +# It also uses noarch for non-binary Nagios plugins
> +%if %{defined suse_version}
> +%global nagiospluginsdir /usr/lib/nagios/plugins
> +BuildArch:   noarch
> +%else
> +%global nagiospluginsdir %{_libdir}/nagios/plugins
> +%endif
> +
> +
> Name:  nagios-plugins-openmanage
> Version:   3.7.3
> Release:   1%{?dist}
>  END PATCH 
>
> I also tested building on CentOS 5 to make sure nothing broke there,
> and indeed, nothing changed there.

Thanks for the patch, applied. However, there have been some changes to
the spec file lately. Among them is an added Requires on the nagios-plugins
package, which owns the /usr/lib(64)?/nagios/plugins directory.
Hopefully SUSE does the same in this respect. The updated spec file is
available here:

  http://folk.uio.no/trondham/software/tmp/nagios-plugins-openmanage.spec

PS. check_openmanage has been added to Fedora and EPEL, but there are
some SELinux issues. Until these are resolved I'll hold off pushing it
to stable, but it is available in testing.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] SELinux and RHEL6.2 preventing disk checks via NRPE

2011-12-09 Thread Trond Hasle Amundsen
Dennis Kuhlmeier  writes:

> Hello,
>
> after upgrading to RHEL6.2 I have problems checking some
> filesystems. Always the same three FS on all hosts, others work fine.
>
> /boot
> /home
> /var/log/audit
>
> $ ./check_nrpe -H backup -c check_fs_boot
> DISK CRITICAL - /boot is not accessible: Permission denied
>
> Now I disable SELinux and it works!
> $ ./check_nrpe -H backup -c check_fs_boot
> DISK OK - free space: /boot 36 MB (39% inode=99%);| /boot=55MB;96;;0;96
>
> Although not a single line is logged on the monitored host, neither
> in messages nor in audit.log
>
> I already had a local policy created for the nrpe daemon when RHEL6
> was introduced, as somehow many checks failed, although the user
> nrpe was running in was allowed to perform all checks, the nrpe
> daemon itself couldn't. I'll attach the policy, although at one
> point I gave up and just set the entire process to permissive mode.
> (note that I tried to extend rights on boot filesystem in this
> policy already, although it would seem to be unnecessary)
>
> Anybody experiencing something alike or any suggestions about how to
> handle nrpe and RHEL6(.2) in a better way than I am?

RHEL6 has the following labels for use with Nagios plugins:

  # grep nagios /etc/selinux/targeted/contexts/files/file_contexts | grep plugin_exec | cut -d: -f3 | sort -u
  nagios_admin_plugin_exec_t
  nagios_checkdisk_plugin_exec_t
  nagios_mail_plugin_exec_t
  nagios_services_plugin_exec_t
  nagios_system_plugin_exec_t
  nagios_unconfined_plugin_exec_t

Try setting the confined types first, e.g.:

  chcon -t nagios_checkdisk_plugin_exec_t /path/to/check_fs_boot

If none of them works properly, you have nagios_unconfined_plugin_exec_t
as a last resort.

When you find one that works, make it permanent with:

  semanage fcontext -a -t  '/path/to/check_fs_boot'

You may also have to set proper labels on the path leading up to the
actual plugin.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo




Re: [Nagios-users] check_openmanage plugin: " Couldn't run command ..."

2011-11-17 Thread Trond Hasle Amundsen
Corcoran Smith  writes:

> First message, so please excuse any failures in format, etc!
>
> Got two issues with two boxes (out of 160!) using check_openmanage:
>
> 1) Couldn't run command 'c:\pro... ' etc
> 2) Unrecognized character xA8: marked by <-- HERE after <-- HERE near column 1 at /loader/HASH(0xa7c42c)/UNIVERSAL.pm line 1.
>
> both are using the windows exe

Hi Corcoran,

I'll need more data, e.g. the full error message from the plugin, to
debug the first issue. If both issues appear on the same server, however,
issue 1 is probably caused by issue 2.

Regarding issue 2, I've seen this once before. A disk was so damaged
that OMSA failed while getting info from it, and gave an error message
like above: "unrecognized character...". This output is not something
that the plugin doesn't expect and couldn't possibly prepare for, so it
throws an error.

You need to identify the failed component, it probably needs to be
replaced. Try running 'omreport' commands to find it. Start with
'omreport storage pdisk controller=0'.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage on CentOS 5.6 Hosts

2011-11-05 Thread Trond Hasle Amundsen
the entrox  writes:

> i've been using the check_openmanage script to monitor about two dozen dell
> servers without a hitch (all Windows based) and we just set up about 15 or so
> new servers but this time running CentOS, i of course installed the OMSA via
> Dell's repository and also enabled SNMP but i cant seem to get the command to
> work on those hosts.
>
> i am trying to run the debug command to look at the entire output like this:
>
> [root@MONITOR02 plugins]# ./check_openmanage -H HOSTIP -C COMMUNITY -d
> ERROR: (SNMP) OpenManage is not installed or is not working correctly

This error means that the SNMP service on the monitored host is working
and we get a reply, but the OIDs for OMSA are not present.

> i of course checked where the omreport binary was at and its where the script
> is looking for it:
>
> [root@mvarutestvmbase01 ~]# find / -name omreport
> /opt/dell/srvadmin/sbin/omreport
> /opt/dell/srvadmin/bin/omreport
> [root@mvarutestvmbase01 ~]#

When using SNMP, the plugin doesn't utilize the omreport binary in any
way. It doesn't care where it is installed. BTW, the above location is
correct and is the default.

> just to double check i went ahead and looked if the OMSA was working, i went
> via web and the console shows up no problem at all, if i authenticate it shows
> all the information that it should be showing, i also restarted all the
> services on the OMSA just to see if something was up but nothing, it still
> claims its not working:
>
> http://pics.entrox.me/983ygh426g.png

This is interesting. The SNMP service wasn't started. You should see
something like this:

  Starting dsm_sa_snmpd: [  OK  ]

The dsm_sa_snmpd service is started by /etc/init.d/dataeng. This script
is also responsible for starting other components such as
dsm_sa_datamgrd, and that seems to work fine.

You should also see dsm_sa_snmpd in the process list if it's running:

  # ps axww | grep dsm_sa_snmpd
   4967 ?        Ssl    0:00 /opt/dell/srvadmin/sbin/dsm_sa_snmpd

From what I can gather from the dataeng init script, it won't start
dsm_sa_snmpd if this file exists:

  /opt/dell/srvadmin/var/lib/srvadmin-deng/dcsnmp.off

If it exists on your system, try removing it and restart OMSA.

Also verify that your /etc/snmp/snmpd.conf contains the following at the
very end:

  # Allow Systems Management Data Engine SNMP to connect to snmpd using SMUX
  smuxpeer .1.3.6.1.4.1.674.10892.1

This should have been added by OMSA at install time.
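The two local checks above can be scripted. Below is a hypothetical Python sketch (not part of OMSA or check_openmanage) that looks for the dcsnmp.off file and for the smuxpeer line in snmpd.conf, using the paths and OID quoted above. It only checks that the line is present and uncommented, not that it sits at the very end of the file.

```python
from pathlib import Path

# Paths and OID as given in the message above.
DCSNMP_OFF = Path("/opt/dell/srvadmin/var/lib/srvadmin-deng/dcsnmp.off")
SNMPD_CONF = Path("/etc/snmp/snmpd.conf")
SMUX_OID = ".1.3.6.1.4.1.674.10892.1"


def has_smuxpeer(conf_text: str, oid: str = SMUX_OID) -> bool:
    """True if an uncommented 'smuxpeer <oid>' line is present."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == "smuxpeer" and fields[1] == oid:
            return True
    return False


def diagnose(dcsnmp_off: Path = DCSNMP_OFF, snmpd_conf: Path = SNMPD_CONF) -> list[str]:
    """Return human-readable problems found (empty list if none)."""
    problems = []
    if dcsnmp_off.exists():
        problems.append(f"{dcsnmp_off} exists; dataeng may skip starting dsm_sa_snmpd")
    try:
        text = snmpd_conf.read_text()
    except OSError as exc:
        problems.append(f"cannot read {snmpd_conf}: {exc}")
    else:
        if not has_smuxpeer(text):
            problems.append(f"no 'smuxpeer {SMUX_OID}' line in {snmpd_conf}")
    return problems


if __name__ == "__main__":
    for problem in diagnose():
        print("PROBLEM:", problem)
```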

> i also read on the man page of the script
> (http://folk.uio.no/trondham/software/check_openmanage.html) that i could use
> the --omreport option but no dice with that, even trying the bin and sbin
> omreport binary file i got the exact same message:

This option allows you to specify the location of the omreport
command. It has no effect when using SNMP, and is only really usable on
Windows systems, where OMSA can be installed on drives other than C:.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: OOPS! Something is wrong...

2011-09-30 Thread Trond Hasle Amundsen
Lois Garcia  writes:

> This is the output from "omreport chassis pwrsupplies -fmt ssv":
>
> C:\Users\Administrator>omreport chassis pwrsupplies -fmt ssv
> Power Supplies Information
>
> Power Supply Redundancy
> Redundancy Status;Lost
>
> Individual Power Supply Elements
>
> Index;Status;Location;Type;Rated Input Wattage;Maximum Output Wattage;Online Status;Power Monitoring Capable
> 0;Ok;PS 1 Status;AC;[No Value];[No Value];Presence Detected;Yes
> 1;Ok;PS 2 Status;AC;1080 W;870 W;Presence Detected;Yes

Thanks. This shows that the plugin's behaviour was correct in my
opinion. OMSA states that both PSUs are OK, which is what the plugin
reports. There is a bug somewhere, but it is probably in OMSA. My guess
is that there is a rare and unknown error condition in PSU1, which OMSA
doesn't handle correctly.
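For reference, the semicolon-separated table quoted above is easy to turn into records. Here is a minimal Python sketch, assuming the header row starts with 'Index;' and that each data row has as many fields as the header; the plugin's own (Perl) parser may work differently. Run on that output, it yields 'Ok' for both power supplies, which matches what the plugin reported.

```python
# The omreport output quoted above, with the wrapped header line rejoined.
SAMPLE = """\
Power Supplies Information

Power Supply Redundancy
Redundancy Status;Lost

Individual Power Supply Elements

Index;Status;Location;Type;Rated Input Wattage;Maximum Output Wattage;Online Status;Power Monitoring Capable
0;Ok;PS 1 Status;AC;[No Value];[No Value];Presence Detected;Yes
1;Ok;PS 2 Status;AC;1080 W;870 W;Presence Detected;Yes
"""


def parse_psu_ssv(output: str) -> list[dict[str, str]]:
    """Return one dict per power supply row, keyed by the SSV header names."""
    rows: list[dict[str, str]] = []
    header = None  # set once the 'Index;...' header line has been seen
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Index;"):
            header = line.split(";")
        elif header is not None and line:
            fields = line.split(";")
            if len(fields) == len(header):  # skip anything that isn't a full row
                rows.append(dict(zip(header, fields)))
    return rows


for psu in parse_psu_ssv(SAMPLE):
    print(psu["Location"], psu["Status"])
```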

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: OOPS! Something is wrong...

2011-09-28 Thread Trond Hasle Amundsen
Lois Garcia  writes:

> Thank you, Trond! It looks like a power supply problem. I will take the issue
> to Dell:
>
> C:\Users\Administrator>omreport system
> Health
>
> SEVERITY : COMPONENT
> Critical : Main System Chassis
>
>
> C:\Users\Administrator>omreport chassis
> Health
>
> Main System Chassis
>
> SEVERITY : COMPONENT
> Ok   : Fans
> Ok   : Intrusion
> Ok   : Memory
> Critical : Power Supplies
> Ok   : Power Management
> Ok   : Processors
> Ok   : Temperatures
> Ok   : Voltages
> Ok   : Hardware Log
> Ok   : Batteries

Hmm... there is obviously something amiss with the power supplies, but
the plugin didn't catch it. I'd like to know why. Can you provide the
output from:

  omreport chassis pwrsupplies -fmt ssv

This is the command that the plugin runs to get the status of the power
supplies.

> Thank you also for putting such a great plugin into the
> community. Without it, monitoring the few Windows machines in our all
> Linux environment would have been a chore I don't care to contemplate.

Thank you, glad you like it :)

> I don't see a donation link on your website at http://folk.uio.no/trondham/
> software/check_openmanage.html - ?

No, there is no donation link, the thought never crossed my mind. I have
benefitted enormously (personally and professionally) from free and open
source software for many years. This is just my way of giving back.
Besides, I've found that creating and maintaining open source software
is by itself rewarding, in many different ways.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage: OOPS! Something is wrong...

2011-09-27 Thread Trond Hasle Amundsen
lois garcia  writes:

> I have check_openmanage running successfully on 13 out of 16 Dell R710s. 
> I am really puzzled at what is going wrong, as it seems different on each
> machine. I have tried different versions of check_openmanage and 
> reinstalling the same version of Dell OMSA.
>
> The first eight servers were built from the same Ghost image, and last 
> month, one of those servers started showing the check_openmanage error:
>
> UNKNOWN 09-13-2011 17:04:23 7d 1h 7m 54s 4/4
> UNKNOWN: Storage Error! No controllers found
> UNKNOWN: Problem running 'omreport chassis memory': Error: Memory object not found
> UNKNOWN: Problem running 'omreport chassis fans': Error! No fan probes found on this system.
> UNKNOWN: Problem running 'omreport chassis temps': Error! No temperature probes found on this system.
> UNKNOWN: Problem running 'omreport chassis volts': Error! No voltage probes found on this system.
>
> I reinstalled the Dell software, fixing the UNKNOWN error, and now have 
> this error:
>
> OOPS! Something is wrong with this server, but I don't know what. The 
> global system health status is CRITICAL, but every component check is 
> OK. This may be a bug in the Nagios plugin, please file a bug report. 
>
> The server is a Dell R710, running Windows Server 2008 R2 Enterprise.

Hello Lois,

(I shortened the subject)

When the plugin is used in local mode, as in your case, the plugin
checks the global health status using this command:

  # omreport system
  Health
  
  SEVERITY : COMPONENT
  Ok   : Main System Chassis
  
  For further help, type the command followed by -?

If everything is OK you'll get the output above. What do you get when
running this command on the troubled server?

Does the ESM log contain any clues? Try running 'omreport system esmlog'
and see. Try running 'omreport chassis' as well.

There are two possible causes for the oops error. Either Openmanage
isn't behaving properly, or your server has an error that the plugin
doesn't catch.
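The logic behind the oops message can be sketched roughly as follows. This is a simplified, hypothetical Python model that assumes status strings like 'Ok' and 'Critical'; it is not the plugin's actual code, which is written in Perl and checks many more details.

```python
def overall_result(global_health: str, components: dict[str, str]) -> str:
    """Combine the global health status with per-component check results."""
    failing = {name: status for name, status in components.items() if status != "Ok"}
    if failing:
        # At least one component check found a problem: report it.
        details = ", ".join(f"{name} is {status}" for name, status in sorted(failing.items()))
        return f"PROBLEM: {details}"
    if global_health != "Ok":
        # Global status disagrees with every component check: the "oops" case.
        return (f"OOPS: global system health status is {global_health.upper()}, "
                "but every component check is OK")
    return "OK"


print(overall_result("Critical", {"Fans": "Ok", "Power Supplies": "Ok"}))
```

In other words, the oops branch is reached only when no individual component check reports a problem but the global status is not OK.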

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage - Feature request

2011-08-16 Thread Trond Hasle Amundsen
Russell Kackley  writes:

> I recently downloaded and started using the Nagios plugin 
> check_openmanage to provide information on our Dell PowerEdge servers to 
> Nagios. check_openmanage works very well for us, but there is one thing 
> that I would like to see added. Another possibility is that other 
> check_openmanage users could point me to a way to accomplish what I want 
> to do using the existing code.
>
> We have two PowerEdge 2950 servers, named s1 and s2. Each server has a 
> PERC 6/E card installed in them. We also have a PowerVault MD1000 
> storage unit. Both servers are connected to the MD1000. s1 is our 
> primary server so we boot it first and Dell OMSA reports that the 
> physical disks are "Online", which is what we want. s2 is our backup 
> server and we boot that second. For this server, Dell OMSA reports that 
> the physical disk status is "Non-Critical" and the state is "Foreign". 
> This is ok for us, but the problem is that check_openmanage sees the 
> "Non-Critical" status and reports a Warning for the physical disks. I 
> would like check_openmanage to ignore the "Non-Critical" status when the 
> state is "Foreign", preferably via a blacklist option, e.g., 
> pdisk_foreign. I think that this is similar to the blacklist option 
> pdisk_cert, in which check_openmanage ignores the "Non-Critical" status 
> when a disk is not certified by Dell. Note that I did investigate the 
> blacklist item pdisk and the "--check storage=0" option, but my 
> understanding of those options is that they suppress all checks of the 
> disks, which is not what I want.
>
> Do the users of check_openmanage 1) have any suggestions for how I can 
> tell check_openmanage to ignore the "Non-Critical"/"Foreign" state of 
> the disks, or 2) think that this would be a useful feature to add to 
> check_openmanage?

Hi Russell,

This would be a nice feature to add. Please try the latest development
version (3.7.1-beta2), it includes the new blacklisting keyword
'pdisk_foreign' as you suggest:

  http://folk.uio.no/trondham/software/check_openmanage.html#download
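The filtering requested above can be sketched like this. It is a hypothetical Python model of the decision, not the plugin's Perl implementation: a 'Non-Critical' disk is ignored when its state is 'Foreign' and 'pdisk_foreign' is blacklisted, analogous to 'pdisk_cert' for uncertified disks, while any other non-OK status still raises an alert.

```python
def pdisk_alerts(status: str, state: str, certified: bool, blacklist: set[str]) -> bool:
    """Return True if a physical disk with this status/state should raise an alert."""
    if status == "Ok":
        return False
    if status == "Non-Critical":
        # Foreign disks (as seen from the backup server s2 above) may be ignored.
        if state == "Foreign" and "pdisk_foreign" in blacklist:
            return False
        # Disks not certified by Dell may be ignored via pdisk_cert.
        if not certified and "pdisk_cert" in blacklist:
            return False
    # Anything else that is not OK (e.g. Critical) still alerts.
    return True


print(pdisk_alerts("Non-Critical", "Foreign", True, {"pdisk_foreign"}))  # False
```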

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] home made php script

2011-08-14 Thread Trond Hasle Amundsen
"Erik Olsen"  writes:

> I've been trying to make my own script now for a few hours but I'm not
> getting it to work with nagios.
> I'm most familiar with php so I used that to make the script.
>
> My setup:
> Ubuntu 11.4 server
> Nagios 3.2.3
>
> The host/command/and service are all in the same .cfg file.
>
> define command{
>     command_name    check_ups_temprature2
>     command_line    $USER$/check_ups_temp.php
> }
>
> define service{
>     use                     generic-service
>     host_name               ups1
>     service_description     Temp ups env sensor
>     check_command           eaton_ups_temp
> }
>
> Status Information  (Return code of 127 is out of bounds - plugin may be
> missing)

Hi Erik,

There is a typo on the "command_line" line. The $USER$ macro doesn't
exist. There are 32 possible user macros, named $USER1$ through
$USER32$. Try replacing $USER$ with $USER1$, or simply the actual path
leading up to the plugin.
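Below is a sketch of the corrected definitions. It assumes $USER1$ is defined in resource.cfg; the path shown is only an example, and Ubuntu's nagios3 package commonly uses /usr/lib/nagios/plugins.

One more thing worth checking: the service above refers to 'eaton_ups_temp', which isn't the command_name defined in the same file. The sketch therefore points check_command at 'check_ups_temprature2' instead:

```
# resource.cfg (example path, adjust to your installation)
$USER1$=/usr/lib/nagios/plugins

define command{
    command_name    check_ups_temprature2
    command_line    $USER1$/check_ups_temp.php
}

define service{
    use                     generic-service
    host_name               ups1
    service_description     Temp ups env sensor
    check_command           check_ups_temprature2
}
```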

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] omreport and check_openmanage

2011-07-06 Thread Trond Hasle Amundsen
Emilio Bruna  writes:

> Thanks a lot for your hints Trond,
> check_openmanage is already at latest version.
>
> We will try with an OMSA update first and then (if the issue persist)
> we will update BIOS too.

If all else fails, you have the option of disabling the power management
check completely, by using '--check amperage=0':

  check_openmanage --check amperage=0

By using this option you're telling the plugin that it shouldn't even
attempt to run 'omreport chassis pwrmonitoring'.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] omreport and check_openmanage

2011-07-04 Thread Trond Hasle Amundsen
Emilio Bruna  writes:

> OMSA version is 6.2.0.1
> OS: Windows 2008 Storage Server SP2
> Hardware is Dell NX 300 Storage Server (a derivative of the R410 or R310, I think)

This combination should be ok. I don't know the NX300, but if it's based
on the R310 or R410 it shouldn't be a problem. There was a bug in
check_openmanage related to power monitoring on the R410, but this was
fixed in version 3.6.5 of the plugin. Are you using the latest version
of check_openmanage, which is 3.6.8?

Also, would it be possible for you to upgrade OMSA to the latest
version, 6.5.0?

This really is an OMSA issue. If the power supplies don't support power
monitoring, omreport should simply say so, and check_openmanage will be
happy. But in your case, OMSA is responding with an error.

One last tip. In some cases I've seen that certain capabilities in OMSA
depend on BIOS and/or firmware versions. You should verify that the
BIOS and firmware are relatively up-to-date.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage error on W2k8r2 Dell R900

2011-07-03 Thread Trond Hasle Amundsen
Jay Wahl  writes:

> Love check_openmanage plugin for Nagios! It has been a great help for
> monitoring our Dell hardware. I recently built 3 Dell 900s with W2K8r2 with
> check_openmanage (v 3.6.8) and Dell OMSA (v 6.5.0).

Hi Jay,

Are you completely sure that you're using version 3.6.8? My reason for
asking is that the errors you get don't make sense (details below).

> I am getting the following errors:
> C:\Program Files\NSClient++\scripts>check_openmanage
> Problem running 'omreport chassis memory': Error Correction;Multibit ECC

This was fixed a while back (version 3.6.3 IIRC).

The "Error Correction" field appeared in OMSA 6.4.0 and check_openmanage
triggers on strings containing "Error". The particular string above
obviously does not indicate an actual error and was put in the whitelist
for errors shortly after OMSA 6.4.0 was released.
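The matching described above can be pictured with a small shell sketch. It is hypothetical and is not the plugin's own code, which is written in Perl. The whitelist here contains only the 'Error Correction' field mentioned above:

```shell
# Hypothetical sketch: decide whether a line of omreport output should be
# treated as an error. Any line mentioning "Error" counts, unless it matches
# a whitelisted, known-harmless field.
is_real_error() {
    case $1 in
        *"Error Correction"*) return 1 ;;  # harmless field added in OMSA 6.4.0
        *Error*) return 0 ;;               # anything else mentioning "Error"
        *) return 1 ;;                     # no "Error" string at all
    esac
}
```

For example, `is_real_error 'Error Correction;Multibit ECC'` returns non-zero (not an error), while `is_real_error 'Error! No fan probes found on this system.'` returns zero.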

> INTERNAL ERROR: Use of uninitialized value in concatenation (.) or string at
> script/check_openmanage line 1650.
> INTERNAL ERROR: Use of uninitialized value in concatenation (.) or string at
> script/check_openmanage line 1650.

These two don't make any sense, since line 1650 only contains a comment.
They are also probably not related to the memory check.

Please verify the version of check_openmanage. The plugin will output
its version number with either of these options:

  check_openmanage -V
  check_openmanage -d

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] omreport and check_openmanage

2011-07-01 Thread Trond Hasle Amundsen
Emilio Bruna  writes:

> Hello all,
> I'm monitoring several Dell Windows servers with Nagios and NSClient++
> and OMSA + check_openmanage.  On one of these, I'm getting a problem
> monitoring the redundant power supplies.
>
> Running the command below LOCALLY on the machine being monitored, I got
> the right data from omreport.exe:
>
> c:\Program Files (x86)\Dell\SysMgt\oma\bin>omreport.exe chassis pwrsupplies
> Power Supplies Information
>
> ---
> Main System Chassis Power Supplies : Ok
> ---
>
> Power Supply Redundancy : Ok
> Attribute : Redundancy Status
> Value     : Full
> Individual Power Supply Elements
> Index                    : 0
> Status                   : Ok
> Location                 : PS 1 Status
> Type                     : AC
> Rated Input Wattage      : 680 W
> Maximum Output Wattage   : 500 W
> Online Status            : Presence Detected
> Power Monitoring Capable : Yes
>
> Index                    : 1
> Status                   : Ok
> Location                 : PS 2 Status
> Type                     : AC
> Rated Input Wattage      : 680 W
> Maximum Output Wattage   : 500 W
> Online Status            : Presence Detected
> Power Monitoring Capable : Yes
>
> running the below command (the ones needed to check_openmanage):
>
> c:\Program Files 
> (x86)\Dell\SysMgt\oma\bin>c:\Users\administrator.CMVC\Desktop\
> check_openmanage.exe --omreport "c:\Program Files (x86)\Dell\SysMg
> mreport.exe"
> Problem running 'omreport chassis pwrmonitoring': Error: Current probes not
> found
>
> I've noticed that the switches coming from check_openmanage are
> slightly different from the ones passed from omreport.exe ("omreport
> chassis pwrmonitoring" instead of "omreport chassis pwrsupplies")
>
> so it seems that check_openmanage has the wrong switches with regard to the
> power monitoring check status; or maybe the OMSA version I'm using is
> not the correct version to work properly with
> check_openmanage.

Hi Emilio,

Don't confuse the two arguments 'pwrsupplies' and 'pwrmonitoring'. They
do different things, and check_openmanage uses both of them. It runs
'omreport chassis pwrsupplies' to get the status of the power supplies,
and it runs 'omreport chassis pwrmonitoring' to get the status and value
of the amperage probes. The latter includes the overall power
consumption of the server.

In your case, it's the 'pwrmonitoring' command that fails. This is a
known problem with some older versions of OMSA. Which version of OMSA
are you running, and on what kind of PowerEdge server?
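To see which of the two queries fails, you can run them one at a time. This is a sketch for a Unix-like shell; on Windows, simply run the two omreport commands by hand. The function name is hypothetical. Treating any output that contains "Error" as a failure is an assumption, modelled on how check_openmanage reacts to such strings:

```shell
# Sketch: run the two omreport queries check_openmanage uses for power,
# one at a time, and report which of them fails.
# Assumption: $1 is the omreport command (defaults to "omreport").
check_power_queries() {
    omreport=${1:-omreport}
    failed=0
    for arg in pwrsupplies pwrmonitoring; do
        out=$("$omreport" chassis "$arg" 2>&1)
        status=$?
        # Count an "Error" string in the output as a failure too
        case $out in
            *Error*) status=1 ;;
        esac
        if [ "$status" -eq 0 ]; then
            echo "omreport chassis $arg: ok"
        else
            echo "omreport chassis $arg: FAILED: $out"
            failed=1
        fi
    done
    return "$failed"
}
```

On a system like the one above, `check_power_queries` would be expected to report 'pwrsupplies' as ok and 'pwrmonitoring' as failed.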

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

Re: [Nagios-users] Check_Openmanage configuration question

2011-05-10 Thread Trond Hasle Amundsen
Daniel Ceola  writes:

> Hello all!

Hi Daniel,

> I have a question regarding the initial configuration of
> check_openmanage.  I downloaded the version of the script dated Feb 9
> (I don't see a version number in the script) and am attempting to use
> the script through SNMP.

Tip: Run the plugin with the '-V' or '--version' switch to view the
version number.

> I'm attempting to begin using check_openmanage with our Dell servers.
> I have installed the Dell OMSA software on one server and it seems to
> be working just fine.  I configured my command definition in a simple
> fashion, according to the installation guide:
>
> #  Dell Check openmanage
>
> define command{
> command_name    check_openmanage
> command_line    $USER1$/check_openmanage -H $HOSTADDRESS$
> }
>
> I also configured my service definition in a simple fashion, according
> to the installation guide:
>
> define service{
> use generic-service
> host_name   Server_Name
> service_description Dell OMSA
> check_command   check_openmanage
> }

This looks correct to me.

> However, my Nagios console is reporting the status as (null).  Also,
> when I attempt to run the script from the command line (note the file
> is saved as check_openmanage with no file extension, I also tried
> check_openmanage.pl and receive the same results), I receive a few
> errors
>
> nagios@UbuntuTest:/usr/local/nagios/libexec$ ./check_openmanage 192.168.1.5
> ./check_openmanage: line 27: require: command not found
> ./check_openmanage: line 28: use: command not found
> ./check_openmanage: line 29: use: command not found
> ./check_openmanage: line 30: syntax error near unexpected token `('
> ./check_openmanage: line 30: `use POSIX qw(isatty ceil);'

Weird. Your system seems to be running the plugin through a shell. The
output above is exactly what you'll get if you run

  sh ./check_openmanage

To specify perl as interpreter, run:

  perl ./check_openmanage

However, this should not be needed. The system should identify it as a
perl script and use perl to execute it by default. Have you edited the
plugin in some way? Check that the md5sum is correct:

  $ md5sum check_openmanage
  5281718fe9e5c4b9570fe76f0fb424ec  check_openmanage

The above sum is correct for version 3.6.6. You should verify that you
get the same (if running 3.6.6). The latest version and its md5sum are
available here:

  http://folk.uio.no/trondham/software/check_openmanage.html#download
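The comparison can also be scripted. This is a sketch, and the function name is hypothetical. With GNU coreutils, `md5sum -c` does the same job when fed a "sum  filename" line:

```shell
# Sketch: compare a file's MD5 sum with the published value.
# Usage: verify_md5 FILE EXPECTED_SUM
verify_md5() {
    # Reading from stdin makes md5sum print "<sum>  -", so the file name
    # can't interfere with extracting the first field.
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
        return 0
    fi
    echo "MISMATCH: $1 has $actual, expected $2"
    return 1
}
```

For version 3.6.6 that would be `verify_md5 check_openmanage 5281718fe9e5c4b9570fe76f0fb424ec`.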

PS. In your example above you have forgotten the '-H' switch.

PPS. The file extension (or the name itself) is unimportant.
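What does matter is the first line. The system picks the interpreter from the '#!' line at the very top of the file, something like '#!/usr/bin/perl' for a Perl plugin. If that line is missing or not first (for example after a copy-paste that added a blank line or a byte-order mark), the shell may fall back to running the file itself. That gives exactly the errors quoted above.

This is a sketch for checking the first line, and the function name is hypothetical. It also flags DOS line endings, which break the interpreter path in a different way:

```shell
# Sketch: report which interpreter line 1 of a script asks for.
check_shebang() {
    first=$(head -n 1 "$1")
    case $first in
        '#!'*) ;;
        *) echo "no '#!' line at the start of $1"; return 1 ;;
    esac
    # A trailing carriage return (DOS line endings) breaks the interpreter path
    cr=$(printf '\r')
    case $first in
        *"$cr") echo "line 1 of $1 ends in CR (DOS line endings)"; return 1 ;;
    esac
    interp=${first#??}   # strip the leading "#!"
    echo "interpreter: $interp"
}
```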

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


[Nagios-users] check_openmanage PNP template (Was: check_openmanage errors)

2011-04-28 Thread Trond Hasle Amundsen
"Randal, Phil"  writes:

> Is the beta of check_openmanage.php available for testing?

Sure, I put it here:

  http://folk.uio.no/trondham/software/beta/

Highlights of the template are:

  - works with the plugin's new perfdata API
  - removed unnecessary dependence on PHP >= 5.2 (good for rhel/centos 5
users)
  - calculate power usage for the selected time period, in Watt hours
and BTU
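The template's exact calculation isn't shown here. As a sketch, assume the average power draw $\bar{P}$ (in watts) over a period of $t$ hours is known. The energy figures then relate as follows, using the standard conversion 1 Wh ≈ 3.412 BTU:

```latex
\begin{aligned}
E_{\mathrm{Wh}}  &= \bar{P} \cdot t \\
E_{\mathrm{BTU}} &\approx 3.412 \cdot E_{\mathrm{Wh}} \\[4pt]
\text{e.g. } \bar{P} = 250\ \mathrm{W},\ t = 24\ \mathrm{h}: \quad
E_{\mathrm{Wh}} &= 6000\ \mathrm{Wh}, \quad E_{\mathrm{BTU}} \approx 20\,472\ \mathrm{BTU}
\end{aligned}
```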

> I'm currently using a slightly modified version of the one in the latest PNP 
> release.
>
> Two cosmetic issues came to mind:
>
> 1: Temperature is measured in Celsius, not Celcius

Yep, I know. That typo was the first thing I fixed :)

> 2: Formatting when reporting multiple sensors in one graph is irksome
> - the values don't align in a nice column (e.g. temperatures).  I
> 'solve' this by a judicious use of substr() and str_pad() to normalise
> the length of reported sensor names.

Hm... this could be tricky to do in a consistent and general manner (at
least the substr() part). The sensor names are as reported by
OMSA. Perhaps this could be accomplished with some RRD magic instead?

Tips and hints are welcome, since I'm neither a PHP expert nor an RRD
ninja :)
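The substr()/str_pad() approach amounts to truncating each sensor name to a fixed width, then padding it with spaces to that same width. In the PHP template the equivalent would be str_pad(substr($name, 0, $width), $width). Here is the same idea in shell form, as a sketch; the function name and the widths are illustrative only:

```shell
# Sketch: truncate a label to WIDTH characters and pad it with spaces
# to exactly WIDTH, so labels line up in a column.
# Usage: pad_label LABEL WIDTH
pad_label() {
    printf "%-${2}.${2}s" "$1"
}
```

For example, `pad_label 'System Board Ambient Temp' 12` prints "System Board", and shorter names are padded out to 12 characters.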

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage errors

2011-04-28 Thread Trond Hasle Amundsen
Steve Glasser  writes:

>> That combination should work just fine. Please try either of the beta
>> versions, as I suggested in my previous email. The issue you're having
>> may very well be fixed in the betas.
>
> Tried check_openmanage-3.7.0-beta2.0-beta2, problem solved.

Excellent, thanks for testing and reporting back. I've just released
version 3.6.6, which contains the same bugfixes as the 3.7 beta, but not
the (unfinished) new features :)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage errors

2011-04-27 Thread Trond Hasle Amundsen
Steve Glasser  writes:

> D'oh.  We are using check_openmanage with NRPE.  The host o/s is CentOS 
> release 5.5.  Perl is perl-5.8.8 (from rpm).

That combination should work just fine. Please try either of the beta
versions, as I suggested in my previous email. The issue you're having
may very well be fixed in the betas.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check-openmanage errors after upgrade of openmanage

2011-04-27 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> I ran check_openmanage.exe --only storage locally and it worked fine.
>
> I then changed the NSC.ini to have:
>
> command[check_openmanage]=check_openmanage.exe --only storage
>
> and restarted the NSCLient++ (x64) service in test mode.
>
> the results:
>
> d NSClient++.cpp(1106) Injecting: check_openmanage:
> d NSClient++.cpp(1142) Injected Result: WARNING 'Problem running 
> 'omreport chassis fans': Error! No fan probes found on this 
> system.Problem running 'omreport chassis temps': Error! No 
> temperature probes found on this system.Problem running 'omreport 
> chassis volts': Error! No voltage probes found on this system.'

Ok, this actually clarifies things. Clearly, NSClient++ ignores
everything after 'check_openmanage.exe' in your NSC.ini. There is no way
that check_openmanage would complain about fans etc. when the option
'--only storage' is specified. Since it works from command line we can
safely assume that NSClient++ is the problem. This explains your issues
with the timeout option as well.

> I've looked on your site for the dev versions and am happy to try them 
> but don't see a zip with the .exe.  Is there an .exe available for the 
> dev?  also, which dev version would you prefer I try, 3.6 or 3.7?

I could make a PE32 executable for the dev versions, but in your case it
won't help, so there is really no point. Your problem is that NSClient++
ignores the plugin options.

Since I don't use NSClient++ I can't offer any insight into how it
should be configured, but my first attempt at a fix would be to put the
entire command in quotes in NSC.ini.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check-openmanage errors after upgrade of openmanage

2011-04-27 Thread Trond Hasle Amundsen
Trond Hasle Amundsen  writes:

> Are you using check_openmanage with NRPE or similar in local mode, or
> checking via SNMP?

I have an idea of what the problem might be. Can you try either of the
development versions of check_openmanage available here:

  http://folk.uio.no/trondham/software/check_openmanage.html#download

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage

2011-04-27 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> ok, talked to dell, there is no hardware on the T105 that will allow
> monitoring of the fan, voltage, etc.. basically the only thing you can
> monitor is the raid array which is fine as that's all I really want to
> check with nagios.

Ok. I don't know the 100 series, but from what I understand they are
entry-level servers with limited capabilities and a low price tag. The
plugin will barf at servers that don't have the basic monitoring probes,
unless they are absent for obvious reasons (e.g. blades don't have
fans). I still think this strictness is a good idea, as I've seen plenty of
instances where OMSA malfunctions in such a way that it will say a probe
doesn't exist when it actually does.

I'm reluctant to change that policy, so users of the 100 series will
have to exclude certain checks in the plugin. It is not ideal, but I
believe the problem to be limited since most would go for servers with
better monitoring capabilities (i.e. 200 series and beyond).

> Still have that pesky timeout after 30 seconds error though.  tried
> with --timeout 60 and with -t 60 and nothing seems to change the
> behavior.

Still weird. Did you try running the plugin manually with the timeout
option? Try 'check_openmanage.exe -t 60 [other options]'

Perhaps OMSA on the T105 hangs on some probe that doesn't exist. If
you're only interested in monitoring storage, you could try:

  check_openmanage.exe --only storage

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check-openmanage errors after upgrade of openmanage

2011-04-26 Thread Trond Hasle Amundsen
Steve Glasser  writes:

> Since upgrading dell openmanage from v 6.3 to 6.5 we have errors using 
> the check-openmanage plugin.  The errors are:
>
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in length at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in concatenation (.) or 
> string at /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in length at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in concatenation (.) or 
> string at /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in length at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in concatenation (.) or 
> string at /usr/lib64/nagios/plugins/check_openmanage line 4599.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
> INTERNAL ERROR: Use of uninitialized value in hash element at 
> /usr/lib64/nagios/plugins/check_openmanage line 4601.
>
> The plugin reports "status unknown".
>
> Openmanage is version check-openmanage-3.6.5-1.el5 installed from rpm. 
> The host is a Dell 2950.  Please let me know if I can provide any
> additional information.

Hi Steve,

Are you using check_openmanage with NRPE or similar in local mode, or
checking via SNMP?

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> the server is a PowerEdge T105.  It IS running slow but I'll be damned 
> if I can figure out why, I'm beginning to suspect bad RAM as the 
> performance meter reports minimal load.

One thing to check is the power management setting in the BIOS. We set
up a few blade servers recently that had this set to "Active Power
Controller", and this caused the servers to be extremely
sluggish. Setting it to "OS Control" or "Maximum Performance" solved
the issue. Try:

  # omreport chassis pwrmanagement config=profile
  Power Profiles

  Maximum Performance     : Not Selected
  Active Power Controller : Not Selected
  OS Control              : Selected
  Custom                  : Not Selected

You can set the profile to max performance with:

  omconfig chassis pwrmanagement config=profile profile=maxperformance

Just a tip, but worth checking.
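To check the setting from a script rather than by eye, you can filter the output shown above for the profile marked "Selected". This is a sketch, and the function name is hypothetical. Note that "Not Selected" also contains the word "Selected", so the value is compared exactly:

```shell
# Sketch: print the power profile marked "Selected" in the output of
# "omreport chassis pwrmanagement config=profile", read from stdin.
selected_profile() {
    awk -F: '
        NF >= 2 {
            name = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # exact comparison, so "Not Selected" does not match
            if (value == "Selected") print name
        }'
}
```

Usage: `omreport chassis pwrmanagement config=profile | selected_profile`, which would print "OS Control" for the output above.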

> here is the command line in the NSC.ini
>
> [modules]
> command[check_openmanage]=check_openmanage.exe -t 60 --check 
> fans=0,volt=0
>
> on the nagios server:
>
> /usr/lib/nagios/plugins/check_nrpe -H $hostname$ -p 5666 -c 
> check_openmanage -t 60
>
> I'm pretty sure it's not the Check_nrpe command line as this works fine 
> on several other servers.  it's def something on the client server 
> itself so this points to the NSClient++ setup.

Can't see anything wrong with these definitions..

> note I have been testing by running NSClient++.exe /test so i can watch 
> the client server and it is getting the injection command and reporting 
> the timeout locally.

Good. But it's still weird that you get a timeout after 30 seconds even
when you specify a 60 sec timeout. Try running check_openmanage.exe
manually on the server with the same options and see if it then behaves
in the same way. If so, there is some sort of bug in the plugin that only
affects the .exe version.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> Ok, now new and exciting changes... no matter what I do I get: WARNING 
> PLUGIN TIMEOUT: check_openmanage timed out after 30 seconds.
>
> I have -t 60 set on the check_openmanage command and also on the NRPE 
> check command line and in the NSC.ini.  nothing seems to change the 
> timout beyond 30 seconds.

I forgot to mention that since you get that particular error it's the
plugin that times out, not NRPE or NSClient++. The fact that you're
unable to change that behaviour with the '-t' or '--timeout' option is
strange, but it would usually indicate a configuration error on your
part. You'll have to post the command definition etc. for me (and others
on this list) to be able to spot the error.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> Ok, now new and exciting changes... no matter what I do I get: WARNING 
> PLUGIN TIMEOUT: check_openmanage timed out after 30 seconds.
>
> I have -t 60 set on the check_openmanage command and also on the NRPE 
> check command line and in the NSC.ini.  nothing seems to change the 
> timout beyond 30 seconds.
>
> (yes, I've restarted the nsclient++.exe on the remote server).

Hmm.. Unless the server is under very heavy load, you're still having
OMSA problems. I'm guessing that some probe doesn't respond properly and
just hangs.

If the problem is load related, you should consider checking via SNMP
instead.

What model PowerEdge is this btw?

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> now my problem is this...
>
> Problem running 'omreport chassis fans': Error! No fan probes found on 
> this system.Problem running 'omreport chassis temps': Error! No 
> temperature probes found on this system.Problem running 'omreport 
> chassis volts': Error! No voltage probes found on this system.
>
> on the NSC.ini i have the following line added and I restarted the 
> NSClient++ service
>
> command[check_openmanage]=check_openmanage.exe -b fan=all
>
> even tried
>
> command[check_openmanage]=check_openmanage.exe -b fan=0
>
> However, it still tries to check the fans. I suppose I have a syntax
> error?

No, that is the correct syntax. Blacklisting won't prevent the component
class from being checked in the first place; it will only suppress any
info about blacklisted components in the output and the plugin return
value. To skip fans altogether, use the '--check' option like this:
'--check fans=0'.

However, unless this is a blade system and the plugin is unable to
identify it as such for some reason, your server HAS fan probes and
you're having an OMSA problem. The fact that you get errors for other
probes such as temperature and voltage confirms this.

You need to recheck that OMSA works, that all relevant OMSA components
are installed and running etc. It may be as simple as restarting OMSA,
but it could also be more complex (e.g. BIOS/firmware upgrade needed).
These errors are pretty generic, but the problem is that OMSA isn't
working properly on that server.

PS. See this URL about configuring Nagios to not escape HTML code in the
plugin output (to avoid the literal ''):

  http://folk.uio.no/trondham/software/check_openmanage.html#multiline-output

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> Thanks for the reply.  I just realized from your question that I'm using 
> a pre-compiled .exe version of your check_openmanage from here:
>  
> https://www.monitoringexchange.org/inventory/Check-Plugins/Hardware/check_openmanage-exe
>
> which was probably created from an older version...

Yeah I think it's pretty old. A PE32 executable for Windows is available
in the zip and tar.gz archives, and as a single file download:

  http://folk.uio.no/trondham/software/check_openmanage.html#download

Upgrading to the latest version will probably solve your problem. Let me
know if it doesn't.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage

2011-04-26 Thread Trond Hasle Amundsen
Ashcor Technologies  writes:

> on two of my dell servers check_openmanage (via nsclient++ and nrpe) 
> return the same error:
>
> "Use of uninitialized value in concatenation (.) or string at 
> script/check_openmanage.pl line 1386."
>
> both dell systems are running the latest OpenManage version 6.5.0.

Hi Jeff,

Which version of check_openmanage is this?

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Why is check_openmanage so slow on PowerEdge R510?

2011-04-14 Thread Trond Hasle Amundsen
"C. Bensend"  writes:

>Is there anything in OMSA that tells how *long* a battery has
> been charging?  I simply got so tired of the charging warnings
> that I blacklisted the bat_charge totally, but I'd still like to
> detect that type of error - where the battery never finishes
> charging.
>
>If OMSA has it, it would be great to have the option within
> check_openmanage to specify a length of time threshold for battery
> charging.  :)

Hi Benny,

Unfortunately OMSA has no info on when the charge cycle is expected to
be finished, or how long it has been in its current learn/charge state:

  # omreport storage battery controller=1
  Battery 0 on Controller PERC 6/E Adapter (Slot 1)

  Controller PERC 6/E Adapter (Slot 1)
  ID                        : 0
  Status                    : Non-Critical
  Name                      : Battery 0
  State                     : Charging
  Recharge Count            : Not Applicable
  Max Recharge Count        : Not Applicable
  Predicted Capacity Status : Ready
  Learn State               : Requested
  Next Learn Time           : 0 hours
  Maximum Learn Delay       : 7 days 0 hours
  Learn Mode                : Auto

I could make the plugin record it, but then I would violate my principle
that the plugin should be stateless... Introducing state in the plugin
complicates things.
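To make the trade-off concrete, here is a minimal Python sketch of what tracking charge duration would involve if it were done outside the plugin, in a separate wrapper. Everything here is hypothetical: the function name, the "Charging" state string check, and the seven-day default window are assumptions for illustration, and the caller is assumed to persist the `state` dict between runs (e.g. in a file). That persisted dict is exactly the state a stateless plugin avoids.

```python
from datetime import datetime, timedelta

# Nagios plugin return codes
OK, WARNING, CRITICAL = 0, 1, 2


def charge_status(state, battery_id, battery_state, now,
                  max_charge=timedelta(days=7)):
    """Hypothetical wrapper logic: map one battery's charge state to a
    Nagios return code, escalating if charging lasts too long.

    `state` maps battery IDs (e.g. "0:0") to the time charging was first
    seen. The caller must persist it between runs.
    """
    if battery_state != "Charging":
        state.pop(battery_id, None)  # charge cycle over: forget start time
        return OK
    started = state.setdefault(battery_id, now)  # remember first sighting
    if now - started > max_charge:
        return CRITICAL  # still charging after the allowed window
    return WARNING  # charging, but still within the allowed window


# Example: the same battery observed on three runs
state = {}
t0 = datetime(2011, 4, 1, 12, 0)
print(charge_status(state, "0:0", "Charging", t0))                      # 1
print(charge_status(state, "0:0", "Charging", t0 + timedelta(days=8)))  # 2
print(charge_status(state, "0:0", "Ready", t0 + timedelta(days=9)))     # 0
```

Even this small sketch needs somewhere to store timestamps and a rule for forgetting them, which is the added complexity described above.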

There is another reason that you would want to know that the battery is
charging, and I suspect that this is also why Dell has OMSA report it as
a non-critical (warning) status. During (some of) the charge cycle,
write-back for vdisks (i.e. use of the cache) is disabled. This means
that the RAID performance is degraded, and depending on the nature of
your disk usage you'll want to know about this when it happens. OMSA
also lets you delay the charge cycle for up to seven days.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Why is check_openmanage so slow on PowerEdge R510?

2011-04-14 Thread Trond Hasle Amundsen
Helmut Wollmersdorfer  writes:

> Another question:
>
> I always get on all of the R510s (few days old):
>
> root@xen11:~# /usr/lib/nagios/plugins/check_openmanage
> Cache Battery 0 in controller 0 is Charging (Ready) [probably harmless]
> root@xen11:~# uptime
>  12:08:35 up 2 days,  1:22,  1 user,  load average: 0.00, 0.00, 0.00
>
> I'm a little surprised that the batteries are not full after being powered
> for some days, or is the information wrong?

The plugin is simply reporting what OMSA says, so if the info is wrong
it would have to be in the hardware or OMSA level. However I don't think
that this is the case. Batteries take a long time to charge for new
servers, i.e. if the battery is brand new and hasn't been charged
before.

At one time we had a battery that didn't finish charging for a week,
called Dell and got a replacement battery. This was during a regular
charge cycle. In your case I would give it a few more days.

> Also I tried to '--blacklist bat_charge=0,0' (and other combinations), but
> blacklisting does not work.

Look in the debug output for the battery ID, which consists of the
controller number and battery number with colon as delimiter. In your
case it would be

  --blacklist bat_charge=0:0

or simply use 'all':

  --blacklist bat_charge=all

But, as we in fact did experience a case where the battery never
finished charging, I would advise against this. We just ignore the
battery charge warnings unless they persist for days. It can be
annoying, but we decided that we can live with it :)
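If you do decide to blacklist, the battery ID format described above (controller number and battery number joined by a colon) is easy to get wrong by hand. This small Python helper is a hypothetical convenience, not part of check_openmanage; it only builds the `--blacklist bat_charge=...` arguments shown above, as a list suitable for `subprocess.run`.

```python
def bat_charge_blacklist(controller=None, battery=None):
    """Hypothetical helper: build the arguments that blacklist battery
    charge warnings for one battery, or for all batteries.

    Battery IDs are "<controller>:<battery>", as shown in the plugin's
    debug output.
    """
    if controller is None or battery is None:
        return ["--blacklist", "bat_charge=all"]
    return ["--blacklist", f"bat_charge={controller}:{battery}"]


print(bat_charge_blacklist(0, 0))  # ['--blacklist', 'bat_charge=0:0']
print(bat_charge_blacklist())      # ['--blacklist', 'bat_charge=all']
```

For example, `["check_openmanage"] + bat_charge_blacklist(0, 0)` gives the full argument list for controller 0, battery 0.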

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Why is check_openmanage so slow on PowerEdge R510?

2011-04-13 Thread Trond Hasle Amundsen
Helmut Wollmersdorfer  writes:

> New to this architecture, I installed the monitoring plugin
> check_openmanage and was surprised by its performance:
>
> root@xen10:~# time perl /usr/lib/nagios/plugins/check_openmanage -d | head -n 3
> sh: /bin/rpm: not found
> System:      PowerEdge R510 II     OMSA version:    6.5.0
> ServiceTag:  1Z7215J               Plugin version:  3.6.5
> BIOS/date:   1.6.3 02/01/2011      Checking mode:   local
>
> real  0m3.426s
> user  0m2.456s
> sys   0m0.544s
>
> OS: Debian
> root@xen10:~# uname -a
> Linux xen10 2.6.32-5-xen-amd64 #1 SMP Tue Mar 8 00:01:30 UTC 2011 x86_64 GNU/Linux
>
> Most calls of check_openmanage (from the shell) take 3 - 4 seconds,  
> some with '--only' are faster, but not as fast as omreport:
>
> root@xen10:~# time perl /usr/lib/nagios/plugins/check_openmanage --only fans
> FANS OK - 5 fan probes checked
>
> real  0m0.716s
>
>
> root@xen10:~# time /opt/dell/srvadmin/bin/omreport chassis fans
> Fan Probes Information
>
> Fan Redundancy
> Redundancy Status : Full
> [...]
>
> real  0m0.037s
>
> In comparison called with the option --help (does nearly nothing) the  
> execution time is as expected for loading the perl interpreter and  
> compiling the source:
>
> root@xen10:~# time perl /usr/lib/nagios/plugins/check_openmanage  -h
> [...]
> real  0m0.064s
>
> What can be the reason?

Hi Helmut,

The simple answer is that omreport commands take time. They represent
the vast majority of the plugin execution time.

The reason that 'check_openmanage --only fans' takes significantly more
time than the corresponding omreport command is that the plugin first
runs 'omreport -?' to determine if this is a blade or not. If you add
the time it takes to run 'omreport -?', the omreport fans command and
perl interpreter time you should arrive at about the time it takes
'check_openmanage --only fans' to finish.
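One way to verify this breakdown on your own server is to time each command separately. Below is a minimal Python sketch using `subprocess.run` and `time.perf_counter`; the helper name is hypothetical, and which commands you time (and their paths) is up to you. With the figures Helmut posted, 0.716 s - 0.037 s - 0.064 s leaves roughly 0.6 s for 'omreport -?' and the plugin's own parsing, which is consistent with the explanation above.

```python
import subprocess
import time


def time_command(argv):
    """Run argv, discarding its output, and return
    (elapsed wall-clock seconds, exit code).

    The exit code is None if the program could not be found.
    """
    start = time.perf_counter()
    try:
        code = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        code = None
    return time.perf_counter() - start, code
```

On an OMSA host you might call it for `["omreport", "-?"]`, `["omreport", "chassis", "fans"]` and `["check_openmanage", "--only", "fans"]` (adjusting paths to your installation) and compare the sum of the first two, plus interpreter start-up, with the third.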

Note that storage takes time to check, since the omreport commands for
storage are slow. This is especially true if you have a lot of storage
(e.g. an R510).

Also note that if you use the '-d' option, check_openmanage will run
'omreport about' to determine the OMSA version. This is a slow command
and adds to the overall execution time.

The plugin is much faster if used in SNMP mode, especially if you have
lots of storage. Example from a 2950 with a couple of MD1000 shelves of
extra storage:

  $ time ./check_openmanage -H foo
  OK - System: 'PowerEdge 2950 III', SN: 'XXX', 16 GB ram (8 dimms), 3
  logical drives, 32 physical drives

  real    0m1.725s
  user    0m0.397s
  sys     0m0.013s

  foo /# time /usr/lib64/nagios/plugins/check_openmanage
  OK - System: 'PowerEdge 2950 III', SN: 'XXX', 16 GB ram (8 dimms), 3
  logical drives, 32 physical drives

  real    0m4.188s
  user    0m2.997s
  sys     0m0.821s

As you can see the footprint is significantly smaller with SNMP, so if
this is a concern then SNMP should be your weapon of choice :)
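A quick calculation with the two timings above makes the difference concrete. This Python snippet uses only those figures. Note one caveat: in SNMP mode the plugin runs on the monitoring server, so its user+sys time is spent there, and the snmpd work on the monitored host is not included; the CPU ratio therefore compares the plugin processes only, not total load.

```python
# (real, user, sys) in seconds, copied from the two runs above
snmp_mode = (1.725, 0.397, 0.013)   # ./check_openmanage -H foo
local_mode = (4.188, 2.997, 0.821)  # check_openmanage run locally on foo


def cpu_seconds(timing):
    """CPU time of the plugin process: user + sys."""
    _, user, sys_ = timing
    return user + sys_


wall_ratio = local_mode[0] / snmp_mode[0]
cpu_ratio = cpu_seconds(local_mode) / cpu_seconds(snmp_mode)
print(f"wall-clock: {wall_ratio:.1f}x, plugin CPU: {cpu_ratio:.1f}x")
# prints "wall-clock: 2.4x, plugin CPU: 9.3x"
```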

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage internal error

2011-03-09 Thread Trond Hasle Amundsen
Adam Caines  writes:

> Looks like it's reporting "path health".  The 6e has both sas ports
> connected to redundant controllers in the MD1120.  It's strange on
> another server, I also have a PERC H700 connect to a MD1220 with
> redundant links and it does not output the "path health" section.

[snip]

> ID             : 0
> Status         : Ok
> Name           : Logical Connector
> State          : Ready
> Connector Type : SAS Port RAID Mode
> Termination    : Not Applicable
> SCSI Rate      : Not Applicable
>
> Path Health
> Status : Ok
> Name   : Connector 0
> State  : Available
>
> Status : Ok
> Name   : Connector 1
> State  : Available

Yes, so this is the culprit... check_openmanage did not expect this
output. According to the OMSA documentation[1], it looks like the
controller is connected to the enclosure in redundant path mode. I
really need to see how this looks in SSV format. Can you provide the
output from this command:

  omreport storage connector controller=1 -fmt ssv

In case of redundant path mode, the plugin should check the path health
and report on it, in addition to the connector health. This
functionality must be added to the plugin.
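The mis-parsed lines in the debug output Adam posted ('1:Status | State [Name] on controller 1 is Status') suggest that the header row of the second table (Path Health) was read as a data row of the first. As a hedged illustration only (this is not the plugin's actual Perl code), here is a Python sketch that treats semicolon-separated output as a sequence of tables, each with its own header row. The sample text is hypothetical and only mimics the field names quoted above; the real `omreport ... -fmt ssv` layout may differ, which is why the output was requested.

```python
def parse_ssv_tables(text):
    """Parse semicolon-separated output into a list of tables.

    Each table is a list of dicts keyed by that table's own header row.
    Lines without a ';' (titles such as "Path Health") and blank lines
    end the current table, so the next ';' line is read as a new header
    instead of as a data row.
    """
    tables = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if ";" not in line:
            header = None  # blank or title line: current table is finished
            continue
        fields = line.split(";")
        if header is None:
            header = fields  # first ';' line of a table is its header
            tables.append([])
        else:
            tables[-1].append(dict(zip(header, fields)))
    return tables


# Hypothetical sample mimicking the fields quoted above; real
# 'omreport ... -fmt ssv' output may be laid out differently.
sample = """ID;Status;Name;State;Connector Type
0;Ok;Logical Connector;Ready;SAS Port RAID Mode

Path Health
Status;Name;State
Ok;Connector 0;Available
Ok;Connector 1;Available
"""

for table in parse_ssv_tables(sample):
    for row in table:
        print(row)
```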

Is it possible for you to check how check_openmanage handles this when
checking via SNMP as well?

[1] http://support.euro.dell.com/support/edocs/software/svradmin/6.4/en/CLI/HTML/reportst.htm#wp1077100

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage internal error

2011-03-08 Thread Trond Hasle Amundsen
Adam Caines  writes:

> Looks like some strange output on the lines for controller 1?  The
> formatting is breaking there.  I checked omreport storage controller
> and didn't see anything that stood out as being strange.

[snip]

>       OK |      0:0 | Connector 0 [SAS Port RAID Mode] on controller 0 is Ready
>       OK |      0:1 | Connector 1 [SAS Port RAID Mode] on controller 0 is Ready
>       OK |      1:0 | Logical Connector  [SAS Port RAID Mode] on controller 1 is Ready
>          | 1:Status | State [Name] on controller 1 is Status
>          |     1:Ok | Available  [Unknown type] on controller 1 is Unknown state
>          |     1:Ok | Available  [Unknown type] on controller 1 is Unknown state

Ok, something strange is going on here. This seems to be a parsing error
in the plugin, related to the connectors. As I don't have any MD1120
enclosures, I'm curious whether these errors are related to the MD1120
being different somehow.

Can you send the output from these commands:

  omreport storage connector controller=0
  omreport storage connector controller=1

and also:

  omreport storage connector controller=0 -fmt ssv
  omreport storage connector controller=1 -fmt ssv

The latter is what the plugin is using as it is easier to parse.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage internal error

2011-03-08 Thread Trond Hasle Amundsen
Adam Caines  writes:

> All of the status lines appear to be Ok.

Indeed they do. This appears to be trickier than I initially
thought. Perhaps the debug output from the plugin has some clues?

Try 'check_openmanage -d --only storage'. It will attempt to print the
status of all monitored storage components.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage internal error

2011-03-08 Thread Trond Hasle Amundsen
Adam Caines  writes:

> Having a strange problem with check_openmanage.  Use it without error on many
> other systems.  Any help would be appreciated.
>
> check_openmanage version: 3.6.5 (.exe version)
> Dell OMSA version: 6.4.0
> OS: Windows Server 2008 R2
> Hardware: Poweredge 1950 with PERC 6/i and PERC 6/e connected to MD1120
>
> 
>
> check_openmanage output:
>
> OK - System: 'PowerEdge 1950 III', SN: 'XXX', 8 GB ram (4 dimms), 2 logical drives, 28 physical drives
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value in numeric lt (<) at script/check_openmanage line 4634.
> INTERNAL ERROR: Use of uninitialized value $level in numeric eq (==) at script/check_openmanage line 4637.
> INTERNAL ERROR: Use of uninitialized value $level in numeric eq (==) at script/check_openmanage line 4637.
> INTERNAL ERROR: Use of uninitialized value $level in numeric eq (==) at
>
> 
>
> If I run check_openmanage --no-storage the errors are not present:

Hi Adam,

Interesting. It is the status of a device (as reported by omreport)
that is garbled somehow. The plugin will set the status to 'Unknown' if
the field is missing or empty, so this means that omreport is reporting
the status as something new that check_openmanage doesn't recognize.
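The "uninitialized value $level" warnings fit this explanation: if a status string is not in the plugin's lookup table, the lookup yields nothing. Here is a hedged Python analogue. The mapping is a hypothetical example built from status words seen in this thread, not check_openmanage's actual table; only the Nagios return codes (0 to 3) are standard. Falling back to UNKNOWN explicitly, instead of carrying an undefined value forward, is one way to keep an unexpected status from producing internal errors.

```python
# Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical lookup table; check_openmanage's real table may differ.
STATUS_LEVEL = {
    "Ok": OK,
    "Non-Critical": WARNING,
    "Critical": CRITICAL,
    "Unknown": UNKNOWN,
}


def status_level(status):
    """Map an OMSA status string to a Nagios return code."""
    if not status:
        status = "Unknown"  # missing or empty field
    level = STATUS_LEVEL.get(status)  # None for an unrecognised status
    if level is None:
        # Comparable to an undefined $level in Perl; here we fall back
        # to UNKNOWN explicitly instead of comparing an undefined value.
        return UNKNOWN
    return level
```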

The fact that you're getting so many of them (and that you have established
it's a storage issue) makes me think that it is related to physical disks.

We need to see what omreport says about storage, in particular the disk
drives. Can you send the output from

  omreport storage pdisk controller=X

where 'X' is the controller number (0 or 1), for each of the controllers.
If the Status field is 'Ok' for all the disks, we need to look further.
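With 28 physical drives, checking the Status field by eye is tedious. A minimal Python sketch, assuming the human-readable `omreport storage pdisk controller=X` output consists of blank-line-separated blocks of 'Key : Value' lines (as in the connector output quoted in this thread); the function name and the sample are hypothetical.

```python
def non_ok_statuses(text):
    """Return (ID, Status) pairs for every 'Key : Value' block whose
    Status is not 'Ok'. Blocks are separated by blank lines; lines
    without a colon (titles) are ignored."""
    problems = []
    record = {}
    for line in text.splitlines() + [""]:  # extra "" flushes the last block
        if not line.strip():
            status = record.get("Status")
            if status is not None and status != "Ok":
                problems.append((record.get("ID", "?"), status))
            record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            # partition() splits on the first colon only, so IDs such
            # as "0:0:1" stay intact in the value
            record.setdefault(key.strip(), value.strip())
    return problems


# Hypothetical sample; check the real field names in your omreport output.
sample = """ID     : 0:0:0
Status : Ok
Name   : Physical Disk 0:0:0

ID     : 0:0:1
Status : Non-Critical
Name   : Physical Disk 0:0:1
"""
print(non_ok_statuses(sample))  # [('0:0:1', 'Non-Critical')]
```

To feed it live data, capture the command output, e.g. `subprocess.run(["omreport", "storage", "pdisk", "controller=0"], capture_output=True, text=True).stdout`.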

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Error in performance-data-output

2011-03-07 Thread Trond Hasle Amundsen
"Lichterfeld, Dirk"  writes:

> I compared the response times of the Nagios check and I see that the Dell
> Server R710 needs over 10 seconds to answer. Another server (Dell R310)
> answers in 8 seconds (the check of this server is OK).
>
> The response time varies with the Dell hardware.

Yes, this is expected when using the win32 binary file. It contains a
perl interpreter and is slow to start up and execute. When monitoring
Windows machines, SNMP is preferable unless your security policies
prohibit this.

> What did I do? I extended the check command for check_openmanage from
> "check_nrpe -H $HOSTADDRESS$ -c Check_Openmanage" with the parameter "-t
> 30" to extend the time for this check.

30 seconds is the default timeout for check_openmanage. I would set the
check_nrpe timeout to slightly more than the check_openmanage timeout.
That way, if check_openmanage times out for some reason, you'll get a
meaningful error message from check_openmanage instead of a cryptic one
from NSClient++.

Anyway, the '-t 30' parameter to check_nrpe should work...

> Is there another way to set the timeout?

I'm not familiar with NSClient++, perhaps it has its own timeout?
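If NSClient++ does have its own timeout, the setup has three nested timeouts (plugin inside NSClient++ inside check_nrpe), and the rule of thumb above applies at each level: every outer timeout should be slightly longer than the one it wraps. This Python sketch expresses that as a sanity check. The function name and the 5-second margin are assumptions for illustration; only the 30-second check_openmanage default comes from this thread.

```python
def check_timeouts(layers, margin=5):
    """Check that each layer's timeout exceeds the one it wraps.

    `layers` lists (name, timeout_seconds) from innermost to outermost,
    e.g. the plugin, then NSClient++, then check_nrpe. Returns a list of
    human-readable problems; an empty list means the ordering is sane.
    """
    problems = []
    for (inner, t_in), (outer, t_out) in zip(layers, layers[1:]):
        if t_out < t_in + margin:
            problems.append(
                f"{outer} ({t_out}s) should exceed {inner} ({t_in}s) "
                f"by at least {margin}s"
            )
    return problems


# All layers at 30 s: the outer layers may give up first.
print(check_timeouts([("check_openmanage", 30), ("NSClient++", 30),
                      ("check_nrpe", 30)]))
```

With all three at 30 seconds, both outer layers are flagged; with, say, 30/40/45 the list comes back empty.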

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Error in performance-data-output

2011-03-02 Thread Trond Hasle Amundsen
"Lichterfeld, Dirk"  writes:

> Hi Trond,
>
> I'm sorry, at my company we use Outlook, so the highlighted text is
> clearly visible to me.
>
> I will try to specify the problem I mean.
>
> If I run NSClient++ in testmode I will get the follow output:
>
>   d NSClient++.cpp(1106) Injecting: Check_OpenManage:
>   d NSClient++.cpp(1142) Injected Result: OK 'OK - System: 'PowerEdge 
> R710 II', SN: 'XXX', 4 GB ra
>   m (2 dimms), 1 logical drives, 4 physical drives'
>   d NSClient++.cpp(1143) Injected Performance Result: 
> 'fan_0_system_board_fan_1_rpm=3600;0;0 fan_1_sys
>   tem_board_fan_2_rpm=3600;0;0 fan_2_system_board_fan_3_rpm=3600;0;0 
> fan_3_system_board_fan_4_rpm=3600
>   ;0;0 fan_4_system_board_fan_5_rpm=3600;0;0 
> pwr_mon_0_ps_1_current=0.4;0;0 pwr_mon_1_ps_2_current=0.4
>   ;0;0 pwr_mon_2_system_board_system_level=175;917;966 
> temp_0_system_board_ambient=20;42;47
>   '
>
> You can see that the injected performance result begins and ends with a '.

Yes, but I think that NSClient++ is responsible for that, putting
everything inside single quotes. As you can see it does that for the
plugin output as well.

> 1. I mean that every label, and only the label, must be inside single
>    quotes:
>   our output:  fan_2_system_board_fan_3_rpm
>   must be:     'fan_2_system_board_fan_3_rpm'
> 2. No special character is allowed at the end.
>
> You can read this in "chapter 2.6 Performance data" at 
> http://nagiosplug.sourceforge.net/developer-guidelines.html
>
> I hope I could describe the problem well enough.

Yes, thank you, this was much clearer :) However, the quotes are not
needed according to the guidelines for performance data[1]:

  3. the single quotes for the label are optional. Required if spaces, =
     or ' are in the label

The perfdata labels don't contain any of the offending characters.
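The quoting rule quoted above can be applied mechanically. This is a minimal Python sketch; the function names are hypothetical and not taken from check_openmanage. It quotes a label only when it contains a space, '=' or a single quote, and doubles any embedded single quote, following the guidelines' convention for a literal quote character. Labels made of letters, digits and underscores, like the ones in Dirk's output, come out unquoted, and the quoted form is equally valid since quotes are optional.

```python
def perf_label(label):
    """Quote a perfdata label only when the guidelines require it
    (space, '=' or single quote in the label); embedded single quotes
    are doubled."""
    if any(c in label for c in " ='"):
        return "'" + label.replace("'", "''") + "'"
    return label


def perf_entry(label, value, *thresholds):
    """Format one perfdata entry: label=value;warn;crit;..."""
    fields = [str(value)] + [str(t) for t in thresholds]
    return perf_label(label) + "=" + ";".join(fields)


print(perf_entry("fan_0_system_board_fan_1_rpm", 3600, 0, 0))
# fan_0_system_board_fan_1_rpm=3600;0;0
```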

Could it be that this is a Windows issue, or perhaps NSClient++?

Any NSClient++ users here who can confirm if this is the case? I'm
thinking that perhaps the underscore character '_' is throwing off
Windows or NSClient++.

[1] http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN201

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Error in performance-data-output

2011-03-01 Thread Trond Hasle Amundsen
"Lichterfeld, Dirk"  writes:

> I want to use "check_openmanage" to monitor some Dell servers with Nagios.
> Most of them run Windows Server 2003.
>
> I have a problem getting the performance results from the machines. With the
> command "check_openmanage -p" I get a failure in Nagios like "CHECK_NRPE:
> Socket Time Out"
>
> I checked the result of the check and saw what happened. Some characters
> are missing from the returned performance data. I have highlighted the
> missing characters:
>
>
> OUTPUT with NSClient++ in test mode (where the highlighted characters are not
> in the output):
>
> d NSClient++.cpp(1106) Injecting: Check_OpenManage:
> d NSClient++.cpp(1142) Injected Result: OK 'OK - System: 'PowerEdge R710 II',
> SN: 'XXX', 4 GB ra
> m (2 dimms), 1 logical drives, 4 physical drives'
> d NSClient++.cpp(1143) Injected Performance Result:
> 'fan_0_system_board_fan_1_rpm'=3600;0;0 'fan_1_sys
> tem_board_fan_2_rpm'=3600;0;0 'fan_2_system_board_fan_3_rpm'=3600;0;0 '
> fan_3_system_board_fan_4_rpm'=3600
> ;0;0 'fan_4_system_board_fan_5_rpm'=3600;0;0 'pwr_mon_0_ps_1_current'=0.4;0;0 
> '
> pwr_mon_1_ps_2_current'=0.4
> ;0;0 'pwr_mon_2_system_board_system_level'=175;917;966 '
> temp_0_system_board_ambient'=20;42;47
> '
>
> OUTPUT in an MS-DOS window (the highlighted characters are not in the output):
>
> C:\Programme\check_openmanage-3.6.5>check_openmanage.exe -p
> OK - System: 'PowerEdge R710 II', SN: 'XXX', 4 GB ram (2 dimms), 1 logical
> drives, 4 physical drives|'fan_0_system_board_fan_1_rpm'=3600;0;0 '
> fan_1_system_board_fan_2_rpm'=3600;0;0 'fan_2_system_board_fan_3_rpm'=3600;0;0
> 'fan_3_system_board_fan_4_rpm'=3600;0;0 
> 'fan_4_system_board_fan_5_rpm'=3600;0;0
> 'pwr_mon_0_ps_1_current'=0.4;0;0 'pwr_mon_1_ps_2_current'=0.4;0;0 '
> pwr_mon_2_system_board_system_level'=175;917;966 
> 'temp_0_system_board_ambient'=
> 20;42;47

Hi Dirk,

I'm having a hard time understanding what you mean. Perhaps my mail
client is playing tricks on me, but I can't see anything highlighted.
Except for weird line breaks the perfdata looks OK to me in both
examples.

Can you be more specific and pinpoint exactly where the problem is?

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Check_openmanage-- Current probes not found

2011-02-24 Thread Trond Hasle Amundsen
Joe Beck  writes:

> Yes, just after sending this post I did the things you identified.
> Verified model vs others where this issue was not happening.
> We have several r610's & this is only one with the issue.
> Then I went & looked at the omsa version & found this one was running 5.9
> where the others had 6.4
> I removed & installed 6.4 but same result.
> I also had some question/confusion about best way to identify the version;
> in fact it may have already been running 6.4.
>
> I'm grep'ing for version; tried running cmds with -v & --version, etc but no
> luck in seeing which version via the cmds

This command will tell you which version of OMSA you're running:

  omreport about

There are other ways as well:

  http://folk.uio.no/trondham/software/check_openmanage.html#how-can-i-find-out-which-version-of-omsa-my-server-is-running

I'm not sure if you understood my question about the servers being
identical. I didn't mean the model (I assumed the model would be the
same), but hardware-wise. Specifically, are they alike with respect to
number of power supplies?

In any case, the next step will be to examine the installed OMSA
software components. On RHEL and derivatives such as CentOS, you can do
this by comparing the output from 'rpm -qa|grep srvadmin' from healthy
boxes versus the failing one. Also check that the running OMSA services
are the same.
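Comparing two `rpm -qa | grep srvadmin` listings by eye is error-prone; a set difference does it reliably. A minimal Python sketch: it takes the two command outputs as text (collected however you like, e.g. redirected to a file on each host), and the package names in the example are hypothetical placeholders. Because `rpm -qa` includes version strings, a version mismatch shows up as one entry on each side.

```python
def compare_packages(healthy_output, failing_output):
    """Compare two 'rpm -qa | grep srvadmin' listings.

    Returns (missing_on_failing, extra_on_failing), each sorted.
    """
    def to_set(output):
        return {line.strip() for line in output.splitlines() if line.strip()}

    healthy, failing = to_set(healthy_output), to_set(failing_output)
    return sorted(healthy - failing), sorted(failing - healthy)


# Hypothetical package names, for illustration only
healthy = "srvadmin-pkg-a-6.4.0\nsrvadmin-pkg-b-6.4.0\n"
failing = "srvadmin-pkg-a-6.4.0\n"
print(compare_packages(healthy, failing))
# (['srvadmin-pkg-b-6.4.0'], [])
```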

Since this is happening on only one server, and you have probably
installed OMSA in exactly the same way on all the servers, you may have
a real hardware problem. If all else fails, you should contact Dell
support and have them look at it.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Check_openmanage-- Current probes not found

2011-02-23 Thread Trond Hasle Amundsen
Joe Beck  writes:

> I have a couple R610’s
> Some run omreport chassis pwrmonitoring & return output
> I also have 1 that returns:
> # omreport chassis pwrmonitoring
> Power Consumption Information
>
> Error : Current probes not found
>
> Does this mean that this module just isn’t installed or ???
>
> At this point, do I just alter the nagios service to exclude pwrmonitoring?

Hi Joe,

I think the first thing to do is investigate why OMSA behaves like
this. I've seen this error before, but on older servers with old OMSA
versions (5.4.0). A simple restart of OMSA (srvadmin-services.sh
restart) may be the solution and should be attempted first. The next
step would be to reinstall OMSA and verify that everything gets
installed.
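
If several servers are affected, a check like the following could be used to spot the ones printing this particular error before restarting OMSA on them. It is a hypothetical sketch that simply greps the omreport output for the error string quoted above; nothing here is part of check_openmanage.

```sh
#!/bin/sh
# has_probe_error (hypothetical): succeed if omreport output read from
# stdin contains the "Current probes not found" error quoted above.
has_probe_error() {
    grep -q 'Error : Current probes not found'
}

# Example usage on an affected server (as root):
#   if omreport chassis pwrmonitoring | has_probe_error; then
#       srvadmin-services.sh restart
#   fi
```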

Normally, if power monitoring information is not available, OMSA says
so with a different and more informative message.

Is the problematic machine identical to the ones that work?

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage SNMP Error

2011-02-17 Thread Trond Hasle Amundsen
Shawn Green  writes:

> I'm in the process of rolling out check_openmanage to monitor a variety of
> hardware including R510s, M600s, and M610s.  I'm running into an interesting
> issue where the alert is reporting back:
>
> SNMP ERROR [cooling]: The requested entries are empty or do not exist. 
>
> I understand this is an SNMP error (not check_openmanage), but what's baffling
> me is how to work around it.  My Net::SNMP module is up to date (v6.0.1) as
> are net-snmp packages on all hosts.
>
> A good majority of hosts that are getting this error are M600/M610 blades, yet
> other blades in the same chassis do not get this error.  I'm also seeing
> these on several R510s, yet other R510s have no problems. 
>
> All hosts are Centos 5.5 64 bit with OMSA 6.2.0.

Hi Shawn,

One thing that is really peculiar is that you're getting this error from
blade servers. The plugin should identify blades and ignore the fact
that they don't have cooling devices (i.e. fans). You should never get
this error from blades. Are you sure that the errors from your blades
concern cooling and not something else?

(If so, we'll need to investigate why the plugin doesn't identify the
blade servers correctly).
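
To cross-check which of your hosts should be treated as blades, a rough heuristic is to look at the model name. The sketch below assumes that blade models are named 'PowerEdge M...' (true for the M600 and M610 mentioned here); it is a hypothetical helper, not the plugin's own detection logic.

```sh
#!/bin/sh
# is_blade (hypothetical heuristic): succeed if the model string looks like
# a PowerEdge M-series blade, e.g. "PowerEdge M610". Blades have no fans of
# their own, so an empty fan table is expected for them.
is_blade() {
    case "$1" in
        "PowerEdge M"*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Example usage with a model name taken from the plugin's OK output:
#   is_blade "PowerEdge R510" || echo "not a blade: fans should be reported"
```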

Your Net::SNMP version is fine and not to blame. The error lies with
OMSA and/or the SNMP service. Try running on the servers:

  omreport chassis fans

On the blades, you should get an error saying that no fan probes were
found, which is normal. But the R510s should display fan info. If they
don't, the problem lies with OMSA itself and is not SNMP related.

If you haven't already done so, try restarting OMSA (i.e. run
'srvadmin-services.sh restart') on the servers. Reinstalling OMSA (or,
better yet, upgrading to version 6.4.0) is the logical next step. Make
sure that there are no errors during installation and that everything
gets installed.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage and linebreaks

2011-02-11 Thread Trond Hasle Amundsen
"Bryan O'Shea"  writes:

> check_openmanage and linebreaks not working in $SERVICEOUTPUT$  emails.
>
> When using the either of the following options the linebreaks seem to be 
> broken:
> -e or --postmsg
>
> This is what I get in my service notification emails instead of the
> desired output of separate lines.
>
> "Power Supply 1 [AC] needs attention: Presence detected, Failure
> detected, AC lostbr/NOTE: PowerEdge 2950 III 437RQH1 -
> 555-1212"
>
> It puts a br/ in instead of a "\n".

Hi Bryan,

By default, check_openmanage uses HTML linebreaks when run from Nagios,
NRPE etc., and regular linebreaks when run in a console with a TTY. The
reason is that the plugin monitors several things, and when there are
multiple alerts it's practical to display each of them on a separate
line.

However, since this behaviour doesn't suit everyone, you can change it
with the '--linebreak' switch. To switch to regular (\n) linebreaks:

  check_openmanage --linebreak=REG

You can also specify any string as a custom linebreak:

  check_openmanage --linebreak=' -- '

If you choose regular linebreaks, the first line will be put in the
SERVICEOUTPUT macro, while any subsequent lines will be put in the
LONGSERVICEOUTPUT macro. This is how Nagios 3.x handles multiline output
from plugins.
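
The rules above can be mimicked in a few lines of shell to make them concrete. This is only an illustration, not the plugin's own (Perl) code: the function name join_alerts is hypothetical, and the HTML linebreak string '<br/>' is an assumption based on the 'br/' fragments in the notification quoted above.

```sh
#!/bin/sh
# join_alerts (hypothetical illustration, not the plugin's code):
# join alert messages following the linebreak rules described above.
#   $1    = linebreak setting: "" (default), "REG", or any custom string
#   $2... = alert messages (at least one)
join_alerts() {
    [ $# -ge 2 ] || return 0        # need a setting and at least one message
    mode=$1
    shift
    nl='
'
    if [ "$mode" = REG ]; then
        sep=$nl                     # --linebreak=REG: regular newline
    elif [ -n "$mode" ]; then
        sep=$mode                   # custom string, e.g. ' -- '
    elif [ -t 1 ]; then
        sep=$nl                     # default on a console with a TTY
    else
        sep='<br/>'                 # default under Nagios/NRPE (no TTY)
    fi
    out=$1
    shift
    for msg in "$@"; do
        out=$out$sep$msg
    done
    printf '%s\n' "$out"
}
```

Run from a terminal, join_alerts '' 'PSU 1 failed' 'Fan 2 failed' prints two lines; when its output is captured (no TTY, as under Nagios), the same call joins them with '<br/>'.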

PS. For the default HTML linebreaks to work as intended in the web
frontend, you should set "escape_html_tags=0" in the Nagios CGI config
(cgi.cfg).

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage: 'Amperage probe 0 [System Board System Level] reads 0 W'

2011-01-27 Thread Trond Hasle Amundsen
"Tom Sommer"  writes:

>>> After upgrading OpenManage to version 6.4.0 on a DELL R410,
>>> check_openmanage 3.6.4 returns
>>>
>>> CRITICAL: Amperage probe 0 [System Board System Level] reads 0 W
>>>
>>>
>>> Is this due to OpenManage changing behavior (bug), or is the hardware
>>> really faulty? (doubtful) :)
>
>> Most likely this is some sort of bug in OpenManage, or something went
>> wrong during upgrade. You should confirm the fault by running
>>
>> omreport chassis pwrmonitoring
>
> # omreport chassis pwrmonitoring
>
> Power Consumption Information is not available on this system because all
> the Power Supply units on your system do not support PMBus or the firmware
> on your system does not support power monitoring.

Strange.. if the system doesn't support power monitoring, the plugin
shouldn't complain about it. Are you using check_openmanage via SNMP or
locally?

(I'm guessing SNMP, and if so there are obvious inconsistencies
between what OMSA displays through omreport and what is available via
SNMP.)

Did power monitoring work at all before upgrading OMSA?

>>> Anyone else seen this?
>>
>> Sorry, no. Very often these problems are resolved simply by restarting
>> OpenManage on the monitored server, or a reboot. The next step is to
>> re-install OpenManage in case something was missed during install/upgrade.
>> If all else fails, contact Dell support.
>
> Tried all but the latter - guess it's a DELL bug.

I forgot one other possible cause: old BIOS and/or firmware. Newer
versions of OMSA often need a relatively up-to-date BIOS and firmware
to function normally, so you should upgrade the BIOS and all firmware
on the server.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage: "Amperage probe 0 [System Board System Level] reads 0 W"

2011-01-27 Thread Trond Hasle Amundsen
"Tom Sommer"  writes:

> After upgrading OpenManage to version 6.4.0 on a DELL R410,
> check_openmanage 3.6.4 returns
>
> CRITICAL: Amperage probe 0 [System Board System Level] reads 0 W
>
> Is this due to OpenManage changing behavior (bug), or is the hardware
> really faulty? (doubtful) :)

Hi Tom,

Most likely this is some sort of bug in OpenManage, or something went
wrong during upgrade. You should confirm the fault by running

  omreport chassis pwrmonitoring

Investigate the "Status" field. The only accepted value is "Ok".
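
To check this across many servers, a sketch like the following could be used. It assumes omreport prints lines of the form 'Status : Ok' (the "Field : value" layout of its plain output); the function name all_status_ok is made up for this example.

```sh
#!/bin/sh
# all_status_ok (hypothetical): read omreport output on stdin and succeed
# only if at least one "Status : <value>" line exists and every one says "Ok".
all_status_ok() {
    awk -F':' '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        /^[ \t]*Status[ \t]*:/ {
            seen = 1
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim surrounding blanks
            if (val != "Ok") bad = 1
        }
        END { if (seen && !bad) exit 0; exit 1 }'
}

# Example usage:
#   omreport chassis pwrmonitoring | all_status_ok || echo "check the Status fields"
```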

> I know I could just disable amperage checks, but I'd like not to.
>
> Anyone else seen this?

Sorry, no. Very often these problems are resolved simply by restarting
OpenManage on the monitored server, or a reboot. The next step is to
re-install OpenManage in case something was missed during
install/upgrade. If all else fails, contact Dell support.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage showing 0 logical drives with OMSA 6.4 and PERC4

2011-01-25 Thread Trond Hasle Amundsen
Steve Jenkins  writes:

> On Tue, Jan 25, 2011 at 3:41 AM, Trond Hasle Amundsen
>  wrote:
>> Interesting.. OMSA is obviously aware of the logical drives, but what
>> does omreport actually say about them? Try running 'omreport storage
>> vdisk controller='.
>
> Looks like omreport sees the controller, but not the VDisk:
>
> # omreport storage vdisk controller=0
> No virtual disks found

Ok, so there is the reason that check_openmanage doesn't display any
virtual disks. It relies on OMSA for the information, specifically
omreport when used in local mode.
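
Since the plugin relies on omreport in local mode, it can help to count what omreport itself reports. A minimal sketch, assuming each virtual disk in the output starts with an 'ID : <n>' line; the empty case relies on the "No virtual disks found" message shown above, and the function name count_vdisks is hypothetical.

```sh
#!/bin/sh
# count_vdisks (hypothetical): count virtual disks in the output of
# "omreport storage vdisk controller=0" read from stdin.
count_vdisks() {
    awk '
        { sub(/\r$/, "") }                          # tolerate CRLF line endings
        /No virtual disks found/ { print 0; done = 1; exit }
        /^ID[ \t]*:/ { n++ }                        # one ID line per vdisk
        END { if (!done) print n + 0 }'
}

# Example usage:
#   omreport storage vdisk controller=0 | count_vdisks
```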

Based on the issue at hand and your reports about OMSA 6.4 and PERC4
controllers on the Linux-PowerEdge mailing list, it seems that the
latest OMSA has serious issues with 8th-generation Dell servers.

PS. You may have noticed that the plugin doesn't issue an alert when
virtual disks are missing. The reason is that it's perfectly legal and
plausible for a system to have no virtual disks. This is the downside of
a plugin that both discovers and monitors components at the same time:
it can't alert on missing components unless they should be present in
every server. A notable exception is controllers, since failing to
display controllers is a common OMSA problem. check_openmanage will
complain about missing controllers even though controller-less systems
are possible.
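
The rule described in the PS can be sketched as a small decision function. This is a hypothetical illustration, not code from check_openmanage: the return values use the standard Nagios plugin exit codes (0 = OK, 1 = WARNING), and treating a missing controller as WARNING rather than CRITICAL is an assumption.

```sh
#!/bin/sh
# missing_component_state (hypothetical): decide whether finding zero
# components of a given type should raise an alert.
#   $1 = component type (controller, vdisk, pdisk, ...)
#   $2 = number of components discovered
missing_component_state() {
    if [ "$2" -gt 0 ]; then
        return 0                  # something was found: monitor it normally
    fi
    case "$1" in
        controller) return 1 ;;   # assumed WARNING: controllers are expected
        *)          return 0 ;;   # vdisks etc. may legitimately be absent
    esac
}
```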

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage showing 0 logical drives with OMSA 6.4 and PERC4

2011-01-25 Thread Trond Hasle Amundsen
Steve Jenkins  writes:

> After upgrading three of the 1850s to Dell OMSA 6.4 today, I noticed
> something strange. The three of them now display in Nagios:
>
> OK - System: 'PowerEdge 1850', SN: '', 3 GB ram (6 dimms), 0
> logical drives, 2 physical drives
>
> OK - System: 'PowerEdge 1850', SN: 'XXX', 12 GB ram (6 dimms), 0
> logical drives, 2 physical drives
>
> OK - System: 'PowerEdge 1850', SN: 'XXX', 4 GB ram (6 dimms), 0
> logical drives, 2 physical drives
>
> All three display 0 logical drives, even though they all have a
> working RAID array.

[snip]

> The strange part is that OMSA 6.4 on the 1850s is clearly aware that
> there's a logical drive, because the GUI shows "Virtual Disk 0 RAID-1"
> in the Storage Dashboard.

Hi Steve,

Interesting.. OMSA is obviously aware of the logical drives, but what
does omreport actually say about them? Try running 'omreport storage
vdisk controller='.

You seem to be running check_openmanage in local mode, so the output
from omreport is what matters.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Problem with check_openmanage

2011-01-24 Thread Trond Hasle Amundsen
Jeffrey Watts  writes:

> Thanks Trond!  That seems to have fixed it.  Here's what I see now:
>
> ./check_openmanage -H pkc-search28 -C tomgeco
> Power Supply 0 [AC] needs attention: Presence detected, Failure detected, AC lost
> Voltage sensor 14 [PS 2 Voltage 2] is Unknown reading
>
> It comes up correctly now as a CRIT, too.

Good, thanks for reporting back. I'll include this fix in the next
release. The problem was that when the reading was not available, the
plugin assumed that the reading was discrete (i.e. not a number but
"good", "bad" etc.). This assumption is wrong when the reading is NOT
discrete and simply isn't available via SNMP. The fixed version sets
the reading to "Unknown reading" when it can't be obtained.

(However, this situation shouldn't occur at all if OMSA is behaving as
it should. Pulling the cable on one power supply would normally lead to
a reading of 0 volts for that voltage probe.)
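
The fix described above can be sketched as follows. This is a hypothetical shell rendering of the logic, not the plugin's Perl code: an empty reading is reported as "Unknown reading" instead of being treated as a discrete value. Telling numeric from discrete readings by "digits and dots only" is a simplification made for this example.

```sh
#!/bin/sh
# describe_reading (hypothetical): format a probe reading for an alert line.
#   $1 = probe name, $2 = raw reading (may be empty), $3 = unit
describe_reading() {
    case "$2" in
        '')
            # Reading could not be obtained (e.g. not available via SNMP)
            printf '%s is Unknown reading\n' "$1" ;;
        *[!0-9.]*)
            # Discrete reading such as "Good" (negative numbers also land here)
            printf '%s is %s\n' "$1" "$2" ;;
        *)
            # Numeric reading
            printf '%s reads %s %s\n' "$1" "$2" "$3" ;;
    esac
}
```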

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Problem with check_openmanage

2011-01-24 Thread Trond Hasle Amundsen
Jeffrey Watts  writes:

> Hello, I'm using Mr. Amundsen's excellent check_openmanage plugin, and I'm
> getting an odd error:
>
> $ check_openmanage -H myserver -C public
> Power Supply 0 [AC] needs attention: Presence detected, Failure detected, AC lost
> Voltage sensor 14 [PS 2 Voltage 2] is 
> INTERNAL ERROR: Use of uninitialized value $reading in sprintf at /usr/lib/nagios/plugins/check_openmanage line 3565.
>
> Has anyone else seen this error?  I'm running version 3.6.4.  Please let me
> know what additional information is needed.

Hi Jeffrey,

This shouldn't happen, and I think I see where the problem is. Please
try the version available here, and let me know if it performs any
better:

  http://folk.uio.no/trondham/software/test/

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Check_OpenManage error

2011-01-14 Thread Trond Hasle Amundsen
"Jeffrey C. Veatch"  writes:

> To whom it may concern:
>  
> I have been trying to use check_openmanage in my Nagios configuration, but no
> matter what I do I get a list of Internal Errors at the end of the returned
> test.  The only way I can avoid it is by using the debug mode and only
> returning the first 80 lines.  This however does not warn me of any issues the
> server is having.
>  
> Here are some details.  The server running OMSA is an R710 running VMware ESX
> 4.0.0 Update 2.  OMSA version is 6.4.
> The nagios server is in a virtual machine running OpenSUSE 11.3.  The Nagios
> version is 3.2.3.
>  
> If there are other packages that you need to know the version, let me know. 
> The following is an example of the results that I get.  Oh, and in nagios this
> ends up being an unknown state for the check.
>  
> VLinux:/usr/local/nagios/libexec # ./check_openmanage -H 192.168.10.21
> OK - System: 'PowerEdge R710', SN: '5QTMZK1', 72 GB ram (18 dimms), 1 logical drives, 2 physical drives
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 588.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 655.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 708.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 764.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 869.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 952.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1028.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1103.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1168.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1325.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1531.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1549.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1563.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1577.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1591.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1613.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1633.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1653.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1674.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1702.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1737.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1846.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1968.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1973.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1978.
> INTERNAL ERROR: Use of :locked is deprecated at /usr/lib/perl5/vendor_perl/5.12.1/Net/SNMP.pm line 1983.
> Thanks for any help you can give me.

Hi Jeffrey,

Interesting error, never seen this one before :)

check_openmanage will print any perl warnings that occur during
execution as internal errors. This is done to avoid situations where the
plugin stops working due to perl incompatibilities etc. without your
knowledge, as Nagios completely ignores any plugin output to STDERR.
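
The same idea can be illustrated outside Perl with a small wrapper that runs any command and folds whatever it writes to STDERR into STDOUT, prefixed with 'INTERNAL ERROR: '. This is a hypothetical sketch: check_openmanage does this internally in Perl, and unlike the plugin this wrapper prints the captured messages after the normal output rather than interleaved with it.

```sh
#!/bin/sh
# with_internal_errors (hypothetical): run "$@", then print each line the
# command wrote to STDERR as "INTERNAL ERROR: <line>" on STDOUT, so that
# Nagios (which ignores STDERR) still shows it. Preserves the exit status.
with_internal_errors() {
    errfile=$(mktemp) || return 3       # 3 = UNKNOWN in Nagios terms
    "$@" 2>"$errfile"
    rc=$?
    sed 's/^/INTERNAL ERROR: /' "$errfile"
    rm -f "$errfile"
    return "$rc"
}
```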

Which version of Net::SNMP are you using? Try 'rpm -q perl-Net-SNMP' to
find out. Perl 5.12 deprecated the "locked" attribute, and this was
fixed in Net::SNMP version 6.0.1, i.e. the latest release. The changelog
for Net::SNMP 6.0.1 has the following:

  - Removed all occurrences of the "locked" attribute that was
    deprecated in Perl 5.12.0.

I believe this to be a problem with your distribution using an
old/incompatible version of Net::SNMP. It seems that for perl 5.12.x you
need Net::SNMP 6.0.1 (or any later version).
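
To check whether a given host is affected, a sketch like the following could be used. The helper names are hypothetical; the only facts taken from this thread are the Perl 5.12 / Net::SNMP 6.0.1 pairing. Note that 'rpm -q perl-Net-SNMP' only sees the RPM-packaged module, whereas asking Perl directly with "use Net::SNMP 6.0.1" also covers modules installed from CPAN.

```sh
#!/bin/sh
# version_ge (hypothetical helper): succeed if dotted version $1 >= $2,
# comparing numeric components left to right (missing components count as 0).
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*} y=${b%%.*}
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
    done
    return 0
}

# check_net_snmp (hypothetical): warn if Perl is 5.12 or newer but the
# installed Net::SNMP is older than 6.0.1. "use Module VERSION" makes Perl
# itself enforce the minimum module version.
check_net_snmp() {
    perl_ver=$(perl -e 'printf "%vd", $^V') || return 3
    if version_ge "$perl_ver" 5.12.0 &&
        ! perl -e 'use Net::SNMP 6.0.1;' 2>/dev/null; then
        echo "Perl $perl_ver needs Net::SNMP 6.0.1 or later"
        return 1
    fi
    return 0
}
```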

PS. I found this in the OpenSUSE bugzilla:

  https://bugzilla.novell.com/show_bug.cgi?id=629698

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo

---

Re: [Nagios-users] check_openmanage plugin reporting Firmware out of date

2010-12-13 Thread Trond Hasle Amundsen
Trond Hasle Amundsen  writes:

> "Surangiwala, Asif "  writes:
>
>> Can we update the check_openmanage script to parse the "Minimum
>> Required Firmware Version" and compare it with the current "Firmware
>> Version" to overcome the OMSA bug?
>
> It is entirely possible to mitigate this bug within the plugin, but I
> don't think that it's a good idea to let the plugin do all version
> parsings and ignore OMSA on a general basis. I have created a version
> that works around this particular bug (version 3.6.2-p1) and made it
> available here:
>
>   http://folk.uio.no/trondham/software/omsa-fw-bug/
>
> It simply ignores out-of-date firmware if the firmware and minimum
> firmware versions match those in question. But in order for this to
> work, I also had to turn off checking the global health status, which
> inherits the non-critical status of the controller.
>
> DISCLAIMER: This version is only intended as a temporary solution for
> users of OMSA 6.3.0 that struggles with the recent firmware bug, and
> don't want to use blacklisting as a workaround. When OMSA 6.4.0 becomes
> available, you should upgrade OMSA and revert to a regular release of
> check_openmanage.

Hi Asif,

Dell has released OMSA 6.4.0, which fixes the firmware version parsing
issue. I have also released a new version of check_openmanage that
contains a few compatibility fixes for OMSA 6.4.0:

  http://folk.uio.no/trondham/software/check_openmanage.html#download

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage plugin reporting Firmware out of date

2010-12-01 Thread Trond Hasle Amundsen
"Surangiwala, Asif "  writes:

> Can we update the check_openmanage script to parse the "Minimum
> Required Firmware Version" and compare it with the current "Firmware
> Version" to overcome the OMSA bug?

It is entirely possible to mitigate this bug within the plugin, but I
don't think it's a good idea to let the plugin do all the version
parsing itself and ignore OMSA in general. I have created a version
that works around this particular bug (version 3.6.2-p1) and made it
available here:

  http://folk.uio.no/trondham/software/omsa-fw-bug/

It simply ignores out-of-date firmware if the firmware and minimum
firmware versions match those in question. But in order for this to
work, I also had to turn off checking the global health status, which
inherits the non-critical status of the controller.

DISCLAIMER: This version is only intended as a temporary solution for
users of OMSA 6.3.0 who are struggling with the recent firmware bug and
don't want to use blacklisting as a workaround. When OMSA 6.4.0 becomes
available, you should upgrade OMSA and revert to a regular release of
check_openmanage.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage plugin reporting Firmware out of date

2010-12-01 Thread Trond Hasle Amundsen
"Surangiwala, Asif "  writes:

> I have Dell Open Manage Server Administrator 6.3.0 installed on some Dell
> R710’s with PERC H700 controller. When I run the Nagios plugin
> check_openmanage, it reports the following:
>
> Controller 0 [PERC H700 Integrated]: Firmware '12.10.0-0025' is out of date
>
> The H700 is running the latest firmware 12.10.0-0025, check_openmanage plugin
> is v3.6.2 by Trond H. Amundsen. OMSA is running fine and is not complaining
> about any firmware issues.
>
> The same ‘Firmware out of date’ warning is also given for H800 controllers on
> the R710’s having it.
>
> Is there an issue with the plugin’s interaction with OMSA?

Hi Asif,

This is a bug in OMSA, not check_openmanage. OMSA is reporting that the
firmware is too old while clearly it is not. Dell has stated that the
bug will be fixed in the next version of OMSA. For more information, see
the following thread on the Linux-Poweredge mailing list:

  http://lists.us.dell.com/pipermail/linux-poweredge/2010-December/043713.html

As a workaround, I suggest using blacklisting to suppress the false
warnings until OMSA 6.4.0 is released and deployed on your systems:

  check_openmanage -b ctrl_fw=all [..other options..]

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Check_OpenManage INTERNAL ERROR

2010-10-25 Thread Trond Hasle Amundsen
Benny Somali  writes:

> Ignore my previous question.

Too late, but no problem. My one-line patch is easily reversed :)

> It worked fine now.
> I used a batch script and didn't add a line to turn the echo off so it
> returned special characters.
> So I added @echo off and the Status Information displayed.

Good. Thanks again for reporting this and for testing the beta version.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Check_OpenManage INTERNAL ERROR

2010-10-25 Thread Trond Hasle Amundsen
Benny Somali  writes:

> Works fine now.

Good, thanks for testing.

> By the way, the Status Information field is blank, is it related to
> the max length of 1023 chars?

Probably not. You shouldn't run into the silly NRPE limit except on
large servers with lots of performance data, and even then only the
perfdata should be affected.

My guess is that the "State" field is also empty for the failed
disk. I have an updated beta for you here:

  http://folk.uio.no/trondham/software/beta/

It should now report that the disk is "Unknown State".

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Check_OpenManage INTERNAL ERROR

2010-10-22 Thread Trond Hasle Amundsen
Benny Somali  writes:

> Yes, you are right.
> There is pdisk #1 that has empty vendor ID field.
> The disk in question was original Dell disk, however, it seemed to be
> bad now.
> We have an opened trouble ticket with Dell and expect to get a
> replacement disk.

Ah.. it makes sense that in some circumstances, if the disk is
sufficiently bad, Openmanage can't report the vendor.

I went ahead and patched this in the plugin. There is a beta version
(win32 binary) available here:

  http://folk.uio.no/trondham/software/beta/check_openmanage.exe

Please give it a try and let me know if it resolved this issue.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Check_OpenManage INTERNAL ERROR

2010-10-22 Thread Trond Hasle Amundsen
Benny Somali  writes:

> INTERNAL ERROR: substr outside of string at script/check_openmanage line 1502.
> INTERNAL ERROR: Use of uninitialized value in lc at script/check_openmanage 
> line 1502.

Hi Benny,

Thanks for reporting this. The error is related to the vendor of
physical disks as reported by omreport. What does 'omreport storage
pdisk controller=0' say? I'm guessing that the Vendor field is empty or
missing for one of the disks.
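To spot the offending disk quickly, the `omreport storage pdisk controller=0` output can be scanned for disks whose Vendor ID is empty or missing. The sketch below assumes omreport's usual `Key : Value` layout, with each disk's block starting at an `ID` line; field names may differ between OMSA versions, and `empty_vendor_disks` is a hypothetical name.

```shell
# empty_vendor_disks (hypothetical helper): read "Key : Value" omreport
# pdisk output on stdin and print the ID of every disk whose "Vendor ID"
# is empty or absent. Assumes each disk block begins with "ID : x:y:z".
empty_vendor_disks() {
    awk '
        # Print the current disk ID if its vendor field was empty/missing.
        function flush() { if (id != "" && vendor == "") print id }
        {
            i = index($0, ":")                    # split at the FIRST colon
            if (i == 0) next                      # skip lines without a key
            key = substr($0, 1, i - 1); val = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "ID"        { flush(); id = val; vendor = ""; next }
        key == "Vendor ID" { vendor = val }
        END                { flush() }
    '
}
```

Splitting at the first colon matters here, since disk IDs such as `0:0:1` contain colons themselves.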

Finding the root cause would be interesting. Can you tell if the disk in
question is an original disk supplied by Dell? If it isn't, this could
be the reason that the vendor field is empty/missing, i.e. Openmanage
doesn't recognize it. If it is a Dell drive, we're probably dealing with
a rare Openmanage oddity.

In any case, check_openmanage should handle this situation more
gracefully. I'll provide a patched version for you to test on Monday.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Question on setting up my own check

2010-10-20 Thread Trond Hasle Amundsen
Marc Powell  writes:

> On Oct 19, 2010, at 2:20 PM, steve f wrote:
>
>> Hello All,
>> 
>> I have the following script created to check free space on a remote legacy 
>> box via rsh. 
>> 
>> used=`sudo rsh $1 df -v |grep starlite6 | head -1 | awk '{print $4}'`
>> free=`sudo rsh $1 df -v |grep starlite6 | head -1 | awk '{print $5}'`
>
> Beyond just good programming practice, always use full paths to external 
> programs within your scripts. $PATH may not be what you expect it to be, 
> especially when being run by the nagios daemon which has a more restrictive 
> environment.
>
> # (paths may be different on your system)
> used=`/usr/bin/sudo /usr/bin/rsh $1 /bin/df -v | /bin/grep starlite | 
> /usr/bin/head -1 | /usr/bin/awk '{print $4}'`

Or... set PATH before doing anything else, e.g.

  #!/bin/bash
  PATH=/bin:/sbin:/usr/bin:/usr/sbin
  export PATH
  [...rest of script...]

This is more readable than using full paths everywhere.
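Along the same lines, here is a minimal sketch of the original check with PATH set up front and `rsh ... df -v` run once instead of twice. It assumes, as the original script does, that columns 4 and 5 of `df -v` on that box are "used" and "free" and that `starlite6` identifies the filesystem; `df_used_free` is a hypothetical helper name.

```shell
#!/bin/bash
# Set a known PATH once instead of spelling out full paths everywhere.
PATH=/bin:/sbin:/usr/bin:/usr/sbin
export PATH

# df_used_free (hypothetical helper): read `df -v` output on stdin and print
# columns 4 and 5 ("used" and "free" on the original poster's box) of the
# first line matching the pattern given as $1.
df_used_free() {
    awk -v pat="$1" '$0 ~ pat { print $4, $5; exit }'
}
```

In the check itself this might be used as `out=$(sudo rsh "$1" df -v | df_used_free starlite6)`, followed by `used=${out% *}` and `free=${out#* }`. Letting awk do the matching also makes the separate `grep` and `head` steps unnecessary.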

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Bug in check_openmanage ?

2010-09-24 Thread Trond Hasle Amundsen
rb...@free.fr writes:

> omreport chassis pwrmanagement
> Power Budget Information is not available on this system.
>
> In fact, i solve the problem by updating/resetting the idrac.

Ok, good to know. I'm still a little concerned that there was a hardware
problem that check_openmanage didn't identify properly. Please let me
know if this happens again.

> But the plugins nagios is always ko and i don't know why ...
>
> ./tmp/check_openmanage -H 10.1.19.193
>  SNMP ERROR [cooling]: Requested entries are empty or do not exist.

This is a completely different problem. Cooling devices (i.e. fans)
should exist in all servers except blades. Which type of server is this,
and do you know if it has fans or not?

The error above is from the Net::SNMP Perl module. If the plugin doesn't
get the data it expects when polling via SNMP, it will forward the error
message from Net::SNMP.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Bug in check_openmanage ?

2010-09-23 Thread Trond Hasle Amundsen
rb...@free.fr writes:

> Hi Trond,
>
> You are right ...
>
> --
> # omreport chassis
>
> Health
>
> Main System Chassis
>
> SEVERITY : COMPONENT
> Ok   : Memory
> Critical : Power Management
> Ok   : Processors
> Ok   : Temperatures
> Ok   : Voltages
> Ok   : Hardware Log
> Ok   : Batteries
>
> For further help, type the command followed by -?
> 
>
> On the IDRAC i have the message "System Board Current Latch"

This is interesting.. Have you configured power budgeting on this
server? What does this command say:

  omreport chassis pwrmanagement

On a regular R805 here it just says:

  Power Budget Information is not available on this system.

but we've never configured or used this feature, so I don't know
anything about it.

I'm thinking that perhaps check_openmanage should support these and
similar configurable OMSA features.

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Bug in check_openmanage ?

2010-09-23 Thread Trond Hasle Amundsen
rb...@free.fr writes:

> OOPS! Something is wrong with this server, but I don't know what. The
> global system health status is CRITICAL, but every component check is
> OK. This may be a bug in the Nagios plugin, please file a bug report.
>
> The status change from OK to Unknown...
>
> Is anybody can help me to debbug ?

Hi Rémi,

Thanks for reporting this.

As an extra precaution, check_openmanage checks the global health
status in addition to each of the components, provided you don't use
blacklisting and/or check control, either of which could make the
global check a false positive.
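The cross-check can be pictured roughly as follows. This is a simplified shell sketch of the idea, not the plugin's actual Perl code: it ignores the blacklisting/check-control exception, and the function name `overall_state` and its argument convention are hypothetical.

```shell
# overall_state (hypothetical helper): $1 is the global health status from
# OMSA; the remaining arguments are the states of the checked components.
# Prints the worst component state, or UNKNOWN when the global status is
# non-OK although every checked component is OK (the "OOPS!" case).
overall_state() {
    global=$1; shift
    worst=OK
    for s in "$@"; do
        case $s in
            CRITICAL) worst=CRITICAL ;;
            WARNING)  [ "$worst" = OK ] && worst=WARNING ;;
        esac
    done
    if [ "$global" != OK ] && [ "$worst" = OK ]; then
        echo UNKNOWN
    else
        echo "$worst"
    fi
}
```

This matches what was observed here: a CRITICAL global status with all checked components OK ends up as Unknown.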

This case seems to be a real issue where a component is bad and the
global health status reflects this. The component in question is not
checked by the plugin for some reason. I'd like to narrow down the
suspect pool. If you have login access to this server, can you send the
output from the following command:

  omreport chassis

If this command reports that everything is OK, we're probably dealing
with a storage problem.

Just to rule out blacklisting bugs etc., what is the command definition
for check_openmanage in your Nagios config?

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Problem with check_openmanage and Open Manage 6.3

2010-09-16 Thread Trond Hasle Amundsen
Luca Olivotto  writes:

> that is the output:
> SNMPv2-SMI::enterprises.674.10893.1.20.130.1 = No Such Object available on
> this agent at this OID

Ok, this confirms that the problem lies with OMSA, specifically the SNMP
functionality. I'm afraid I can't offer many clues about how to fix
this. I would try restarting the OMSA and SNMP services, and if that
doesn't work, reinstalling OMSA completely.

Best of luck,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Problem with check_openmanage and Open Manage 6.3

2010-09-16 Thread Trond Hasle Amundsen
Luca Olivotto  writes:

> yes, i see the perc 6i controller.

Ok, thanks. I then suspect that the problem lies with the SNMP part of
OMSA. Can you run the following command from your Nagios server to
confirm:

  snmpwalk -v2c -c <community> <hostname> 1.3.6.1.4.1.674.10893.1.20.130.1

The result should look something like this:

  $ snmpwalk -v2c -c public foobar 1.3.6.1.4.1.674.10893.1.20.130.1
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.1.1 = INTEGER: 1
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.2.1 = STRING: "PERC 6/i Integrated"
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.3.1 = STRING: "DELL"
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.4.1 = INTEGER: 6
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.5.1 = INTEGER: 1
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.7.1 = INTEGER: 30
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.8.1 = STRING: "6.2.0-0013"
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.9.1 = INTEGER: 256
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.10.1 = INTEGER: 0
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.11.1 = INTEGER: 6
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.12.1 = INTEGER: 2
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.37.1 = INTEGER: 3
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.38.1 = INTEGER: 3
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.39.1 = STRING: "\\0"
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.40.1 = INTEGER: 3
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.41.1 = STRING: "00.00.04.17-RH1
  "
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.42.1 = STRING: "embedded"
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.43.1 = INTEGER: 99
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.47.1 = INTEGER: 2
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.48.1 = INTEGER: 30
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.49.1 = INTEGER: 30
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.50.1 = INTEGER: 30
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.51.1 = INTEGER: 30
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.52.1 = INTEGER: 1
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.53.1 = INTEGER: 1
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.54.1 = INTEGER: 32
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.57.1 = INTEGER: 99
  SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.58.1 = INTEGER: 99
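When comparing your own output with the example above, a quick way to pull out just the controller name and firmware version is to filter on the table column (the second-to-last OID component). In the sample, column `.2` holds the controller name and column `.8` the firmware version; that mapping is read off the sample output rather than the MIB, so treat it as an assumption. `ctrl_summary` is a hypothetical name.

```shell
# ctrl_summary (hypothetical helper): read snmpwalk output of the controller
# table on stdin and print "<index>: <name> (firmware <version>)" per
# controller. Assumes column 2 = name and column 8 = firmware, as in the
# sample output.
ctrl_summary() {
    awk '
        / = STRING: / {
            n = split($1, oid, ".")
            col = oid[n - 1]; idx = oid[n]       # table column, row index
            val = $0
            sub(/^[^=]*= STRING: "?/, "", val)   # drop the OID and type prefix
            sub(/"$/, "", val)                   # drop the closing quote
            if (col == 2) name[idx] = val
            if (col == 8) fw[idx] = val
            if (!(idx in seen)) { seen[idx] = 1; order[++cnt] = idx }
        }
        END {
            for (i = 1; i <= cnt; i++) {
                k = order[i]
                if (k in name) printf "%s: %s (firmware %s)\n", k, name[k], fw[k]
            }
        }
    '
}
```

Empty output corresponds to the "No Such Object" case, i.e. OMSA is not exposing the storage table via SNMP at all.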

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Problem with check_openmanage and Open Manage 6.3

2010-09-16 Thread Trond Hasle Amundsen
Luca Olivotto  writes:

> Hello,
> i have a problem with the plugin check_openmanage .
>
> if i use this command:
> ./check_openmanage -H xx.xx.xx.xx
>
> i get this result:
> OOPS! Something is wrong with this server, but I don't know what. The global
> system health status is WARNING, but every component check is OK. This may be
> a bug in the Nagios plugin, please file a bug report.
>
> The server that i'm checking is a PowerEdge 2950 and i suppose that the
> problem is the version of OpenManage installed on the server. The version is
> 6.3 and the only warning shown via the webinterface are  the old version of
> the firmware/driver/storeDriver of the controller.
> If i try that command
>
> check_openmanage -H 10.10.10.6 -b ctrl_fw=all/ctrl_driver=all/ctrl_stdr=all -s
> -e
> the output is:
> OK - System: 'PowerEdge 2950', SN: 'xx', 16 GB ram (4 dimms), 0 logical
> drives, 0 physical drives
>
> as you can see the disk are not checked(that server has a broked mirror).
>
> the version of check_openmanage is 3.6.0

Hi Luca,

Your analysis is correct. OMSA doesn't display storage info via SNMP,
but there is something wrong with a storage component. For some reason,
OMSA senses the storage failure and the global health status inherits
this failure status, but OMSA doesn't display the storage. This
condition will trigger the behaviour you are seeing.

The plugin searches for storage controllers. If it doesn't find any
controllers, it concludes that there is no storage altogether and
skips subsequent checks of disk drives etc.
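The same decision, applied to local `omreport storage controller` output, can be sketched like this: count the controllers and only go on to drives if there is at least one. This is an illustration, not the plugin's code; it assumes one `ID : <n>` line per controller in the omreport output, and both function names are hypothetical.

```shell
# count_controllers (hypothetical helper): read `omreport storage controller`
# output on stdin and print how many controllers it lists, assuming one
# "ID : <n>" line per controller.
count_controllers() {
    awk -F: '{ k = $1; gsub(/[ \t]+/, "", k) } k == "ID" { n++ } END { print n + 0 }'
}

# storage_plan (hypothetical helper): with zero controllers the drive checks
# are skipped, which is how "0 logical drives, 0 physical drives" comes about.
storage_plan() {
    if [ "$(count_controllers)" -gt 0 ]; then
        echo "check storage"
    else
        echo "skip storage"
    fi
}
```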

Do you see any controllers when running this command on the server:

  omreport storage controller

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage ignores blacklist directive

2010-08-10 Thread Trond Hasle Amundsen
"C. Bensend"  writes:

>> Despite of giving it the parameter to ignore Warnings about the
>> controller firmware, it still gives a Warning Status:
>>
>> /usr/lib/nagios/plugins/check_openmanage -b ctrl_fw -s -H 192.168.2.137
>
> 'ctrl_fw' isn't the complete option you need to give there - you
> also need to specify the ID per:
>
> http://folk.uio.no/trondham/software/check_openmanage.8.html
>
> Try 'ctrl_fw=0,1'

Yes, or:

  ctrl_fw=all

..if you wish to blacklist this for all controllers and aren't
interested in specifying controller IDs.
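Blacklist entries for several components can also be combined into one `-b` argument by joining them with `/`, as in the `ctrl_fw=all/ctrl_driver=all/ctrl_stdr=all` example quoted elsewhere in this archive. A trivial helper for building that string, assuming each entry is already in `component=ids` form; `join_blacklist` is a hypothetical name.

```shell
# join_blacklist (hypothetical helper): join blacklist entries of the form
# "component=ids" with "/" for use as a single check_openmanage -b argument.
join_blacklist() {
    out=
    for entry in "$@"; do
        out=${out:+$out/}$entry   # add "/" only between entries
    done
    printf '%s\n' "$out"
}
```

For example: `check_openmanage -s -H 192.168.2.137 -b "$(join_blacklist ctrl_fw=all ctrl_driver=all)"`.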

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Trond Hasle Amundsen
Max Williams  writes:

> Excellent, sorted, everything reports as OK now. 

Good. I'll try to make a release with these changes in the next couple
of days.

> Thanks so much Trond, amazing support and an amazingly useful plugin!

Glad you like it, Max. Thanks for reporting this issue :)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Trond Hasle Amundsen
Max Williams  writes:

> Here is the output, the inactive temperature probe is sorted but the
> missing EMM still produces an alert:
>
>   OK |  1:1:0:1 | Temperature Probe 1 in enclosure 3 [MD1000] is Inactive

This one works as expected :)

>   OK |  1:1:0:2 | Temperature Probe 2 in enclosure 3 [MD1000]:  C ( max)
>   OK |  1:1:0:3 | Temperature Probe 3 in enclosure 3 [MD1000]:  C ( max)

Hmm... something strange going on here. I wonder why this happens, in
the SNMP output you attached previously the values are there. Anyway,
I've added some extra checking in the code to make it report better if
the reading is unavailable for some reason. It should now report simply:

  Temperature Probe 0 in enclosure 2:0:0 [MD1000] is Ready

if the temp reading is not an integer and OMSA reports the status as OK.

> CRITICAL |  1:1:0:1 | EMM 1 in enclosure 3 [MD1000] needs attention: Not 
> Installed

Ah.. I misread the SNMP output.. The status is "Unknown" when reported
by omreport, but "Other" when reported with SNMP. One little annoying
difference between the two.. The output should be:

  EMM 0 in enclosure 2:0:0 [MD1000] is Not Installed

with an OK state.
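The intended behaviour, where the same missing EMM shows up as "Unknown" via omreport but "Other" via SNMP and should be OK either way, could be sketched as below. This is a hypothetical illustration, not the plugin's code: `emm_state` is an invented name, and mapping every other combination to CRITICAL is an arbitrary choice for the sketch.

```shell
# emm_state (hypothetical helper): $1 = EMM status ("Ok", "Unknown" from
# omreport, "Other" from SNMP, ...), $2 = state text such as "Not Installed".
# Prints the Nagios state to report.
emm_state() {
    case $1:$2 in
        *:'Not Installed') echo OK ;;        # absent EMM is not an error
        Ok:*)              echo OK ;;
        *)                 echo CRITICAL ;;  # arbitrary choice for the sketch
    esac
}
```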

I've created a second test version:

  http://folk.uio.no/trondham/software/beta/check_openmanage

Please give this one a try and see if it performs better.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-25 Thread Trond Hasle Amundsen
Max Williams  writes:

> Both of the new enclosures show the same output so perhaps these just
> have a different configuration to the others we have here.

Yes. I suspect that this is related to one EMM not being installed. My
guess is that the inactive temperature sensor is located in the EMM, but
there is no way to tell since neither the omreport output nor the SNMP
output reveals the location of the temperature sensors. Or perhaps the
EMM is needed to activate the sensor. We always order our MD1000s with 2
EMMs, so this is something that I haven't had the opportunity to test.

I have created a test version for you to try. This version should:

  * report inactive temperature sensors as OK
  * report EMMs with state "Not Installed" as OK

In addition, it checks that the readings from the sensors are in fact
digits before attempting to print the values.
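The digit check can be illustrated with a small shell function that follows the message formats quoted in this thread: print the reading only when it is an integer, otherwise fall back to "is <status>". The plugin itself is Perl; `temp_message` and its argument order are hypothetical.

```shell
# temp_message (hypothetical helper): $1 = probe description, $2 = raw
# reading, $3 = status text. Prints "<desc> reads <n> C" when the reading is
# all digits, otherwise "<desc> is <status>", so an empty reading never ends
# up in the output.
temp_message() {
    case $2 in
        '' | *[!0-9]*) printf '%s is %s\n' "$1" "$3" ;;
        *)             printf '%s reads %s C\n' "$1" "$2" ;;
    esac
}
```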

The test version is located here:

  http://folk.uio.no/trondham/software/beta/

Try it with the '-d' option to see that it reports these things
properly.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-25 Thread Trond Hasle Amundsen
Max Williams  writes:

> Hi,
>
> After adding more storage to a couple of our servers we are getting this 
> error:
>
>  
>
> [r...@host  ~]# /usr/lib64/nagios/plugins/check_openmanage -C password -b
> ctrl_driver=0,1,2 -b ctrl_fw=0,1,2 -b intr=0 -H host2
>
> Temperature Probe 1 in enclosure 3 [MD1000] is Inactive C at  ( max)
>
> EMM 1 in enclosure 3 [MD1000] needs attention: Not Installed
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2312.
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2312.
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2318.
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2318.
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2318.
>
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/
> plugins/check_openmanage line 2318.
>
> [r...@host  ~]#
>
>  
>
> We didn't get this error before adding a new cabinet of disks which now brings
> the total up to 47 (2x internal disk and 3x full MD1000s).
>
> Has any one else come across this error? I am not perl literate so not sure 
> how
> to debug or fix this.

Hi Max,

This is interesting. I've never seen "Inactive" temperature sensors in
external enclosures. Also, that the plugin reports missing EMMs seems
like a misfeature. Can you post the output from the following commands:

On the monitored host:

  omreport storage enclosure controller=<ctrl> enclosure=<encl> info=temps
  omreport storage enclosure controller=<ctrl> enclosure=<encl> info=emms

Replace <ctrl> and <encl> with your controller/enclosure ID pairs. You'll
get the controller and enclosure IDs with the commands

  omreport storage controller
  omreport storage enclosure

Also, since you're checking with SNMP, I'll need the output from an
snmpwalk of the enclosures wrt. temperatures and EMMs. From the Nagios
server:

  snmpwalk -v2c -c <community> <hostname> 1.3.6.1.4.1.674.10893.1.20.130.11
  snmpwalk -v2c -c <community> <hostname> 1.3.6.1.4.1.674.10893.1.20.130.13

If you are uncomfortable with posting this information on the
mailing list, feel free to email me directly.

Debug output from the plugin could also be useful:

  check_openmanage -H <hostname> -C <community> -d

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage plugin error

2010-05-26 Thread Trond Hasle Amundsen
Andrea Ballarati  writes:

> Nagios reports error from the plugin in subject, we have another Dell
> PowerEdge 1950 for which no errors are reported.
> This is the output of check_openmanage -d
>
>System:  PowerEdge 1800
>ServiceTag:    OMSA version:4.5.0
>BIOS/date:   A05 09/21/2005   Plugin version:  3.5.7
> -
>Storage Components
>
> =
>   STATE  |ID|  MESSAGE TEXT
>
> -+--+
>  WARNING |0 | Controller 0 [CERC SATA 1.5/2s] needs attention: Degraded
>   OK |0:0:0 | Array Disk 0:0 [1.0TB] on ctrl 0 is Online
>   OK |0:0:1 | Array Disk 0:1 [1.0TB] on ctrl 0 is Online
>   OK |  0:0 | Logical drive 0 'Windows Disk 0' [RAID-1, 931.48 GB] on ctrl 0 is Ready
>   OK |  0:0 | Channel 0 [] on controller 0 is Ready
> -
>Chassis Components
>
> =
>   STATE  |  ID  |  MESSAGE TEXT
>
> -+--+
>   OK |1 | Memory module 1 [DIMM1_A, 512 MB] is Ok
>   OK |2 | Memory module 2 [DIMM1_B, 512 MB] is Ok
>   OK |1 | Chassis fan 1 [BMC Fan 1]: 1500
>   OK |2 | Chassis fan 2 [BMC Fan 2]: 1500
>   OK |0 | Power Supply 0 [VRM]: Presence detected
>   OK |1 | Power Supply 1 [VRM]: Presence detected
>   OK |0 | Temperature Probe 0 [PROC_1 Temp] reads 38 C (max=120/125)
>   OK |1 | Temperature Probe 1 [BMC Ambient Temp] reads 22 C (min=8/3, max=40/45)
>   OK |2 | Temperature Probe 2 [BMC Planar Temp] reads 33 C (min=8/3, max=62/67)
>   OK |3 | Temperature Probe 3 [BMC VRD 0 Temp] reads 31 C (min=8/3, max=70/75)
>   OK |4 | Temperature Probe 4 [BMC VRD 1 Temp] reads 27 C (min=8/3, max=70/75)
>   OK |0 | Processor 0 [Intel Xeon 3.00GHz] is Present
>   OK |0 | Voltage sensor 0 [BMC CMOS Battery] is 3.070 V
>   OK |1 | Voltage sensor 1 [PROC_1 VCORE] is Good
>   OK |2 | Voltage sensor 2 [BMC PROC VTT] is Good
>   OK |3 | Voltage sensor 3 [BMC 1.5V PG] is Good
>   OK |4 | Voltage sensor 4 [BMC 1.8V PG] is Good
>   OK |5 | Voltage sensor 5 [BMC 3.3V PG] is Good
>   OK |6 | Voltage sensor 6 [BMC 5V PG] is Good
>   OK |0 | Chassis intrusion 0 detection: Ok (Not Breached)
> -
>Other messages
>
> =
>   STATE  |  MESSAGE TEXT
>
> -+---
>   OK | ESM log health is Ok (less than 80% full)
>
> INTERNAL ERROR: Use of uninitialized value in numeric eq (==) at
> /usr/lib/nagios/plugins/check_openmanage line 1380.
> INTERNAL ERROR: Use of uninitialized value in numeric eq (==) at
> /usr/lib/nagios/plugins/check_openmanage line 1380.
> INTERNAL ERROR: Use of uninitialized value in sprintf at
> /usr/lib/nagios/plugins

Hi Andrea,

check_openmanage is designed to work with relatively recent OMSA
versions. You are using OMSA version 4.5.0, which is very old. The
server in question (PowerEdge 1800) is supported by newer OMSA, so the
solution is to upgrade OMSA to the latest version (6.2.0).

OMSA versions 5.3.0 and later are OK to use with check_openmanage, and
I've had reports that 5.1.0 and 5.2.0 work as well (but no
guarantee). Anything older will yield strange results or simply not
work.
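Those version ranges can be turned into a quick pre-flight check, for example when rolling the plugin out to many servers. A minimal POSIX shell sketch, assuming the version string is plain `major.minor.patch` as in the "OMSA version: 4.5.0" line of the debug output above (and that minor/patch stay below 100); `omsa_support` and its output words are hypothetical.

```shell
# omsa_support (hypothetical helper): classify an OMSA version string using
# the ranges above: >= 5.3.0 supported, 5.1.x/5.2.x reported to work (no
# guarantee), anything older unsupported.
omsa_support() {
    # Encode major.minor.patch as one comparable number, e.g. 5.3.0 -> 50300.
    v=$(printf '%s\n' "$1" | awk -F. '{ printf "%d", $1 * 10000 + $2 * 100 + $3 }')
    if [ "$v" -ge 50300 ]; then
        echo supported
    elif [ "$v" -ge 50100 ]; then
        echo "reported to work"
    else
        echo unsupported
    fi
}
```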

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage weirdness

2010-05-21 Thread Trond Hasle Amundsen
Greg Etling  writes:

> Trond, thanks for your quick reply. Unfortunately it does appear we have 
> a disconnect between OMSA and SNMP:

[snip]

> [r...@nagios ~]# snmpwalk -v2c -c * testserver 
> 1.3.6.1.4.1.674.10893.1.20.130.1
> SNMPv2-SMI::enterprises.674.10893.1.20.130.1 = No Such Object available 
> on this agent at this OID

Hmm.. you should see output like:

$ snmpwalk -v2c -c community hostname 1.3.6.1.4.1.674.10893.1.20.130.1
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.2.1 = STRING: "PERC 6/i Integrated"
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.3.1 = STRING: "DELL"
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.4.1 = INTEGER: 6
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.5.1 = INTEGER: 1
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.7.1 = INTEGER: 30
SNMPv2-SMI::enterprises.674.10893.1.20.130.1.1.8.1 = STRING: "6.2.0-0013"
[...]

> It appears to only have data under the 1.3.6.1.4.1.674.10892 and 
> 1.3.6.1.4.1.674.10899 trees. Thoughts?

Unfortunately my Windows knowledge is rather limited. I have never
installed OMSA on Windows, but I suspect that there are options to
choose from during the install. The first thing I would do is to
re-install OMSA step by step and try to figure out what I might have
missed. On Linux, the install procedure and packaging of the OMSA
components changed with version 6.2.0. This may very well be the case
with the Windows version as well.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage weirdness

2010-05-19 Thread Trond Hasle Amundsen
Greg Etling  writes:

> I have just started implementing some check_openmanage checks on my 
> servers, and have run into some odd behavior with the combination of 
> Windows 2003, OM 6.2 and the SNMP check. It appears that this 
> combination is having issues with the drive/controller reporting. 
> Initially things worked fine under OM 5.4, until the SNMP service would 
> die (other than that, Mrs. Lincoln...) - so i upgraded to OM 6.2, when I 
> observed the following behaviour.
>
> When the check is run without any blacklisting, the plugin reports that 
> there is a global status WARNING, but all components are OK - the 
> WARNING is coming from out of date Firmware/Driver versions as listed below:
>
> --
> Firmware/Driver Information for Controller PERC 6/i Integrated
> Firmware Version6.0.3-0002
> Minimum Required Firmware Version6.2.0-0012
> Driver Version2.14.00.32
> Minimum Required Driver Version2.23.00.32
> Storport Driver Version5.2.3790.3959
> Minimum Required Storport Driver Version5.2.3790.4173
> --
>
> Now when run in debug mode, I noticed that it had no information about 
> the drives at all (note the beta version - same output as plugin v3.5.7):

[snip]

This is the key to this problem. There are warnings associated with the
storage subsystem, but that information is not available via SNMP for
some reason. The global status of the server inherits these warnings,
however, so the plugin reports this as some unknown error.

Does omreport report anything on storage? Try:

  omreport storage controller

If that works, try getting the same information via SNMP:

  snmpwalk -v2c -c   1.3.6.1.4.1.674.10893.1.20.130.1

Usually the problem is that the storage components of OMSA are not
installed, in which case neither command will work.
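The two checks above can be wrapped in a small script that reports which data source is missing. This is a minimal sketch, not part of check_openmanage: it assumes `omreport` and `snmpwalk` are in the PATH, that `omreport` exits non-zero or prints "Error" when the storage component is absent, and the function name `check_storage_sources` is made up for illustration.

```shell
# Minimal sketch, not part of check_openmanage: report whether OMSA storage
# data is available locally (omreport) and over SNMP (snmpwalk).
# Assumptions: both tools are in PATH, and omreport exits non-zero or prints
# "Error" when the storage component is missing.

# Storage controller table OID, as used in the snmpwalk command above
STORAGE_CTRL_OID=1.3.6.1.4.1.674.10893.1.20.130.1

check_storage_sources() {
    host=$1
    community=$2

    if report=$(omreport storage controller 2>&1) &&
        ! printf '%s\n' "$report" | grep -q 'Error'; then
        echo "omreport: storage OK"
    else
        echo "omreport: storage NOT available"
    fi

    # An empty subtree typically produces a "No Such Object/Instance" line
    # rather than no output, so treat that (or no output) as missing too.
    if walk=$(snmpwalk -v2c -c "$community" "$host" "$STORAGE_CTRL_OID" 2>&1) &&
        [ -n "$walk" ] &&
        ! printf '%s\n' "$walk" | grep -q 'No Such'; then
        echo "snmp: storage OK"
    else
        echo "snmp: storage NOT available"
    fi
}
```

Usage: `check_storage_sources <hostname> <community>`, run on the monitored host itself, since `omreport` only reports on the local machine.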

> And the Status as reported to Nagios believes that there are no disks 
> whatsoever on the server:
> --
> OK - System: 'PowerEdge 2950', SN: 'XXX', hardware working fine, 0 
> logical drives, 0 physical drives
> --

Yes, that is the normal behaviour when the plugin doesn't find any
storage components. The plugin can't report this as a problem, since
it's OK for a server not to have storage reported by OMSA (which only
reports on supported storage), or any storage at all for that matter
(diskless servers).

> This has been replicated on several identical systems.
>
> I'm a bit stumped as to where the problem lies. Please let me know if 
> you need further information from me.

You should check your OMSA install. The storage parts of it were probably
not installed. It may also be that something is wrong with the
OMSA+SNMP integration, which prevents storage information from being
presented. That would be trickier to debug.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Internal error

2010-04-13 Thread Trond Hasle Amundsen
Richard Hagen  writes:

> I recently installed a new DELL Poweredge 2970 with W2k8 and installed also
> DELL OMSA.
>
> When i read the status from nagios i get the following error:
>
> Amperage probe 0 [PS 1 Current 1] reads 0 A
> Amperage probe 1 [PS 2 Current 2] reads 0 A
> INTERNAL ERROR: Use of uninitialized value in division (/) at /usr/lib/nagios/plugins/check_openmanage line 3536.
> INTERNAL ERROR: Use of uninitialized value in division (/) at /usr/lib/nagios/plugins/check_openmanage line 3536.
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib/nagios/plugins/check_openmanage line 3562.

Hi Richard,

This happens because the values (i.e. the readings from the amperage
probes) are not reported by SNMP, while the rest of the data about the
probes is reported (status, type, name etc.). There is something wrong
with Openmanage on this server. What is the output from this command:

  omreport chassis pwrmonitoring

That being said, the plugin could handle this better. Please try the
beta version available here:

  http://folk.uio.no/trondham/tmp/
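The general idea behind handling this more gracefully is to check that a reading exists before doing arithmetic on it. The following shell sketch only illustrates that guard; it is not the plugin's actual code (check_openmanage is written in Perl), and both the function name `format_amperage` and the scale factor passed as the third argument are hypothetical.

```shell
# Hypothetical helper, for illustration only: format an amperage probe
# reading, printing "N/A" instead of doing arithmetic when SNMP returned
# no value (or a non-numeric one).
# The divisor is an assumed scale factor; the real unit for each probe
# type is defined in Dell's MIB.
format_amperage() {
    name=$1
    reading=$2
    divisor=$3
    case $reading in
        '' | *[!0-9]*)
            printf 'Amperage probe [%s] reads N/A\n' "$name"
            ;;
        *)
            # Also guard against a zero or empty divisor before dividing
            awk -v n="$name" -v r="$reading" -v d="$divisor" 'BEGIN {
                if (d + 0 == 0) {
                    printf "Amperage probe [%s] reads N/A\n", n
                } else {
                    printf "Amperage probe [%s] reads %.1f A\n", n, r / d
                }
            }'
            ;;
    esac
}
```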

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Check_multipath

2010-03-25 Thread Trond Hasle Amundsen
Brian O'Mahony  writes:

> It works locally though, and I have
>
>  
>
> Cmnd_Alias MULTIPATH=/sbin/multipath -l
>
> nagios  ALL= NOPASSWD: MULTIPATH

My money is on "requiretty". Locally you have a TTY, while NRPE does
not. The "requiretty" setting in /etc/sudoers must be turned
off. Comment out this line in /etc/sudoers:

  Defaults    requiretty
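To see whether that line is active before editing, a grep along these lines can help. This is a sketch: the function name `find_requiretty` is made up, it only looks at the file you pass it (not files pulled in with #includedir), and the real file should still be edited with visudo. sudoers also accepts a narrower per-user override, `Defaults:nagios !requiretty`, if you prefer not to change the global default.

```shell
# Sketch: list uncommented "Defaults requiretty" lines (with line numbers)
# in a sudoers file. Reading /etc/sudoers normally requires root.
find_requiretty() {
    sudoers_file=${1:-/etc/sudoers}
    # Leading whitespace allowed; lines starting with '#' do not match
    grep -nE '^[[:space:]]*Defaults[[:space:]]+requiretty([[:space:],]|$)' "$sudoers_file"
}

# Usage: find_requiretty            # checks /etc/sudoers
#        find_requiretty ./sudoers  # checks a copy
```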

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Problem with check_openmanage 3.5.6

2010-03-22 Thread Trond Hasle Amundsen
Nicole Hähnel  writes:

>>> CRITICAL: [xxx] Physical Disk 0:0 [Wdc WD1600JS-55MHB0, 160GB] on ctrl 0 needs attention:
>>> -- SYSTEM:  PowerEdge 830, SN: xxx
>>> INTERNAL ERROR: Use of uninitialized value in string eq at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1432.
>>> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1445.

Mostly for the list archive:

We took this off the list to do some back-and-forth debugging and
testing, and the issue is now resolved. A new version of
check_openmanage has been released, which prints the above correctly as:

  CRITICAL: Physical Disk 0:0 [Wdc WD1600JS-55MHB0, 160GB] on ctrl 0 needs attention: Undefined value 4096

This relates to SNMP returning values which are not defined in the
MIBs. Such values are now reported as "Undefined value <value>".
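The fallback described above amounts to a lookup table with a default branch. A minimal shell sketch of the idea; the function name `describe_disk_state` is hypothetical, and the three state names are an illustrative excerpt, not verified against the full enumeration in Dell's storage MIB:

```shell
# Map a numeric state reported over SNMP to a name, falling back to
# "Undefined value N" for anything the (excerpted) table does not define.
describe_disk_state() {
    case $1 in
        1) echo "Ready" ;;
        2) echo "Failed" ;;
        3) echo "Online" ;;
        *) echo "Undefined value $1" ;;
    esac
}
```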

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Keeping the Nagios Configuration Sane

2010-03-10 Thread Trond Hasle Amundsen
David Wallis  writes:

> Matt Simmons wrote:
>> Hi All,
>>
>> I'm attending the 2010 Professional IT Community Conference
>> (http://www.picconf.org) being held in New Brunswick, NJ, and I'm
>> giving a talk about staying sane while working with the Nagios
>> configuration.
>>
>> The talk will be 45 minutes long, and will primarily be an outshoot
>> from this article that I wrote on my blog:
>> http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/
>>
>> I could talk about that and some other things that I've been figuring
>> out, but I was wondering if anyone had any tricks or tips for dealing
>> with the Nagios config? Is there anything special that you do to keep
>> things straight?
>>
>> I'm going to be putting my slides and any additional material online
>> following the conference, so hopefully someone else can get some use
>> from it.
>>
>> By the way, if anyone on this list is in the north east of the US, you
>> should come visit the conference. Without training, it's only $275 for
>> 2 days. With a full day and a half of training, it's still only $400
>> for the whole shebang. Anyway, this isn't a sales email.
>>
>> I'm looking forward to any tips you would want to share. Thanks in advance!
>>
>> --Matt
>>   
>
> I manage the Nagios installation for 3 different domains at work, each 
> domain with several hundred servers and clients. I quickly reached the 
> "There's got to be a better way!" point when trying to maintain 
> configuration files that were getting pretty big. I was using all the 
> tricks listed in the Nagios docs, but it was still pretty crazy.
>
> The approach I took was to write a configuration generator program that 
> uses a meta-config file to generate the hosts.cfg, hostgroups.cfg and 
> services.cfg config files. The meta-config file allows one to set up 
> cascading configuration variables, and then has one line per monitored 
> host, that includes things like host groups, parents, etc, and then a 
> list of services to monitor.
>
> I also created the idea of "meta-services" that allow the program to 
> generate configuration data for any number of related services with a 
> single service name in the meta-config file. For instance, including the 
> service "weball" will cause the configuration generator to create 
> service entries for every plumbed interface on the web server, checks 
> for every virtual server (http and https), and checks for every SSL cert 
> that it finds. In one domain, a 400 line meta-config file generates a 
> 20,000 line services.cfg file.
>
> Rather than updating individual config files, I just update the 
> meta-config file and then regenerate all of the *.cfg files. I've been 
> using this for several years with very good results.

That's an interesting approach, and we do something similar. It goes
without saying that when the number of hosts grows to several hundred,
maintaining the Nagios config for hosts and hostgroups etc. the regular
way becomes an arduous task. This is especially true if your environment
is largely heterogeneous.

We have a list of our servers maintained in a homegrown application
using a topic map as base. Large parts of the Nagios config are
generated from this. I think this is an important point. Usually, you
already have a list of your servers, and you can use this list as a base
for Nagios config as well. The format of the host list is not important,
but deciding that this is the starting point for Nagios hosts config
is. When a host is added/removed in the list, it is added/removed in
Nagios. This is very much like David's approach, i.e. a list of hosts in
a format that is easier to handle and maintain.

In addition, we have defined several "roles" that a server may have,
such as dell-hardware, hp-hardware, mail-mx-server, web-server,
dns-server etc. A simple Perl script runs every day on each host and
determines its roles. This information is collected and kept
centrally. Parts of the Nagios config (hostgroups, servicegroups) are
generated based on these roles.
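Generating hostgroups from collected roles can be fairly simple. The sketch below is a hypothetical illustration, not the tooling described above: it assumes a plain-text input with one host per line followed by its roles, and the function name `gen_hostgroups` is made up. It emits one Nagios `define hostgroup` block per role.

```shell
# Hypothetical input format, one host per line:
#   <hostname> <role> [<role> ...]
# Output: one Nagios hostgroup definition per role, in first-seen order.
gen_hostgroups() {
    awk '
        /^#/ { next }            # skip comment lines
        NF >= 2 {
            for (i = 2; i <= NF; i++) {
                role = $i
                if (role in members) {
                    members[role] = members[role] "," $1
                } else {
                    order[++n] = role
                    members[role] = $1
                }
            }
        }
        END {
            for (k = 1; k <= n; k++) {
                r = order[k]
                printf "define hostgroup {\n"
                printf "    hostgroup_name  %s\n", r
                printf "    alias           %s\n", r
                printf "    members         %s\n", members[r]
                printf "}\n\n"
            }
        }
    ' "$@"
}

# Usage: gen_hostgroups roles.txt > hostgroups.cfg
```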

NRPE config is the same on all hosts. It is maintained centrally and
distributed to each host daily. Adding stuff in the sudoers file (needed
for some plugins) is done automatically based on the host's roles.

Another point: We generally don't use plugins that require us to
configure the plugin and tailor it for each individual host. For
example, for filesystem monitoring we have created a custom plugin that
monitors all partitions by default. It has an optional configuration file
locally on each host where we can set individual thresholds if needed.

Thinking like this should come easy to system administrators that are
used to dealing with large installations. It's all about automation :)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage: "Failed to load external entity" with OMSA v5.3.0 on 2003

2010-03-09 Thread Trond Hasle Amundsen
"C. Bensend"  writes:

>I am working my way through some more Windows NSClient++
> installs, and I've hit a 2003 server that is not happy with
> check_openmanage (please forgive the horrible formatting):
>
>
> C:\Program Files\NSClient++>check_openmanage -e -p -b bat_charge=ALL/ctrl_fw=ALL/ctrl_driver=ALL --omreport F:\dellopenmanage\oma\bin\omreport.exe
>
> I/O warning : failed to load external entity "E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl"
> error
> xsltParseStylesheetFile : cannot parse E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl
> I/O warning : failed to load external entity "E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl"
> error
> xsltParseStylesheetFile : cannot parse E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl
> I/O warning : failed to load external entity "E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl"
> error
> xsltParseStylesheetFile : cannot parse E:/dellopenmanage/xslroot/oma/cli/omclpr.xsl
> Couldn't close filehandle for command '"F:\dellopenmanage\oma\bin\omreport.exe" -? 2>&1':
> Problem running 'omreport storage controller': Error! XML Transformation failed
> Problem running 'omreport chassis memory': Error! XML Transformation failed
> Problem running 'omreport chassis fans': Error! XML Transformation failed
> Problem running 'omreport chassis pwrsupplies': Error! XML Transformation failed
> Problem running 'omreport chassis temps': Error! XML Transformation failed
> Problem running 'omreport chassis processors': Error! XML Transformation failed
> Problem running 'omreport chassis volts': Error! XML Transformation failed
> Problem running 'omreport chassis batteries': Error! XML Transformation failed
> Problem running 'omreport chassis pwrmonitoring': Error! XML Transformation failed
> Couldn't close filehandle for command '"F:\dellopenmanage\oma\bin\omreport.exe" chassis pwrmonitoring -fmt ssv':
> Problem running 'omreport chassis intrusion': Error! XML Transformation failed
> Couldn't close filehandle for command '"F:\dellopenmanage\oma\bin\omreport.exe" system esmlog -fmt ssv':
> -- SYSTEM: N/A, SN: N/A
>
>
>*** Note that I specified F: for the path to OMSA, and the output
> is complaining about an external entity in E:.  That is not a typo.
>
>This is OMSA version 5.3.0 which I have dozens of, and this is
> the first time I've seen this.
>
>This host is a DC, so while I believe it's probably a case of
> "uninstall OMSA and re-install OMSA", I'm hoping someone has seen
> this before and I can avoid the reboot.
>
>Trond?  Anyone?

I've seen XML errors from OpenManage when probing disks with broken
firmware, but this seems to be something else. Do you get the same XML
errors when you run the failing commands manually on the server? If so,
I'm afraid there isn't much check_openmanage can do about it.

A reinstall of OpenManage is probably the next logical step.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] Problem with check_openmanage 3.5.6

2010-02-25 Thread Trond Hasle Amundsen
Nicole Hähnel  writes:

> I tested the new version:
>
> CRITICAL: [xxx] Physical Disk 0:0 [Wdc WD1600JS-55MHB0, 160GB] on ctrl 0 needs attention:
> -- SYSTEM:  PowerEdge 830, SN: xxx
> INTERNAL ERROR: Use of uninitialized value in string eq at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1432.
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1445.

Hmm.. OK, new test:

  http://folk.uio.no/trondham/tmp/check_openmanage-3.5.7-beta2

Regards,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Problem with check_openmanage 3.5.6

2010-02-25 Thread Trond Hasle Amundsen
Nicole Hähnel  writes:

> it's a windows server.
> So I'm using check_openmanage with snmp.
>
> check_openmanage -s -C $ARG1$ -H $HOSTADDRESS$ -e -i -p --state --check intrusion=1,alertlog=1,esmlog=1 -o 3 --htmlinfo de
>
> List of Physical Disks on Controller CERC SATA 1.5/6ch (Slot 4)
>
> Controller CERC SATA 1.5/6ch (Slot 4)
> ID: 0:0
> Status: Unknown
> Name  : Physical Disk 0:0
> State : Unknown
> Failure Predicted : No
> Progress  : Not Applicable
> Bus Protocol  : SATA
> Media : HDD
> Capacity  : 149.05 GB (160040681472 bytes)
> Used RAID Disk Space  : 0.00 GB (0 bytes)
> Available RAID Disk Space : 0.00 GB (0 bytes)
> Hot Spare : No
> Vendor ID : WDC
> Product ID: WD1600JS-55MHB0
> Revision  : 02.0
> Serial No.:  WD-WCANM3083963
> Negotiated Speed  : Not Available
> Capable Speed : Not Available
> Manufacture Day   : Not Available
> Manufacture Week  : Not Available
> Manufacture Year  : Not Available
> SAS Address   : Not Available

Ok, so the status and state are both "Unknown". I'm guessing that these
values are completely missing in the SNMP output, which is why Perl
chokes on it. I've added some robustness in the code that should handle
this case properly. Please try the beta version (3.5.7-beta1) available
here:

  http://folk.uio.no/trondham/tmp/check_openmanage-3.5.7-beta1

The plugin will give an alert on the drive, which in my opinion is the
correct thing to do. You can always blacklist the drive. The cause of
the error is obviously that this is a non-Dell drive, which Openmanage
doesn't know how to handle.

BTW, you can reduce your command definition to this:

  check_openmanage -s -C $ARG1$ -H $HOSTADDRESS$ -e -i -p -a -o 3 --htmlinfo de

The effect will be the same. You probably defined the command a while
ago, and there have been some changes to options since then.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] Problem with check_openmanage 3.5.6

2010-02-24 Thread Trond Hasle Amundsen
Nicole Hähnel  writes:

> Hi
>
> I get this message on one pe830 (OM 6.1.0) :
>
> CRITICAL: [ xxx] Physical Disk 0:0 [Wdc WD1600JS-55MHB0, 160GB] on ctrl 0 needs attention:
> -- SYSTEM: PowerEdge 830, SN: xxx
> INTERNAL ERROR: Use of uninitialized value in string eq at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1428.
> INTERNAL ERROR: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/grontmij/check_openmanage line 1441.
>
>
> Is this a problem of check_openmanage or the disk?
> It's a non dell sata disk.

Hi Nicole,

Can you provide the output of the following command, executed on the
monitored host:

  omreport storage pdisk controller=0

Also, are you using check_openmanage in SNMP or local context?

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


Re: [Nagios-users] check_openmanage and net-snmp v3

2010-02-23 Thread Trond Hasle Amundsen

Hi all,

Just to bring this thread to a conclusion... I have released a new
version of check_openmanage that adds a new option '--use-get_table',
which is to be used as a workaround for issues with SNMPv3 on Windows
using net-snmp. There are a few other minor fixes and feature
enhancements as well.

Downloads and changelog:

  http://folk.uio.no/trondham/software/check_openmanage.html#download

(Also available on Nagios Exchange and Monitoring Exchange.)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage and net-snmp v3

2010-02-15 Thread Trond Hasle Amundsen
"Verhaeghe, Koen"  writes:

> The script is working, at least, it does not give any errors anymore.
> I even get "Physical Disk 0:1 [Ata WDC WD800JD-75MSA3, 0GB] on ctrl 0
> needs attention: Failure Predicted" as expected. I was expecting also an
> errormessage from the Virtual disks, as they are degraded, but that's
> not there.

If the error is just "Failure Predicted", it means that the disk is
working fine for the time being and the virtual drive status is not
affected. When/if the drive eventually fails the virtual drive will be
degraded.

> Moreover, I know some of our servers have problems with power supplies
> or memory, so I changed a section in the below mentioned script like you
> did for the disks and others, just to test:
>
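>   #my $result = $snmp_session->get_entries(-columns => [keys %ps_oid]);
>
>   ##########################################################
>   # SNMPv3 test
>   ##########################################################
>   my $result = q{};
>   if ($opt{protocol} == 3) {
>       my $powerDeviceTable = '1.3.6.1.4.1.674.10892.1.600.12.1';
>       $result = $snmp_session->get_table(-baseoid => $powerDeviceTable);
>   }
>   else {
>       $result = $snmp_session->get_entries(-columns => [keys %ps_oid]);
>   }
>   ##########################################################
>   ##########################################################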
>
> And now I do get the expected error:
> "Power Supply 1 [AC] needs attention: Presence detected, Failure
> detected, AC lost"
>
> I think it is safe to say that, when using net-snmp v3, the get_entries
> method is not giving the expected result.

The complete picture is still a little unclear to me. Do these problems
occur only when you use net-snmp instead of Windows' native snmp agent?
(I'm assuming that "net-snmp" refers to
http://freshmeat.net/projects/net-snmp).

I would be interested in any test results you might have using the
native Windows snmp agent with SNMPv3.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo


