[Nagios-users] IP and hostname mapping control

2010-06-28 Thread Network Operation Center FMC Luxemburg

Hi everybody,

I'm looking for a way to check the mapping between a hostname and an IP 
address.


Example: the IP 192.168.0.1 exists; if the hostname foo.mylan.com is not 
associated with this IP, I would like to get an alarm.


However, the configuration below raises no alarm:

define host {
    use             unix-server
    host_name       foo.mylan.com
    display_name    foo
    address         192.168.0.1
    check_command   check_http
}

define service {
    use                   local-service
    host_name             foo.mylan.com
    service_description   HTTP local
    check_command         check_http
}

Any idea?
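
A minimal sketch of one way to get such an alarm: a small custom plugin that
resolves the hostname and compares the result with the expected IP. The
function name check_mapping and the injectable resolve parameter are
illustrative assumptions, not part of Nagios; only the exit codes (0 = OK,
2 = CRITICAL) follow the plugin convention. The standard check_dns plugin
also has an -a/--expected-address option that may already cover this case.

```python
import socket

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_mapping(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """Return (exit_code, message) for a hostname/IP mapping check.

    `resolve` is injectable (hypothetical design choice) so the logic
    can be exercised without a real DNS lookup.
    """
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return CRITICAL, f"CRITICAL - cannot resolve {hostname}: {exc}"
    if expected_ip in addresses:
        return OK, f"OK - {hostname} resolves to {expected_ip}"
    found = ", ".join(addresses) or "nothing"
    return CRITICAL, f"CRITICAL - {hostname} resolves to {found}, expected {expected_ip}"
```

A wrapper script would print the message and exit with the returned code,
for example after calling check_mapping("foo.mylan.com", "192.168.0.1").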

Thanks a lot

François
--
Network Operation Center
LUXEMBURG
--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

Re: [Nagios-users] Additional states in Nagios

2010-06-28 Thread Kevin Keane
Actually, there are four states reported by plugins: OK, WARNING, CRITICAL and 
UNKNOWN. Services will have the same four states.

There are also three states that hosts can have: UP, DOWN and UNREACHABLE. 
Which of the three a host is in depends on the state reported by the plugin, 
as well as the state of its parents. 
http://nagios.sourceforge.net/docs/3_0/hostchecks.html

HARD and SOFT states are separate from all of that. You can have a soft warning 
or a hard warning, and a soft critical or a hard critical. 
http://nagios.sourceforge.net/docs/3_0/statetypes.html

OK, WARNING, CRITICAL and UNKNOWN are the actual state of whatever you are 
monitoring. The plugins decide which state it is. HARD, SOFT, as well as UP or 
DOWN, are computed by Nagios based on the status reported by the plugins. 
Exactly how Nagios does that is configurable.
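
To illustrate the distinction, here is a simplified sketch, not Nagios's
actual code. The exit codes 0 to 3 are the plugin states; the classify
function is a hypothetical model of how a SOFT problem becomes HARD after
max_check_attempts consecutive non-OK results. It ignores details such as
changes between different non-OK states, host states and parents.

```python
from enum import IntEnum


class PluginState(IntEnum):
    """Exit codes a plugin reports to Nagios."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(results, max_check_attempts=3):
    """Label each check result as SOFT or HARD (simplified model)."""
    labelled = []
    attempts = 0                    # consecutive non-OK results so far
    prev_state, prev_type = PluginState.OK, "HARD"
    for state in results:
        if state == PluginState.OK:
            # Recovering from a problem that was still SOFT is a soft recovery.
            soft_recovery = prev_state != PluginState.OK and prev_type == "SOFT"
            state_type = "SOFT" if soft_recovery else "HARD"
            attempts = 0
        else:
            attempts += 1
            state_type = "HARD" if attempts >= max_check_attempts else "SOFT"
        labelled.append((state.name, state_type))
        prev_state, prev_type = state, state_type
    return labelled
```

With the default of three attempts, a WARNING seen three times in a row is
SOFT, SOFT, then HARD; a single CRITICAL followed by OK stays SOFT throughout.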

-Original Message-
From: Jason W. [mailto:jwellb...@gmail.com] 
Sent: Monday, June 28, 2010 7:18 PM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Additional states in Nagios

(I've tried Googling for the answer, but there seems to be some ambiguity in 
defining terms - even in the Nagios docs)

I've got Nagios monitoring a bunch of things on our servers and I also have 
events being sent to Nagios via passive checks. This is all useful information 
to us as sysadmins, but there is a difference in criticality, e.g. is it down, 
is it about to go down, or is it purely informational?

The latter is what I am writing about. Currently, there are two "states" we use 
- WARNING and CRITICAL. This is the ambiguous part since the docs refer to 
states as HARD or SOFT, but the plugin API docs refer to WARNING and CRITICAL 
as states. I realize there is also UNKNOWN, but with non-technical people 
occasionally looking at our Nagios, that may lead them astray...

Is there a way to get more states, e.g. INFORMATION?  This would allow one to 
sort by state in the web interface. Currently, we use WARNING for most 
informational messages, so there is a mashup of "Service X is about to die" and 
"Server Y did something you may want to know about"

I am guessing not without hacking the source, but I can dream ;)

Thoughts & comments appreciated - even if it's to say I'm Doing it Wrong.

--
HTH, YMMV, HANW :)

Jason

The path to enlightenment is /usr/bin/enlightenment.



[Nagios-users] Additional states in Nagios

2010-06-28 Thread Jason W.
(I've tried Googling for the answer, but there seems to be some
ambiguity in defining terms - even in the Nagios docs)

I've got Nagios monitoring a bunch of things on our servers and I also
have events being sent to Nagios via passive checks. This is all
useful information to us as sysadmins, but there is a difference in
criticality, e.g. is it down, is it about to go down, or is it purely
informational?

The latter is what I am writing about. Currently, there are two
"states" we use - WARNING and CRITICAL. This is the ambiguous part
since the docs refer to states as HARD or SOFT, but the plugin API
docs refer to WARNING and CRITICAL as states. I realize there is also
UNKNOWN, but with non-technical people occasionally looking at our
Nagios, that may lead them astray...

Is there a way to get more states, e.g. INFORMATION?  This would allow
one to sort by state in the web interface. Currently, we use WARNING
for most informational messages, so there is a mashup of "Service X is
about to die" and "Server Y did something you may want to know about"

I am guessing not without hacking the source, but I can dream ;)

Thoughts & comments appreciated - even if it's to say I'm Doing it Wrong.

-- 
HTH, YMMV, HANW :)

Jason

The path to enlightenment is /usr/bin/enlightenment.



Re: [Nagios-users] [op5-users] Cannot get pnp4nagios graphs using NSClient++ & check_nrpe ?

2010-06-28 Thread Mirza Dedic
As far as I am aware, the perfdata is correct?

'F:'=194.62G;20;12;0;399.99;
 |   |     | |  |  | |
 |---|-----|-|--|--|-|-- * label
     |-----|-|--|--|-|-- * current value
           |-|--|--|-|-- unit ( UOM = unit of measurement )
             |--|--|-|-- warning threshold
                |--|-|-- critical threshold
                   |-|-- minimum value
                     |-- maximum value

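
For what it's worth, the layout above can be checked mechanically. The
following is a rough sketch, not the parser that process_perfdata.pl uses:
the regular expression, the function name parse_perfdata and the
KNOWN_UOMS set (my reading of the units listed in the plugin guidelines)
are all assumptions for illustration. It only flags deviations; it does not
establish which one PNP actually rejects.

```python
import re

# Units as listed in the Nagios plugin guidelines (assumed to be complete).
KNOWN_UOMS = {"", "s", "us", "ms", "%", "B", "KB", "MB", "GB", "TB", "c"}

PERF_RE = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"       # label, optionally quoted
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[A-Za-z%]*)"   # value and unit
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"   # thresholds
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?"     # min / max
)


def parse_perfdata(text):
    """Split a perfdata string into one dict per metric."""
    metrics = []
    for m in PERF_RE.finditer(text):
        metric = {
            "label": m.group("quoted") or m.group("bare"),
            "value": float(m.group("value")),
            "uom": m.group("uom"),
        }
        for key in ("warn", "crit", "min", "max"):
            metric[key] = m.group(key) or None   # empty field -> None
        metric["uom_known"] = metric["uom"] in KNOWN_UOMS
        metrics.append(metric)
    return metrics
```

Run against the string from perfdata.log, this yields two metrics: 'F: %'
with a known unit, and 'F:' whose unit "G" is not in the assumed list.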
From: Mirza Dedic
Sent: June/28/2010 5:37 PM
To: 'Nagios Users List'
Subject: RE: [op5-users] Cannot get pnp4nagios graphs using NSClient++ & 
check_nrpe ?

Checking disk space using NSClient++ (NRPE) and check_nrpe (2.12), trying to 
get RRD graphs (pnp4nagios):

Configuration:

NSClient++ v.0.3.8.76 (2010-05-27 x64)
Check_nrpe 2.12
Nagios
Merlin 0.6.7-beta2sp1
Nagios 3.2.1
Ninja 1.0
PNP 0.4.14
RRDTool 1.2.19

Error log in perfdata.log:

2010-06-28 16:10:39 [16465] [1] process_perfdata.pl-0.4.14 starting in DEFAULT Mode
2010-06-28 16:10:39 [16465] [1] Found Performance Data for van-mail01 / DISK__F ('F: %'=52%;5;3; 'F:'=194.62G;20;12;0;399.99;)
2010-06-28 16:10:39 [16465] [1] Invalid Perfdata detected
2010-06-28 16:10:39 [16465] [1] PNP exiting (runtime 0.00175s) ...

Perfdata has been enabled in nagios.cfg

process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file_mode=a
service_perfdata_file_mode=a

Also, my commands for perfdata:

define command {
  command_name    process-service-perfdata
  command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl
}

define command {
  command_name    process-host-perfdata
  command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

Finally, the command configuration for the service:

define command {
  command_name    check_nrpe_disk
  command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -n -p X -t 30 -c CheckDriveSize -a ShowAll MinWarnFree=$ARG2$ MinCritFree=$ARG3$ Drive="$ARG1$"
}

I recently switched from check_nt to check_nrpe, and I removed my .xml and .rrd 
files from the service checks that got switched from nt to nrpe.

Shouldn't the pnp4nagios use the default.php to create the perfdata output? I 
don't see how the performance data is invalid? I can confirm that the RRD 
graphs work for other services, such as ping.

Any help would be appreciated.

The Oppenheimer Group  CONFIDENTIAL



This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise private information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the email by you is prohibited.




Re: [Nagios-users] how to fix excessive latency

2010-06-28 Thread wwanghongrui
Thanks for your reply. We are writing to a MySQL database via NDOUtils. We 
don't use NSCA. We have not set external_command_buffer_slots. 
status_update_interval=15 

I used vmstat to capture system performance, as shown below. Maybe the 
bottleneck is not the system itself.

procs -----------memory----------- ---swap-- -----io---- -system-- ------cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0    160 239708 289248 6031924    0    0     1    29    0    0  2  3  94  1  0
 1  0    160 242168 289248 6031924    0    0     0     0  260 1023  0  6  94  0  0
 1  0    160 246912 289248 6031924    0    0     0   392  291 1044  0  6  93  1  0
 1  0    160 246696 289248 6031924    0    0     0   100  265 1056  0  6  93  0  0
 2  0    160 243604 289248 6035008    0    0  4668     0  598 1324  1  7  91  1  0
 1  0    160 245276 289248 6035008    0    0    32     0  265 1403  0  6  93  0  0
 1  0    160 245268 289248 6035008    0    0     0     0  253 1187  0  6  94  0  0
 1  1    160 245548 289248 6035008    0    0     0  4728  887 1759  0  6  88  5  0
 1  1    160 246288 289248 6036036    0    0     0  1740 1065 1103  1  6  87  6  0
 0  1    160 247368 289248 6036036    0    0     0  1720 1086 2252  1  3  90  6  0
 0  0    160 247492 289248 6036036    0    0     0   980  984  539  4  0  90  6  0
 0  0    160 247624 289248 6036036    0    0     0     0  254  330  0  0 100  0  0
 0  0    160 247624 289248 6036036    0    0     0  5420  622  342  0  0  97  3  0
 0  0    160 247844 289248 6036036    0    0     0     0  254  312  0  0 100  0  0
 0  0    160 247844 289248 6036036    0    0     0     0  254  317  0  0 100  0  0
 0  0    160 247984 289248 6036036    0    0     0     0  254  313  0  0 100  0  0
 0  0    160 247984 289248 6036036    0    0     0     0  254  315  0  0 100  0  0
 0  0    160 248260 289248 6036036    0    0     0   352  362  317  0  0  99  1  0
 0  0    160 248260 289248 6036036    0    0     0     0  306  303  0  0 100  0  0
 1  0    160 248876 289248 6036036    0    0     0   100  270  367  0  0  99  0  0
 5  0    160 233840 289248 6036036    0    0     0     0  341 1490  6  8  86  0  0
 5  0    160 187468 289248 6036036    0    0     0     4  866 2736  9 22  69  0  0
 4  1    160 171508 289248 6036036    0    0     0  5352  837 2205  3 20  76  1  0
procs -----------memory----------- ---swap-- -----io---- -system-- ------cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
 4  0    160 175172 289248 6036036    0    0     0   568  453 2091  1 15  83  0  0
 3  0    160 154108 289248 6036036    0    0     0     0  427 3456  1 20  79  0  0
 5  0    160 125684 289248 6036036    0    0     0     4  469 2620  1 19  80  0  0
 9  0    160 146712 289248 6036036    0    0     0     0  603 2272  4 26  70  0  0
 6  0    160 168804 289248 6036036    0    0     0     0  668 2784  9 27  64  0  0
 4  0    160 181032 289248 6036036    0    0     0  1164  736 2654  4 25  70  1  0
 1  0    160 210728 289248 6036036    0    0     0     0  465 2152  5 19  76  0  0
 1  0    160 211216 289248 6036036    0    0     0     0  294  837  0  6  94  0  0
 1  0    160 216644 289248 6036036    0    0     0     0  293  954  0  7  93  0  0
 1  0    160 227320 289248 6036036    0    0     0     0  285  943  0  8  92  0  0
 1  0    160 238864 289248 6036036    0    0     0   576  343 2308  1  8  91  1  0
 1  2    160 233660 289248 6039120    0    0  2252   100  393 1046  1  6  92  1  0
 1  0    160 239548 289248 6039120    0    0   984  3316  571 1055  1  6  92  1  0
 1  0    160 240084 289248 6039120    0    0     0     0  253  998  0  6  94  0  0
 1  0    160 239968 289248 6039120    0    0     0     0  253  990  0  6  93  0  0
 1  1    160 240388 289248 6039120    0    0     0  1956  781       0  6  89  4  0
 1  1    160 240256 289248 6039120    0    0     0  1828 1088 1452  1  6  87  6  0
 1  2    160 239648 289248 6039120    0    0     0  1620 1038 1614  1  6  87  6  0
 1  1    160 240028 289248 6039120    0    0     0  1700 1065 1459  0  6  85  9  0
 1  1    160 239912 289248 6039120    0    0     0  2512 1211 1623  0  6  87  6  0
 1  1    160 240648 289248 6039120    0    0     4  2880 1380 1128  0  5  87  7  0
 1  0    160 241124 289248 6039120    0    0     0    84  499 1024  0  6  93  0  0
 1  0    160 241000 289248 6039120    0    0     0   296  287 1757  1  6  93  1  0
procs -----------memory----------- ---swap-- -----io---- -system-- ------cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
 3  0    160 241808 289248 6039120    0    0     0     0  253 1630  1  6  93  0  0
 1  0    160 241800 289248 6039120    0    0     0     0  253  977  0  6  94  0  0
 1  0    160 241880 289248 6039120    0    0     0     0  253  989  0  6  94  0  0
 3  0    160 218192 289248 6039120    0    0     0   100  350 1810  3 14  83  0  0
 4  0    160 181560 289248 6039120    0    0     0  5792  957 2948  6 21  72  1  0
 6  0    160 182036 289248 6040148    0    0     0


[Nagios-users] Assign contact_group to a host without notifications

2010-06-28 Thread Matthew Angelo
Hi Nagios Users,

We have a super modular config.  Essentially [almost] all Service Checks are
assigned to HostGroups, and then Hosts merely assign themselves to that
HostGroup.


#
#
# HostGroup {
# LINUX_SERVER
# check_cpu
# check_memory
# check_disk
# }
#
#
# Host {
# use TEAM1
# name MY_LINUXSERVER1
# hostgroup LINUX_SERVER
# }
#
#

"use TEAM1" is a Host Template definition which defines contact_group and
notification period.


How do I expand on this to allow another team (contact_group) read-only
access or visibility into the Host service checks for MY_LINUXSERVER1
*without* notifying them?

I added:

contact_groups  +TEAM2

to the host definition.  However, it is now also *alerting* TEAM2, which I
don't want.


Think of TEAM1 as "LINUX team" and TEAM2 as the Application team which want
visibility into a server, but not be alerted if disk space starts filling up
on the Server itself.


Thanks

[Nagios-users] Cannot get pnp4nagios graphs using NSClient++ & check_nrpe ?

2010-06-28 Thread Mirza Dedic
Checking disk space using NSClient++ (NRPE) and check_nrpe (2.12), trying to 
get RRD graphs (pnp4nagios):

Configuration:

NSClient++ v.0.3.8.76 (2010-05-27 x64)
Check_nrpe 2.12
Nagios
Merlin 0.6.7-beta2sp1
Nagios 3.2.1
Ninja 1.0
PNP 0.4.14
RRDTool 1.2.19

Error log in perfdata.log:

2010-06-28 16:10:39 [16465] [1] process_perfdata.pl-0.4.14 starting in DEFAULT Mode
2010-06-28 16:10:39 [16465] [1] Found Performance Data for van-mail01 / DISK__F ('F: %'=52%;5;3; 'F:'=194.62G;20;12;0;399.99;)
2010-06-28 16:10:39 [16465] [1] Invalid Perfdata detected
2010-06-28 16:10:39 [16465] [1] PNP exiting (runtime 0.00175s) ...

Perfdata has been enabled in nagios.cfg

process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
host_perfdata_file_mode=a
service_perfdata_file_mode=a

Also, my commands for perfdata:

define command {
  command_name    process-service-perfdata
  command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl
}

define command {
  command_name    process-host-perfdata
  command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

Finally, the command configuration for the service:

define command {
  command_name    check_nrpe_disk
  command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -u -n -p X -t 30 -c CheckDriveSize -a ShowAll MinWarnFree=$ARG2$ MinCritFree=$ARG3$ Drive="$ARG1$"
}

I recently switched from check_nt to check_nrpe, and I removed my .xml and .rrd 
files from the service checks that got switched from nt to nrpe.

Shouldn't the pnp4nagios use the default.php to create the perfdata output? I 
don't see how the performance data is invalid? I can confirm that the RRD 
graphs work for other services, such as ping.

Any help would be appreciated.


Re: [Nagios-users] how to fix excessive latency

2010-06-28 Thread shadih rahman
There is definitely something not right here.  We have about 1 checks
and the performance is a lot better.  Anyhow, we are using the following values:
check_result_reaper_frequency=10
max_check_result_reaper_time=20


You should enable debug mode and check the debug logs.  Are you writing to
any backend database?  Are you using NSCA to transfer service information to
a remote location?  What is the value of your status_update_interval?  What is
your external_command_buffer_slots?
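
In addition to the debug logs, one way to see which individual checks stand
out is to read the per-check figures from Nagios's status file. The sketch
below is only an illustration: the function name slowest_checks is made up,
and it assumes status.dat contains hoststatus and servicestatus blocks with
check_latency and check_execution_time fields, as Nagios 3 writes them. The
file location comes from status_file in nagios.cfg.

```python
import re

# One block per host or service: "servicestatus {" ... "}" on its own line.
BLOCK_RE = re.compile(
    r"^\s*(hoststatus|servicestatus)\s*\{(.*?)^\s*\}", re.S | re.M)


def slowest_checks(status_text, key="check_latency", top=10):
    """Return the `top` checks with the largest value of `key`."""
    rows = []
    for _kind, body in BLOCK_RE.findall(status_text):
        fields = {}
        for line in body.splitlines():
            name, sep, value = line.strip().partition("=")
            if sep:
                fields[name] = value
        try:
            metric = float(fields[key])
        except (KeyError, ValueError):
            continue  # block without a usable value for this key
        rows.append((metric,
                     fields.get("host_name", "?"),
                     fields.get("service_description", "(host check)")))
    rows.sort(reverse=True)
    return rows[:top]
```

Calling it with the contents of status.dat and key="check_execution_time"
lists the longest-running plugins, which are often a more useful starting
point than latency alone, since latency is mostly a scheduling symptom.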



2010/6/28 wwanghongrui 

> Hi guys,
>
> Our Nagios server environment: Nagios 3.2.0 + Suse10-sp2 x86_64 + 8 GB mem +
> 4 x ( Xeon(R) CPU  E7420  @ 2.13GHz )
> We have 500+ actively checked hosts and 3k+ actively checked services.  I
> have adjusted some performance parameters in nagios.cfg, as shown below:
> use_large_installation_tweaks=1
> child_processes_fork_twice=0
> enable_environment_macros=0
> check_result_reaper_frequency=5
> max_check_result_reaper_time=30
>
> But the Nagios performance is still bad, as shown below:
>
> Services Actively Checked:
>
> Time Frame             Services Checked
> <= 1 minute:           271 (9.4%)
> <= 5 minutes:          1749 (60.4%)
> <= 15 minutes:         2824 (97.4%)
> <= 1 hour:             2898 (100.0%)
> Since program start:   2869 (99.0%)
>
> Metric                 Min.       Max.        Average
> Check Execution Time:  0.09 sec   32.23 sec   1.113 sec
> Check Latency:         1.12 sec   212.59 sec  116.329 sec
> Percent State Change:  0.00%      23.88%      0.05%
>
> Hosts Actively Checked:
>
> Time Frame             Hosts Checked
> <= 1 minute:           32 (5.5%)
> <= 5 minutes:          419 (71.5%)
> <= 15 minutes:         586 (100.0%)
> <= 1 hour:             586 (100.0%)
> Since program start:   586 (100.0%)
>
> Metric                 Min.       Max.        Average
> Check Execution Time:  0.08 sec   4.29 sec    3.035 sec
> Check Latency:         0.00 sec   135.25 sec  116.420 sec
> Percent State Change:  0.00%      11.32%      0.09%
>
> How can I find out which service checks or host checks cause this severe
> check latency?
>
>
> Regards
>
> HongRui Wang
> mail: wwanghong...@cebbank.com
> 2010-06-28



-- 
Cordially,
Shadhin Rahman

Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Trond Hasle Amundsen
Max Williams  writes:

> Excellent, sorted, everything reports as OK now. 

Good. I'll try to make a release with these changes in the next couple
of days.

> Thanks so much Trond, amazing support and an amazingly useful plugin!

Glad you like it, Max. Thanks for reporting this issue :)

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] wiki down?

2010-06-28 Thread Matt Simmons
Bah! If you don't have an event handler that fences the misbehaving
machine at the first sign of trouble, you're not trying hard enough
;-)


On Mon, Jun 28, 2010 at 8:34 AM, Max  wrote:
> On Mon, Jun 28, 2010 at 8:29 AM, Matt Simmons
>  wrote:
>> If only there were some kind of software available to let us know when
>> websites were down...
>
> Or people to respond to alerts from the software :)
>



-- 
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Max Williams
Excellent, sorted, everything reports as OK now. 
Thanks so much Trond, amazing support and an amazingly useful plugin!
Best Regards,
Max Williams

-Original Message-
From: Trond Hasle Amundsen [mailto:t.h.amund...@usit.uio.no] 
Sent: 28 June 2010 15:21
To: Nagios Users List
Subject: Re: [Nagios-users] check_openmanage: Use of uninitialized value in 
sprintf at /usr/lib64/nagios/plugins/check_openmanage

Max Williams  writes:

> Here is the output, the inactive temperature probe is sorted but the
> missing EMM still produces an alert:
>
>   OK |  1:1:0:1 | Temperature Probe 1 in enclosure 3 [MD1000] is Inactive

This one works as expected :)

>   OK |  1:1:0:2 | Temperature Probe 2 in enclosure 3 [MD1000]:  C ( max)
>   OK |  1:1:0:3 | Temperature Probe 3 in enclosure 3 [MD1000]:  C ( max)

Hmm... something strange going on here. I wonder why this happens, in
the SNMP output you attached previously the values are there. Anyway,
I've added some extra checking in the code to make it report better if
the reading is unavailable for some reason. It should now report simply:

  Temperature Probe 0 in enclosure 2:0:0 [MD1000] is Ready

if the temp reading is not an integer and OMSA reports the status as OK.

> CRITICAL |  1:1:0:1 | EMM 1 in enclosure 3 [MD1000] needs attention: Not 
> Installed

Ah.. I misread the SNMP output.. The status is "Unknown" when reported
by omreport, but "Other" when reported with SNMP. One little annoying
difference between the two.. The output should be:

  EMM 0 in enclosure 2:0:0 [MD1000] is Not Installed

with an OK state.

I've created a second test version:

  http://folk.uio.no/trondham/software/beta/check_openmanage

Please give this one a try and see if it performs better.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Trond Hasle Amundsen
Max Williams  writes:

> Here is the output, the inactive temperature probe is sorted but the
> missing EMM still produces an alert:
>
>   OK |  1:1:0:1 | Temperature Probe 1 in enclosure 3 [MD1000] is Inactive

This one works as expected :)

>   OK |  1:1:0:2 | Temperature Probe 2 in enclosure 3 [MD1000]:  C ( max)
>   OK |  1:1:0:3 | Temperature Probe 3 in enclosure 3 [MD1000]:  C ( max)

Hmm... something strange going on here. I wonder why this happens, in
the SNMP output you attached previously the values are there. Anyway,
I've added some extra checking in the code to make it report better if
the reading is unavailable for some reason. It should now report simply:

  Temperature Probe 0 in enclosure 2:0:0 [MD1000] is Ready

if the temp reading is not an integer and OMSA reports the status as OK.

> CRITICAL |  1:1:0:1 | EMM 1 in enclosure 3 [MD1000] needs attention: Not 
> Installed

Ah.. I misread the SNMP output.. The status is "Unknown" when reported
by omreport, but "Other" when reported with SNMP. One little annoying
difference between the two.. The output should be:

  EMM 0 in enclosure 2:0:0 [MD1000] is Not Installed

with an OK state.

I've created a second test version:

  http://folk.uio.no/trondham/software/beta/check_openmanage

Please give this one a try and see if it performs better.

Cheers,
-- 
Trond H. Amundsen 
Center for Information Technology Services, University of Oslo



Re: [Nagios-users] wiki down?

2010-06-28 Thread Max
On Mon, Jun 28, 2010 at 8:29 AM, Matt Simmons
 wrote:
> If only there were some kind of software available to let us know when
> websites were down...

Or people to respond to alerts from the software :)



Re: [Nagios-users] wiki down?

2010-06-28 Thread Matt Simmons
If only there were some kind of software available to let us know when
websites were down...

On Sat, Jun 26, 2010 at 1:07 PM, Roy Sigurd Karlsbakk wrote:
>> The IP of the server points to a US-located server.
>>
>> They may not have woken up yet, or they are having a hardware issue.
>
> Well, it's still down.
>
> Vennlige hilsener / Best regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> In all pedagogy it is essential that the curriculum is presented
> intelligibly. It is an elementary imperative for all pedagogues to avoid
> excessive use of idioms of foreign origin. In most cases, adequate and
> relevant synonyms exist in Norwegian.



-- 
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.


[Nagios-users] how to fix excessive latency

2010-06-28 Thread wwanghongrui
Hi,guys~

Our Nagios server environment: Nagios 3.2.0 + SUSE 10 SP2 x86_64 + 8 GB memory + 4 x 
( Xeon(R) CPU  E7420  @ 2.13GHz )
We have 500+ actively checked hosts and 3k+ actively checked services. I have adjusted 
some performance parameters in nagios.cfg, as below:
use_large_installation_tweaks=1
child_processes_fork_twice=0
enable_environment_macros=0
check_result_reaper_frequency=5
max_check_result_reaper_time=30

But Nagios performance is still bad, as shown below:

Services Actively Checked:

Time Frame            Services Checked
<= 1 minute:          271 (9.4%)
<= 5 minutes:         1749 (60.4%)
<= 15 minutes:        2824 (97.4%)
<= 1 hour:            2898 (100.0%)
Since program start:  2869 (99.0%)

Metric                 Min.      Max.        Average
Check Execution Time:  0.09 sec  32.23 sec   1.113 sec
Check Latency:         1.12 sec  212.59 sec  116.329 sec
Percent State Change:  0.00%     23.88%      0.05%


Hosts Actively Checked:

Time Frame            Hosts Checked
<= 1 minute:          32 (5.5%)
<= 5 minutes:         419 (71.5%)
<= 15 minutes:        586 (100.0%)
<= 1 hour:            586 (100.0%)
Since program start:  586 (100.0%)

Metric                 Min.      Max.        Average
Check Execution Time:  0.08 sec  4.29 sec    3.035 sec
Check Latency:         0.00 sec  135.25 sec  116.420 sec
Percent State Change:  0.00%     11.32%      0.09%




How can I find out which service checks or host checks are causing this 
serious check latency?


Regards 

HongRui Wang
mail: wwanghong...@cebbank.com
2010-06-28
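
One way to approach the question above is to rank checks by the per-check timing values that Nagios 3 writes to its status file (status.dat). The sketch below is only an illustration under stated assumptions: it assumes the status file contains `hoststatus { ... }` and `servicestatus { ... }` blocks with `check_latency` and `check_execution_time` fields, and the function names `parse_status` and `slowest` are hypothetical helpers written for this example. The location of status.dat depends on the `status_file` setting in nagios.cfg.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Matches the opening line of a host or service status block.
BLOCK_RE = re.compile(r"^(hoststatus|servicestatus)\s*\{\s*$")


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_type, fields) for each host/service status block."""
    kind = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if kind is None:
            match = BLOCK_RE.match(line)
            if match:
                kind, fields = match.group(1), {}
        elif line == "}":
            yield kind, fields
            kind = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def slowest(text: str, key: str = "check_latency",
            top: int = 10) -> List[Tuple[float, str]]:
    """Return the `top` checks with the largest value of `key`, descending."""
    rows = []
    for kind, fields in parse_status(text):
        try:
            value = float(fields[key])
        except (KeyError, ValueError):
            continue  # block lacks the field or holds a non-numeric value
        name = fields.get("host_name", "?")
        if kind == "servicestatus":
            name += "/" + fields.get("service_description", "?")
        rows.append((value, name))
    rows.sort(reverse=True)
    return rows[:top]
```

To use it, read the status file into a string and call, for example, `slowest(text, "check_execution_time", 20)`. Sorting by execution time is often the more useful view: when average latency is uniformly high, as in the figures above, the lag is usually shared by all checks, while a handful of long-running checks (the 32-second maximum, for instance) are more likely to point at individual culprits.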

Re: [Nagios-users] check_openmanage: Use of uninitialized value in sprintf at /usr/lib64/nagios/plugins/check_openmanage

2010-06-28 Thread Max Williams
Thanks for the really fast response!

Here is the output. The inactive temperature probe is sorted, but the missing 
EMM still produces an alert:

[r...@host1 ~]# ./check_openmanage -v
check_openmanage 3.5.9-beta1
Copyright (C) 2010 Trond H. Amundsen
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Trond H. Amundsen 
[r...@host1 ~]# ./check_openmanage -C password -d -H host2
   System:  PowerEdge 2950
   ServiceTag:  JY5CB4J  OMSA version:unknown
   BIOS/date:   2.5.0 09/12/2008 Plugin version:  3.5.9-beta1
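-----------------------------------------------------------------------------
   Storage Components
=============================================================================
  STATE  |    ID    |  MESSAGE TEXT
---------+----------+--------------------------------------------------------
 WARNING |        0 | Controller 0 [PERC 6/i Integrated]: Firmware '6.1.1-0047' is out of date
  OK |        0 | Controller 0 [PERC 6/i Integrated] is Degraded
 WARNING |        1 | Controller 1 [PERC 6/E Adapter]: Firmware '6.1.1-0047' is out of date
  OK |        1 | Controller 1 [PERC 6/E Adapter] is Degraded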
  OK |  0:0:0:0 | Physical Disk 0:0:0 [146GB] on ctrl 0 is Online
  OK |  0:0:0:1 | Physical Disk 0:0:1 [146GB] on ctrl 0 is Online
  OK | 1:0:1:14 | Physical Disk 0:1:14 [1.0TB] on ctrl 1 is Online
  OK | 1:0:1:13 | Physical Disk 0:1:13 [1.0TB] on ctrl 1 is Online
  OK | 1:0:1:12 | Physical Disk 0:1:12 [1.0TB] on ctrl 1 is Online
  OK | 1:0:1:11 | Physical Disk 0:1:11 [1.0TB] on ctrl 1 is Online
  OK | 1:0:1:10 | Physical Disk 0:1:10 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:9 | Physical Disk 0:1:9 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:8 | Physical Disk 0:1:8 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:7 | Physical Disk 0:1:7 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:6 | Physical Disk 0:1:6 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:5 | Physical Disk 0:1:5 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:4 | Physical Disk 0:1:4 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:3 | Physical Disk 0:1:3 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:2 | Physical Disk 0:1:2 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:1 | Physical Disk 0:1:1 [1.0TB] on ctrl 1 is Online
  OK |  1:0:1:0 | Physical Disk 0:1:0 [1.0TB] on ctrl 1 is Online
  OK | 1:0:0:14 | Physical Disk 0:0:14 [1.0TB] on ctrl 1 is Online
  OK | 1:0:0:13 | Physical Disk 0:0:13 [1.0TB] on ctrl 1 is Online
  OK | 1:0:0:12 | Physical Disk 0:0:12 [1.0TB] on ctrl 1 is Online
  OK | 1:0:0:11 | Physical Disk 0:0:11 [1.0TB] on ctrl 1 is Online
  OK | 1:0:0:10 | Physical Disk 0:0:10 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:9 | Physical Disk 0:0:9 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:8 | Physical Disk 0:0:8 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:7 | Physical Disk 0:0:7 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:6 | Physical Disk 0:0:6 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:5 | Physical Disk 0:0:5 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:4 | Physical Disk 0:0:4 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:3 | Physical Disk 0:0:3 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:2 | Physical Disk 0:0:2 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:1 | Physical Disk 0:0:1 [1.0TB] on ctrl 1 is Online
  OK |  1:0:0:0 | Physical Disk 0:0:0 [1.0TB] on ctrl 1 is Online
  OK | 1:1:0:14 | Physical Disk 1:0:14 [2.0TB] on ctrl 1 is Online
  OK | 1:1:0:13 | Physical Disk 1:0:13 [2.0TB] on ctrl 1 is Online
  OK | 1:1:0:12 | Physical Disk 1:0:12 [2.0TB] on ctrl 1 is Online
  OK | 1:1:0:11 | Physical Disk 1:0:11 [2.0TB] on ctrl 1 is Online
  OK | 1:1:0:10 | Physical Disk 1:0:10 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:9 | Physical Disk 1:0:9 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:8 | Physical Disk 1:0:8 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:7 | Physical Disk 1:0:7 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:6 | Physical Disk 1:0:6 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:5 | Physical Disk 1:0:5 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:4 | Physical Disk 1:0:4 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:3 | Physical Disk 1:0:3 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:2 | Physical Disk 1:0:2 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:1 | Physical Disk 1:0:1 [2.0TB] on ctrl 1 is Online
  OK |  1:1:0:0 | Physical Disk 1:0:0 [2.0TB] on ctrl 1 is Online
  OK |  0:0 | Logical drive '/dev/sda' [RAID-1, 136.12 GB] is Ready
  OK |  1:0 | Logical drive '/dev/sdb' [RAID-6, 26068.00 GB] is Ready
  OK |  1:1 | Logical drive '/dev/sdc' [RAID-6, 24212.50 GB] is Ready
  OK |  0:0 | Cache battery 0 in controller 0 is Ready
  OK |  1:0 | Cache battery 0 in controller 1 is Ready
  OK |