[Nagios-users] bug in check_logfiles? existing file is reported as not found

2010-01-04 Thread Mirco Benjamin Drick
Hi list,
Tried unsuccessfully to find a support forum for check_logfiles, maybe someone 
here can help me ;-)

I have check_logfiles running on a Win2k3 machine, where it checks a file 
whose name is built from a date pattern (see config snippet below). This 
morning Nagios reported that the logfile with today's date (i.e. 
2009-12-30.log) did not exist, but it did. This has been working for some 
weeks without problems until now. Replacing the date pattern with the actual 
file name gave the same result: check_logfiles claims the file does not 
exist. Trying any other file (e.g. 2009-12-29.log or 2009-12-31.log) in the 
same location gives no problem.

Any ideas on how to debug this further? Otherwise it looks like the problem 
will disappear tomorrow, as the file I created with tomorrow's date pattern 
works OK.

Thanks
Mirco


check_logfiles.cfg:
@searches = (
  {
    logfile => 'C:\STEP2CIFileMover\logs\$CL_DATE_$-$CL_DATE_MM$-$CL_DATE_DD$.log',
    criticalpatterns => 'ERROR',
    options => 'noprotocol,perfdata,nocase,sticky=28800',
  },
);
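
One way to narrow this down outside check_logfiles is to ask the operating 
system directly whether the file is visible. The following is only a sketch 
in Python, not part of check_logfiles: the helper names expected_log_name and 
probe are made up for illustration, and it assumes the log name is simply the 
date as YYYY-MM-DD plus ".log", as in the file names mentioned above.

```python
import os
from datetime import date


def expected_log_name(day):
    """Build the file name for a given day, e.g. 2009-12-30.log (assumed scheme)."""
    return f"{day:%Y-%m-%d}.log"


def probe(directory, name):
    """Report what the OS says about one file, plus similarly named entries.

    The 'similar' list uses repr() so that stray whitespace or an
    unexpected extension in a directory entry becomes visible.
    """
    path = os.path.join(directory, name)
    date_part = name[:10].lower()  # the YYYY-MM-DD prefix
    similar = [
        repr(entry)
        for entry in sorted(os.listdir(directory))
        if entry.strip().lower().startswith(date_part)
    ]
    return {
        "path": path,
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "similar": similar,
    }
```

On the monitored machine this could be called as 
probe(r"C:\STEP2CIFileMover\logs", expected_log_name(date.today())), ideally 
under the same account the plugin runs as, since a permissions difference 
between accounts would not show up otherwise.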



Mirco Drick | Systems Administrator

Stibo Systems A/S
MASTERING Data Management

T   +45 89 39 11 11
www.stibosystems.com 


--
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Check_openmanage

2010-01-04 Thread Trond Hasle Amundsen
Jack Lyons <jack1...@hotmail.com> writes:

> I have some older 2650s that throw messages with State=UNKNOWN when I use
> check_openmanage.
>
> See below for the output of check_openmanage -d.
>
> Is this a hardware issue that we need to address, or is it a system
> configuration issue (no fan probes, no temperature probes, no voltage
> probes) that could be handled via configuration or a change to
> check_openmanage?
>
> I have added this to the Perl code and it works, but I am having problems
> compiling check_openmanage.pl on Windows (problems installing and using
> PAR::Packer).
>
> In the $ok_errors section:
>    | No\sfan\sprobes\sfound\son\sthis\ssystem           # No fan probes
>    | No\stemperature\sprobes\sfound\son\sthis\ssystem   # No temperature probes
>    | No\svoltage\sprobes\sfound\son\sthis\ssystem       # No voltage probes
>
> A) Could someone give me a compiled version of check_openmanage.pl
> that has the $ok_errors section in it?

Yes, I could do that for you. But see below first...

> B) Can we modify the --only option to accept something like "warning+", so
> that it reports warnings and above AND ignores UNKNOWN states?

Not sure that I understand what you mean. If used, the --only option
specifies exactly one component to check. For example, '--only cpu'
would make the plugin only check the CPUs. All other components are
ignored. No warnings about e.g. fan probes should then appear.

> C) Is there another way to configure the plugin so that Nagios does not
> alert on this output?

Yes. You can use the '--check' option to specify that you don't want to
check these things. Example:

  check_openmanage --check fans=0,temp=0,voltage=0

Using the '--check' option as above will prevent check_openmanage from
ever running the commands that are failing.

[...]
> UNKNOWN | Problem running 'omreport chassis fans': Error! No fan probes found
> on this system.
> UNKNOWN | Problem running 'omreport chassis temps': Error! No temperature
> probes found on this system.
> UNKNOWN | Problem running 'omreport chassis volts': Error! No voltage probes
> found on this system.

These are errors from running omreport. They indicate that something is
wrong, either with the hardware or with Openmanage. I would try
reinstalling Openmanage first, which may help. The 2650 is an old model,
but if you still have a valid warranty you should contact Dell support
about this problem. These commands should not fail like this. If all
else fails, use the '--check' option as described above.

Cheers,
-- 
Trond H. Amundsen t.h.amund...@usit.uio.no
Center for Information Technology Services, University of Oslo



[Nagios-users] Notification Question

2010-01-04 Thread steve f

Hello & Happy New Year,

Is it possible to have Nagios notify me of a service problem once an hour AND 
tell me how many times it alerted during that hour time frame?

For example, if I run a plugin, I don't necessarily want to have a notification 
every time the threshold was met but after 1 hour, send me a notification that 
during that hour time period, the threshold was exceeded 10 times?

I know that via the notification cfg I can set the time frame for sending a 
notification but can I keep a running total of the number of alerts for that 1 
hour timeframe?

Thanks,
Steve
  



Re: [Nagios-users] Notification Question

2010-01-04 Thread Marcel
Yeah, just set your normal_check_interval to 6 (minutes, if you haven't
changed interval_length) and max_check_attempts to 10; then you would be
notified after about 60 minutes. Or the opposite: set the check interval to
10 and the number of checks before notifying to 6. That way you'll always
know that the threshold was exceeded in the last 6 (or 10) checks over the
last 60 minutes, because notifications are only sent once
max_check_attempts is reached.
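
The arithmetic behind those two settings can be sketched as follows. This is
a rough model, not Nagios code: the function name is made up, and it assumes
that the first failed check counts as attempt 1 and that the rechecks after
it are spaced at the retry interval (retry_check_interval), set here to the
same value as the normal check interval.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval, interval_length=60):
    """Minutes from the first failed check until max_check_attempts is reached.

    retry_interval is in Nagios time units; interval_length is the number
    of seconds per time unit (60 by default, i.e. one unit = one minute).
    """
    rechecks = max_check_attempts - 1  # the first failure is attempt 1
    return rechecks * retry_interval * interval_length / 60


# The two variants from the message above.
for attempts, interval in [(10, 6), (6, 10)]:
    print(attempts, interval, minutes_to_hard_state(attempts, interval))
# -> 10 6 54.0
# -> 6 10 50.0
```

Under these assumptions the notification arrives a little under an hour
after the first failure, and only if every check in between also failed.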

HTH.


On Mon, Jan 4, 2010 at 1:52 PM, steve f <a31mod...@hotmail.com> wrote:

> Hello & Happy New Year,
>
> Is it possible to have Nagios notify me of a service problem once an hour
> AND tell me how many times it alerted during that hour time frame?
>
> For example, if I run a plugin, I don't necessarily want to have a
> notification every time the threshold was met but after 1 hour, send me a
> notification that during that hour time period, the threshold was exceeded
> 10 times?
>
> I know that via the notification cfg I can set the time frame for sending a
> notification but can I keep a running total of the number of alerts for that
> 1 hour timeframe?
>
> Thanks,
> Steve


Re: [Nagios-users] Notification Question

2010-01-04 Thread Jim Avery
2010/1/4 steve f <a31mod...@hotmail.com>:
> Hello & Happy New Year,
>
> Is it possible to have Nagios notify me of a service problem once an hour
> AND tell me how many times it alerted during that hour time frame?
>
> For example, if I run a plugin, I don't necessarily want to have a
> notification every time the threshold was met but after 1 hour, send me a
> notification that during that hour time period, the threshold was exceeded
> 10 times?
>
> I know that via the notification cfg I can set the time frame for sending a
> notification but can I keep a running total of the number of alerts for that
> 1 hour timeframe?

Out of the box, no I don't think there is a way you can do that.

If you use ndoutils, I guess you could write a custom notification
command script which gets the information you need by doing a SQL
query of the database.
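
As a lighter-weight alternative to querying NDO (a different technique from
the one suggested above, and only a sketch), a custom notification script
could count the SERVICE ALERT entries in nagios.log for the past hour. The
function name, host and service names below are made up for illustration.
Note that Nagios writes these entries on state changes and soft-state
retries, so the count may be lower than the number of failed checks.

```python
import re

# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def count_alerts(lines, host, service, now, window=3600):
    """Count non-OK SERVICE ALERT entries for one service in the last `window` seconds."""
    count = 0
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not a service alert line
        timestamp = int(match.group(1))
        same_service = match.group(2) == host and match.group(3) == service
        in_window = now - window <= timestamp <= now
        if same_service and in_window and match.group(4) != "OK":
            count += 1
    return count


sample = [
    "[1262620000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout",
    "[1262620300] SERVICE ALERT: web01;HTTP;OK;SOFT;2;HTTP OK",
    "[1262621000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout",
    "[1262610000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;timeout",
]
print(count_alerts(sample, "web01", "HTTP", now=1262622000))  # -> 2
```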

> Thanks,

I'm not sure you will want to thank me for this advice!  The NDO
schema can be a right pain.

Cheers,

Jim



[Nagios-users] improving the 300 second resolution nagiosgraph

2010-01-04 Thread Litwin, Matthew
I need some assistance with nagiosgraph, specifically with how it handles RRD 
data.

I am finding that there is a 300-second resolution limit in how nagiosgraph 
uses rrdtool. I see the 300-second resolution clearly in the graphs 
themselves (regardless of how far I zoom in), and it matches the head of the 
output of 'rrdtool dump' for any of the RRD files nagiosgraph has created:

<!-- Round Robin Database Dump --><rrd> <version> 0003 </version>
        <step> 300 </step> <!-- Seconds -->
        <lastupdate> 1262647468 </lastupdate> <!-- 2010-01-04 23:24:28 UTC -->

        <ds>
                <name> errors </name>
                <type> GAUGE </type>
                <minimal_heartbeat> 60 </minimal_heartbeat>
                <min> NaN </min>
                <max> NaN </max>

                <!-- PDP Status -->
                <last_ds> 0 </last_ds>
                <value> 6.00e+01 </value>
                <unknown_sec> 0 </unknown_sec>
        </ds>

The problem with this is that I have monitors that run every 60 seconds, and 
the lack of precision smooths the graphs so much that they become useless.

My question is two-fold:
1) Where is this step period of 300 seconds specified in nagiosgraph?
2) If I were to globally change the step period in nagiosgraph from 300 
seconds to 60 seconds, is there some way to keep my existing RRD data, or 
would it become corrupted if I tried to change this?
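
To illustrate why 60-second checks look smoothed at a 300-second step, the 
sketch below averages 60-second samples into 300-second buckets. It is a 
simplified model, not what rrdtool does internally (rrdtool also interpolates 
to step boundaries and applies heartbeat rules); the function name 
consolidate and the sample values are made up.

```python
def consolidate(samples, sample_period=60, step=300):
    """Average fixed-period samples into buckets of `step` seconds."""
    per_step = step // sample_period  # samples that fall into one step
    buckets = []
    for start in range(0, len(samples), per_step):
        chunk = samples[start:start + per_step]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# Ten one-minute samples containing a single spike of 60 errors.
minute_samples = [0, 0, 60, 0, 0, 0, 0, 0, 0, 0]
print(consolidate(minute_samples))           # -> [12.0, 0.0]
print(consolidate(minute_samples, step=60))  # spike preserved at 60-second step
```

With a 300-second step the one-minute spike of 60 is stored as an average of 
12 over five minutes, which matches the flattening seen in the graphs.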

Thanks,
Matthew Litwin
mlit...@stubhub.com
415.222.8475




[Nagios-users] nagios always show zero load and no users logged in

2010-01-04 Thread Lambert Emmanuel
Hi,

I have configured Nagios web interface on a server called ZITA.
I have 2 servers that I want to monitor : wsphotonicsA and wsphotonicsB.
In the web interface, the status of both servers is shown with all
green. If I shutdown one of the servers, this is correctly shown.
However, the load of both servers is always shown as zero and Nagios
never detects the number of logged in users (it always shows zero, with
the exception of 1 user that is sporadically detected). The number of
processes is detected correctly.
The server load has been constantly 50% or more during the past 2 weeks, but
Nagios doesn't detect it.


Extract from nagios.log :

[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.10 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Swap Usage;OK;HARD;1;SWAP OK - 92% free (875 MB out of 956 MB)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Total Processes;OK;HARD;1;PROCS OK: 21 processes with STATE = RSZDT
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.11 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)
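
A quick way to compare what Nagios reports for the two hosts is to parse 
these log lines and list the services whose output is exactly the same on 
both. This is only a sketch with made-up function names; identical output 
(for instance the same free space on /) is a hint worth checking that the 
plugin may be measuring the same machine each time, not a diagnosis.

```python
from collections import defaultdict

MARKER = "CURRENT SERVICE STATE: "


def parse_state_line(line):
    """Split one CURRENT SERVICE STATE line into host, service and output."""
    _, found, rest = line.partition(MARKER)
    if not found:
        return None
    # host;service;state;state_type;attempt;plugin output
    parts = rest.split(";", 5)
    if len(parts) != 6:
        return None
    return {"host": parts[0], "service": parts[1], "output": parts[5].strip()}


def identical_across_hosts(lines):
    """Return services that report exactly the same output on every host."""
    by_service = defaultdict(dict)
    for line in lines:
        record = parse_state_line(line)
        if record:
            by_service[record["service"]][record["host"]] = record["output"]
    return sorted(
        service
        for service, outputs in by_service.items()
        if len(outputs) > 1 and len(set(outputs.values())) == 1
    )


log_lines = [
    "[1260918000] CURRENT SERVICE STATE: wsphotonicsA;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.10 ms",
    "[1260918000] CURRENT SERVICE STATE: wsphotonicsB;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.11 ms",
    "[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):",
    "[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):",
]
print(identical_across_hosts(log_lines))  # -> ['Root Partition']
```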


Here is the cfg file that I use to configure the servers :

photon...@zita:~$ more /usr/local/nagios/etc/objects/wsphotonics.cfg
define hostgroup {
        hostgroup_name          calculation_servers
        alias                   CALCULATION SERVERS
        members                 wsphotonicsA, wsphotonicsB
        }

define host {
        use                     linux-server
        host_name               wsphotonicsA
        alias                   wsphotonicsA
        address                 157.193.172.101
        hostgroups              calculation_servers
        max_check_attempts      5
        check_command           check-host-alive
        contact_groups          admins
        notification_interval   2
        notification_period     24x7
        notification_options    d,u,r
        }

define host {
        use                     linux-server
        host_name               wsphotonicsB
        alias                   wsphotonicsB
        address                 157.193.172.188
        hostgroups              calculation_servers
        check_command           check-host-alive
        max_check_attempts      5
        contact_groups          admins
        notification_interval   2
        notification_period     24x7
        notification_options    d,u,r
        }


###
###
#
# SERVICE DEFINITIONS - wsphotonicsA
#
###
###


# Define a service to ping wsphotonicsA
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }


# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }



# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Current Users
        check_command           check_local_users!20!50
        }


# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }



# Define a service to check the load on the local machine.

define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }



# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Swap Usage
        check_command           check_local_swap!20!10
        }



# Define a service to check SSH on the local