[Nagios-users] bug in check_logfiles? existing file is reported as not found
Hi list,

I tried unsuccessfully to find a support forum for check_logfiles; maybe someone here can help me ;-)

I have check_logfiles running on a Win2k3 machine, where it checks for a file using a date pattern (see the config snippet below). This morning Nagios reported an error that the logfile with today's date (i.e. 2009-12-30.log) did not exist. But it did exist, and this check has been working for some weeks without problems until now.

Replacing the date pattern with the actual file name gave the same result: check_logfiles claims the file does not exist. Trying any other file (e.g. 2009-12-29.log or 2009-12-31.log) in the same location works without problems.

Any ideas how to debug this further? Otherwise it looks like the problem will disappear tomorrow, as the file I created with tomorrow's date pattern works OK.

Thanks
Mirco

Check_logfiles.cfg:

    @searches = (
      {
        logfile => 'C:\STEP2CIFileMover\logs\$CL_DATE_$-$CL_DATE_MM$-$CL_DATE_DD$.log',
        criticalpatterns => 'ERROR',
        options => 'noprotocol,perfdata,nocase,sticky=28800'
      },
    );

Mirco Drick | Systems Administrator
Stibo Systems A/S
MASTERING Data Management
T +45 89 39 11 11
www.stibosystems.com

--
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
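
One way to narrow this down outside of check_logfiles is to expand the date pattern by hand and ask the operating system directly whether the resulting path exists. The following is only a minimal sketch: it assumes the year macro in the config above is `$CL_DATE_YYYY$`, and the helper names `expand_date_macros` and `describe` are hypothetical, not part of check_logfiles. Printing the path with `repr` can expose stray whitespace or unexpected characters in the file name.

```python
import os
from datetime import date


def expand_date_macros(pattern: str, day: date) -> str:
    """Replace check_logfiles-style date macros with the values for `day`.

    Assumption: the year macro is spelled $CL_DATE_YYYY$.
    """
    return (
        pattern.replace("$CL_DATE_YYYY$", f"{day.year:04d}")
        .replace("$CL_DATE_MM$", f"{day.month:02d}")
        .replace("$CL_DATE_DD$", f"{day.day:02d}")
    )


def describe(path: str) -> str:
    """Say whether `path` is an existing file, showing the path via repr()."""
    state = "exists" if os.path.isfile(path) else "NOT FOUND"
    return f"{state}: {path!r}"


if __name__ == "__main__":
    # Directory taken from the config above; the year macro is an assumption.
    pattern = r"C:\STEP2CIFileMover\logs\$CL_DATE_YYYY$-$CL_DATE_MM$-$CL_DATE_DD$.log"
    print(describe(expand_date_macros(pattern, date.today())))
```

Running this as the same Windows account that runs the Nagios agent would show whether the file is visible to that account at all, independent of the plugin.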
Re: [Nagios-users] Check_openmanage
Jack Lyons jack1...@hotmail.com writes:

> I have some older 2650s that throw some messages with State=UNKNOWN
> when I use check_openmanage. See below for output of
> check_openmanage -d.
>
> Is this a hardware issue that we need to address, or is this a
> system configuration issue (no fan probes, no temp probes, no volt
> probes) that could be handled via configuration / change of
> check_openmanage?
>
> I have added this to the perl code and it works, but I am having
> problems compiling check_openmanage.pl on Windows (problems
> installing and using PAR::Packer).
>
> In the $ok_errors section:
>
>   | No\sfan\sprobes\sfound\son\sthis\ssystem          # No battery probes
>   | No\stemperature\sprobes\sfound\son\sthis\ssystem  # No battery probes
>   | No\svoltage\sprobes\sfound\son\sthis\ssystem      # No battery probes
>
> A) Could someone give me a compiled version of the
> check_openmanage.pl that has the $ok_errors section in it.

Yes, I could do that for you. But see below first...

> B) Can we modify the --only option to include warning+ to include
> warning messages and above AND ignore Unknown states?

Not sure that I understand what you mean. If used, the --only option specifies exactly one component to check. For example, '--only cpu' would make the plugin only check the CPUs. All other components are ignored. No warnings about e.g. fan probes should then appear.

> C) Is there another way to configure the plugin to prevent Nagios
> from alerting on this output?

Yes. You can use the '--check' option to specify that you don't want to check these things. Example:

    check_openmanage --check fans=0,temp=0,voltage=0

Using the '--check' option as above will prevent check_openmanage from ever running the commands that are failing.

> [...]
> UNKNOWN | Problem running 'omreport chassis fans': Error! No fan probes found on this system.
> UNKNOWN | Problem running 'omreport chassis temps': Error! No temperature probes found on this system.
> UNKNOWN | Problem running 'omreport chassis volts': Error! No voltage probes found on this system.

These are errors from running omreport. They indicate that something is wrong, either with the hardware or with OpenManage. I would try reinstalling OpenManage first, which may help. The 2650 is an old model, but if you still have a valid warranty you should contact Dell support about this problem. These commands should not fail like this. If all else fails, use the '--check' option as described above.

Cheers,
--
Trond H. Amundsen t.h.amund...@usit.uio.no
Center for Information Technology Services, University of Oslo
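
To illustrate the two approaches discussed above, here is a small Python sketch. It is not check_openmanage code (the plugin is written in Perl): the regular expression simply reuses the three patterns quoted from the $ok_errors addition, and the helper names `is_ignorable` and `check_argument` are hypothetical. The second helper only builds the value passed to the plugin's '--check' option.

```python
import re

# The three patterns quoted above, combined into one alternation.
OK_ERRORS = re.compile(
    r"No\sfan\sprobes\sfound\son\sthis\ssystem"
    r"|No\stemperature\sprobes\sfound\son\sthis\ssystem"
    r"|No\svoltage\sprobes\sfound\son\sthis\ssystem"
)


def is_ignorable(message: str) -> bool:
    """Return True if an omreport error matches one of the tolerated patterns."""
    return OK_ERRORS.search(message) is not None


def check_argument(disabled_components) -> str:
    """Build the value for --check that switches the given components off."""
    return ",".join(f"{name}=0" for name in disabled_components)


if __name__ == "__main__":
    msg = "Problem running 'omreport chassis fans': Error! No fan probes found on this system."
    print(is_ignorable(msg))
    print("check_openmanage --check " + check_argument(["fans", "temp", "voltage"]))
```

Of the two, the '--check' route needs no change to the plugin itself, which avoids the recompilation problem on Windows.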
[Nagios-users] Notification Question
Hello, Happy New Year,

Is it possible to have Nagios notify me of a service problem once an hour AND tell me how many times it alerted during that hour? For example, if I run a plugin, I don't necessarily want a notification every time the threshold is met; instead, after 1 hour, I want a notification telling me that the threshold was exceeded 10 times during that hour.

I know that via the notification cfg I can set the time frame for sending a notification, but can I keep a running total of the number of alerts for that 1-hour time frame?

Thanks,
Steve
Re: [Nagios-users] Notification Question
Yeah, just set your normal_check_interval to 6 (minutes, if you don't change the interval_length) and max_check_attempts to 10; then, after 60 minutes, you would be notified. Or the opposite: set the check interval to 10 and the number of checks before notification to 6. That way you'll always know that each of the last 6 (or 10) checks over the last 60 minutes exceeded the threshold, but notifications are only sent after reaching max_check_attempts.

HTH.

On Mon, Jan 4, 2010 at 1:52 PM, steve f a31mod...@hotmail.com wrote:

> Hello Happy New Year,
>
> Is it possible to have Nagios notify me of a service problem once an
> hour AND tell me how many times it alerted during that hour time
> frame? For example, if I run a plugin, I don't necessarily want to
> have a notification every time the threshold was met but after 1
> hour, send me a notification that during that hour time period, the
> threshold was exceeded 10 times?
>
> I know that via the notification cfg I can set the time frame for
> sending a notification but can I keep a running total of the number
> of alerts for that 1 hour timeframe?
>
> Thanks,
> Steve
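
The arithmetic behind that suggestion can be sketched as follows. This is a rough illustration, not Nagios code: it assumes the retry interval is set to the same value as the normal check interval, that interval_length is left at 60 seconds, and that the first failing check already counts as attempt 1. The function name `minutes_until_hard_state` is hypothetical.

```python
def minutes_until_hard_state(max_check_attempts: int,
                             retry_interval: int,
                             interval_length: int = 60) -> float:
    """Minutes between the first failing check and the check that
    reaches max_check_attempts.

    Assumption: the first failure is attempt 1, so only
    (max_check_attempts - 1) further checks are needed.
    """
    rechecks = max_check_attempts - 1
    return rechecks * retry_interval * interval_length / 60


if __name__ == "__main__":
    print(minutes_until_hard_state(10, 6))   # 10 attempts, 6 minutes apart
    print(minutes_until_hard_state(6, 10))   # 6 attempts, 10 minutes apart
```

Under these assumptions the two variants come out at 54 and 50 minutes respectively, so "about an hour" rather than exactly 60 minutes.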
Re: [Nagios-users] Notification Question
2010/1/4 steve f a31mod...@hotmail.com:

> Hello Happy New Year,
>
> Is it possible to have Nagios notify me of a service problem once an
> hour AND tell me how many times it alerted during that hour time
> frame? For example, if I run a plugin, I don't necessarily want to
> have a notification every time the threshold was met but after 1
> hour, send me a notification that during that hour time period, the
> threshold was exceeded 10 times?
>
> I know that via the notification cfg I can set the time frame for
> sending a notification but can I keep a running total of the number
> of alerts for that 1 hour timeframe?

Out of the box, no, I don't think there is a way you can do that. If you use ndoutils, I guess you could write a custom notification command script which gets the information you need by doing an SQL query against the database.

> Thanks,

I'm not sure you will want to thank me for this advice! The NDO schema can be a right pain.

Cheers,
Jim
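
As a rough idea of what such a script could compute, here is a sketch that counts alerts for one service over the last hour. Note that it does not query NDO (the schema is not shown in this thread); it swaps in a simpler source and reads `SERVICE ALERT` lines in the usual nagios.log layout instead. The host, service, timestamps and the function name `count_alerts` are all made up for illustration.

```python
import re

# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def count_alerts(lines, host, service, now, window=3600):
    """Count non-OK SERVICE ALERT entries for host/service in the last `window` seconds."""
    count = 0
    for line in lines:
        match = ALERT.match(line.strip())
        if not match:
            continue
        timestamp = int(match.group(1))
        entry_host, entry_service, state = match.group(2), match.group(3), match.group(4)
        in_window = now - window < timestamp <= now
        if entry_host == host and entry_service == service and state != "OK" and in_window:
            count += 1
    return count


# Hypothetical sample data, not taken from a real log.
SAMPLE_LOG = [
    "[1262647000] SERVICE ALERT: web01;HTTP;CRITICAL;SOFT;1;timeout",
    "[1262648000] SERVICE ALERT: web01;HTTP;OK;SOFT;2;HTTP OK",
    "[1262649000] SERVICE ALERT: web01;HTTP;WARNING;SOFT;1;slow response",
    "[1262640000] SERVICE ALERT: web01;HTTP;CRITICAL;HARD;3;timeout",
    "[1262649500] SERVICE ALERT: web01;PING;CRITICAL;SOFT;1;packet loss",
]

if __name__ == "__main__":
    print(count_alerts(SAMPLE_LOG, "web01", "HTTP", now=1262650000))
```

A custom notification command could call something like this and append the count to the message text; how the log path and the host and service names reach the script is left open here.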
[Nagios-users] improving the 300 second resolution nagiosgraph
I need some assistance with nagiosgraph, specifically with how it handles RRD data. I am finding that there is a 300 second resolution limitation with nagiosgraph and how it uses rrdtool. I see the 300 second resolution clearly in the graphs themselves (regardless of how much I zoom), which also correlates with the head of the output of 'rrdtool dump' for any of the RRD files nagiosgraph has created:

    <!-- Round Robin Database Dump -->
    <rrd>
      <version> 0003 </version>
      <step> 300 </step> <!-- Seconds -->
      <lastupdate> 1262647468 </lastupdate> <!-- 2010-01-04 23:24:28 UTC -->
      <ds>
        <name> errors </name>
        <type> GAUGE </type>
        <minimal_heartbeat> 60 </minimal_heartbeat>
        <min> NaN </min>
        <max> NaN </max>
        <!-- PDP Status -->
        <last_ds> 0 </last_ds>
        <value> 6.00e+01 </value>
        <unknown_sec> 0 </unknown_sec>
      </ds>

The problem with this is that I have monitors that run every 60 seconds, and the lack of precision is excessively smoothing the graphs to the point of them being useless.

My question is two-fold:

1) Where is this step period of 300 seconds specified in nagiosgraph?

2) If I were to globally change the step period in nagiosgraph from 300 seconds to 60 seconds, is there some way that I can keep my existing RRD data, or would it become corrupted if I tried to change this?

Thanks,
Matthew Litwin
mlit...@stubhub.com
415.222.8475
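
The smoothing effect described above can be shown with a toy calculation. This is a deliberately simplified sketch: it just averages every five one-minute samples into one 300-second value, whereas rrdtool's real normalization is time-weighted and also depends on the heartbeat. The sample values and the function name `consolidate` are invented for illustration.

```python
def consolidate(samples, factor):
    """Average each consecutive group of `factor` samples into one value."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


if __name__ == "__main__":
    # Ten one-minute readings with a single spike of 60 in the third minute.
    one_minute = [0, 0, 60, 0, 0, 0, 0, 0, 0, 0]
    print(consolidate(one_minute, 5))  # two 300-second values
```

With a 300-second step, the one-minute spike of 60 shows up as a single value of 12, which matches the kind of flattening seen in the graphs.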
[Nagios-users] nagios always show zero load and no users logged in
Hi,

I have configured the Nagios web interface on a server called ZITA. I have 2 servers that I want to monitor: wsphotonicsA and wsphotonicsB. In the web interface, the status of both servers is shown all green. If I shut down one of the servers, this is correctly shown. However, the load of both servers is always shown as zero, and Nagios never detects the number of logged-in users (it always shows zero, with the exception of 1 user that is sporadically detected). The number of processes is detected correctly. Server load has been constantly 50% or more during the past 2 weeks, but Nagios doesn't detect it.

Extract from nagios.log:

[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.10 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Swap Usage;OK;HARD;1;SWAP OK - 92% free (875 MB out of 956 MB)
[1260918000] CURRENT SERVICE STATE: wsphotonicsA;Total Processes;OK;HARD;1;PROCS OK: 21 processes with STATE = RSZDT
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.11 ms
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;Root Partition;OK;HARD;1;DISK OK - free space: / 34103 MB (71% inode=99%):
[1260918000] CURRENT SERVICE STATE: wsphotonicsB;SSH;OK;HARD;1;SSH OK - OpenSSH_4.3 (protocol 2.0)

Here is the cfg file that I use to configure the servers:

photon...@zita:~$ more /usr/local/nagios/etc/objects/wsphotonics.cfg
define hostgroup {
        hostgroup_name          calculation_servers
        alias                   CALCULATION SERVERS
        members                 wsphotonicsA, wsphotonicsB
}

define host {
        use                     linux-server
        host_name               wsphotonicsA
        alias                   wsphotonicsA
        address                 157.193.172.101
        hostgroups              calculation_servers
        max_check_attempts      5
        check_command           check-host-alive
        contact_groups          admins
        notification_interval   2
        notification_period     24x7
        notification_options    d,u,r
}

define host {
        use                     linux-server
        host_name               wsphotonicsB
        alias                   wsphotonicsB
        address                 157.193.172.188
        hostgroups              calculation_servers
        check_command           check-host-alive
        max_check_attempts      5
        contact_groups          admins
        notification_interval   2
        notification_period     24x7
        notification_options    d,u,r
}

###
###
#
# SERVICE DEFINITIONS - wsphotonicsA
#
###
###

# Define a service to ping to wsphotonicsA
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
}

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
}

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Current Users
        check_command           check_local_users!20!50
}

# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 users.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
}

# Define a service to check the load on the local machine.
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}

# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
        use                     local-service   ; Name of service template to use
        host_name               wsphotonicsA
        service_description     Swap Usage
        check_command           check_local_swap!20!10
}

# Define a service to check SSH on the local