[Nagios-users] Availability Report
Hi All,

I need to generate an availability report for a service, restricted to a particular timeperiod. I have created a timeperiod in Nagios that covers 06:00 to 22:00 every day. When creating the availability report, I select that timeperiod as the report time period, but the report is always generated for 00:00 to 24:00. Can anyone please help me with this?

Rgds,
Aravind M D

--
Colocation vs. Managed Hosting
A question and answer guide to determining the best fit for your organization - today and in the future.
http://p.sf.net/sfu/internap-sfd2d
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] check_snmp_load.pl best linux practices
On my installation I added code to the SNMP load check to count the CPU cores via SNMP and set WARN to 1.25*cores and CRIT to 1.5*cores (for any/all load values). It seems to be working OK. I haven't had any complaints from the NOC about excessive alerting.

-f

On Wed, 9 Mar 2011, Robert Eden wrote:

> Date: Wed, 09 Mar 2011 14:33:13 -0600
> From: Robert Eden
> Reply-To: Nagios Users List
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] check_snmp_load.pl best linux practices
>
> I'm currently experimenting with using check_snmp_load.pl to alarm on system
> overload.
>
> Monitoring CPU usage is giving me a lot of false alarms due to their
> instantaneous nature.
>
> I'm getting good results by using the NETSL option to report load averages.
> I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm
> on 5 and 15 minutes.
>
> Unfortunately, unlike the CPU percentages, the load numbers should be based
> on the number of processors. The NETSL option doesn't do that.
>
> One option is to have a series of service commands based on the number of
> processors, but I'm considering writing a new mode that will use the
> "STAND" option to get the number of CPUs and then use that as a
> multiplication factor for alarms.
>
> Does that make sense? Surely others have run into this problem. How do you
> alarm on excessive load w/o causing lots of false alarms?
>
> Robert
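The per-core scaling described in this reply can be sketched as follows. This is only an illustration of the arithmetic (WARN at 1.25*cores, CRIT at 1.5*cores, applied to all three load averages), not the code actually added to the check. The function names are hypothetical, and the `-w` argument is assumed to take the same three-value form as the `-c 99,4,10` quoted above.

```python
def load_thresholds(cores, warn_factor=1.25, crit_factor=1.5):
    """Return (warn, crit) load thresholds scaled by the CPU core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return warn_factor * cores, crit_factor * cores


def threshold_args(cores):
    """Build plugin-style arguments, using one threshold for the 1, 5 and
    15 minute values. The -w form is an assumption that mirrors -c."""
    warn, crit = load_thresholds(cores)
    warn_triplet = ",".join(f"{warn:g}" for _ in range(3))
    crit_triplet = ",".join(f"{crit:g}" for _ in range(3))
    return ["-w", warn_triplet, "-c", crit_triplet]


def classify(loads, cores):
    """Classify a (1min, 5min, 15min) load tuple against the scaled thresholds.
    A value equal to a threshold counts as exceeding it (a choice made here)."""
    warn, crit = load_thresholds(cores)
    if any(value >= crit for value in loads):
        return "CRITICAL"
    if any(value >= warn for value in loads):
        return "WARNING"
    return "OK"
```

For a 4-core host this gives WARN at 5 and CRIT at 6, so `threshold_args(4)` yields `-w 5,5,5 -c 6,6,6`. The core count itself would still have to come from SNMP, as both messages describe.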
Re: [Nagios-users] check_openmanage internal error
Output from omreport storage connector controller=1 -fmt ssv:

---
List of Connector(s) on Controller PERC 6/E Adapter (Slot 2)

ID;Status;Name;State;Connector Type;Termination;SCSI Rate
0;Ok;Logical Connector ;Ready;SAS Port RAID Mode;Not Applicable;Not Applicable

Path Health
Status;Name;State;Status;Name;State
Ok;Connector 0 ;Available
Ok;Connector 1 ;Available
---

I just tried checking this via SNMP and it appears to work just fine. I don't see any errors, and the formatting of the -d output looks normal. Of course, it doesn't report any information regarding path health.
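As a rough illustration of what checking the path health section could involve, here is a minimal sketch that parses SSV text like the output in this message. It is not how check_openmanage works. The line layout of the sample is an assumption (the archived message lost its line breaks), and `parse_ssv` and `unhealthy` are hypothetical names.

```python
# Sample laid out as the output above is assumed to have looked.
SAMPLE = """\
List of Connector(s) on Controller PERC 6/E Adapter (Slot 2)
ID;Status;Name;State;Connector Type;Termination;SCSI Rate
0;Ok;Logical Connector ;Ready;SAS Port RAID Mode;Not Applicable;Not Applicable
Path Health
Status;Name;State;Status;Name;State
Ok;Connector 0 ;Available
Ok;Connector 1 ;Available
"""


def parse_ssv(text):
    """Split SSV text into sections: a line without ';' is a section title,
    the first ';' line after it is the header, later ';' lines are rows."""
    sections = {}
    title, header = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and '---' delimiters
        if ";" not in line:
            title, header = line, None
            sections.setdefault(title, [])
        elif header is None:
            header = [cell.strip() for cell in line.split(";")]
        else:
            fields = [cell.strip() for cell in line.split(";")]
            # zip() stops at the shorter list, so a header with repeated
            # columns still maps onto a three-field row.
            sections.setdefault(title, []).append(dict(zip(header, fields)))
    return sections


def unhealthy(sections):
    """Return (section, name) pairs for rows whose Status is not 'Ok'."""
    return [
        (title, row.get("Name", ""))
        for title, rows in sections.items()
        for row in rows
        if row.get("Status") != "Ok"
    ]
```

With the sample above, `unhealthy(parse_ssv(SAMPLE))` is empty. A path reported with any other status would show up as a `("Path Health", "Connector N")` pair.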
[Nagios-users] Problem: Nagios service check retry interval shorter than configured.
I have Nagios Core 3.2.3 built on SuSE 11.1, and I've been noticing an apparent problem with service check retries. The normal check interval is set to 7.5 minutes and the retry interval is set to 1 minute. I'm seeing entries like this in the log:

[03-02-2011 16:44:39] SERVICE ALERT: aps11;Extra_01.20;OK;SOFT;2;SELRC OK
[03-02-2011 16:44:29] SERVICE ALERT: aps11;Extra_01.20;UNKNOWN;SOFT;1;SELRC UNKNOWN - Timeout (130 sec.) reached
[03-02-2011 13:28:19] SERVICE ALERT: aps14;Extra_04.15;OK;SOFT;2;SELRC OK
[03-02-2011 13:28:09] SERVICE ALERT: aps14;Extra_04.15;CRITICAL;SOFT;1;SELRC CRITICAL

Why are there only 10 seconds between the checks in each pair? Sometimes I see a 20 or 30 second difference, sometimes 60 seconds. Most of the gaps are less than 30 seconds. It's very inconsistent. Any idea what could be causing this?

Thanks,
Paul Dubuc
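To measure how often the gap falls short of the configured retry interval, the alert lines can be paired up per service and the time differences computed. The sketch below uses only the log format shown in this message. `retry_gaps` is a hypothetical helper, and the date is assumed to be month-day-year.

```python
import re
from datetime import datetime

# The four log lines quoted in the message above (newest first).
LOG = """\
[03-02-2011 16:44:39] SERVICE ALERT: aps11;Extra_01.20;OK;SOFT;2;SELRC OK
[03-02-2011 16:44:29] SERVICE ALERT: aps11;Extra_01.20;UNKNOWN;SOFT;1;SELRC UNKNOWN - Timeout (130 sec.) reached
[03-02-2011 13:28:19] SERVICE ALERT: aps14;Extra_04.15;OK;SOFT;2;SELRC OK
[03-02-2011 13:28:09] SERVICE ALERT: aps14;Extra_04.15;CRITICAL;SOFT;1;SELRC CRITICAL
"""

ALERT = re.compile(
    r"^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"([^;]+);([^;]+);([^;]+);([^;]+);(\d+);"
)


def retry_gaps(log_text):
    """Return (host, service, seconds) for each pair of consecutive alerts
    logged for the same host and service."""
    events = {}
    for line in log_text.splitlines():
        match = ALERT.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S")
        events.setdefault((match.group(2), match.group(3)), []).append(stamp)
    gaps = []
    for (host, service), stamps in events.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            gaps.append((host, service, int((later - earlier).total_seconds())))
    return gaps


def short_gaps(log_text, retry_seconds=60):
    """Keep only the gaps shorter than the expected retry interval."""
    return [gap for gap in retry_gaps(log_text) if gap[2] < retry_seconds]
```

Run over a longer stretch of the log, this shows how the gaps are distributed rather than relying on a few hand-picked pairs. Note that it pairs every consecutive alert for a service, not only attempt 1 followed by attempt 2.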
[Nagios-users] check_snmp_load.pl best linux practices
I'm currently experimenting with using check_snmp_load.pl to alarm on system overload.

Monitoring CPU usage is giving me a lot of false alarms due to their instantaneous nature.

I'm getting good results by using the NETSL option to report load averages. I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm on 5 and 15 minutes.

Unfortunately, unlike the CPU percentages, the load numbers should be based on the number of processors. The NETSL option doesn't do that.

One option is to have a series of service commands based on the number of processors, but I'm considering writing a new mode that will use the "STAND" option to get the number of CPUs and then use that as a multiplication factor for alarms.

Does that make sense? Surely others have run into this problem. How do you alarm on excessive load w/o causing lots of false alarms?

Robert
[Nagios-users] Acknowledgements with hard coded email address
I want to set up Nagios so that when you acknowledge either a host or service problem with the "Notify" checkbox checked, the email goes to a specific email address. Does anyone know if this is possible? I've dug through the code a little bit, but I'm not a programmer and I can't find any available configuration options.

Robert
Re: [Nagios-users] service notification logged but not done
Have you recently upgraded Nagios? When did you start noticing that it was missing execution runs? Do you have enough disk space free? What are the permissions of the script set to? Were they recently changed? Have you made any software changes to supporting packages (e.g. Perl) that could have brought up this issue?

These are some thoughts on where I would start looking. Anything that you can dig up we can look at more closely to identify a potential cause for this issue.

~Chad

On Wed, Mar 9, 2011 at 1:29 AM, MAYER Hans wrote:

> Dear all
>
> I have been using Nagios for many years (I started with one of the first
> versions of "netsaint") and have more than 25 years of experience with UNIX,
> but I now have a strange problem I have never had before.
>
> I am running Nagios Core 3.2.3 on Solaris 10 OS. Hardware is M3000 with
> SPARC V9 architecture.
>
> My problem is that I sometimes - not always - see a service notification in
> the log, but it is not actually carried out.
>
> Here is an example, the entry in the log:
>
> [03-09-2011 09:13:25] SERVICE NOTIFICATION:
> sms_mayer;amazon;DISK/p14amazon;OK;notify-service-by-sms;DISK OK - free
> space: /p14amazon 4531 MB (6% inode=99%):
>
> Here is the definition for notify-service-by-sms:
>
> # 'notify-service-by-sms' command definition
> define command{
>         command_name    notify-service-by-sms
>         command_line    $USER1$/rshsendsms $CONTACTPAGER$ \"Info: $HOSTALIAS$/$SERVICEDESC$ $SERVICEOUTPUT$ \"
>         }
>
> As you see, I execute a command named "rshsendsms". And these are the first
> lines of the shell script:
>
> :
>
> # Wed Jan 19 10:12:15 MET 2011 - mayer initial
> # Wed Feb 16 10:11:54 MET 2011 - mayer logging the UID
>
> # usage:
> # rshsendsms 0043664xxx '"hello world - how are you "'
> # info: both types of apostrophes are important
>
> export PATH LOG NUMBER TEXT ID UID NOTSENT RUNLOG
>
> PATH=/usr/bin:$PATH
>
> LOG=/var/adm/rshsendsms.log
> RUNLOG=/var/adm/rshsendsms_run.log
>
> date '+%y%m%d %H:%M' >> $RUNLOG
>
> The first thing the script does is write a log entry. (91% of the disk is
> free.) But in this case I cannot find the entry. The last one is dated
> 110309 06:39, when I really did receive an SMS. I also switched on process
> accounting weeks ago, but there is no entry showing that the shell script
> was executed.
>
> I also switched on the debug facility of "syslog". I can find an entry
> equivalent to the one in the Nagios log, but there are no other messages
> indicating that something could be wrong.
>
> On the other hand, I was notified at 06:39 and nothing was changed in the
> meantime. This is not the first time this problem has happened. Most of the
> time notification works fine, but sometimes it does not. This is of course
> a pain, as notification is a central function of Nagios.
>
> Any idea where I can start searching for the error?
>
> Kind regards
> Hans
Re: [Nagios-users] check_openmanage internal error
Adam Caines writes:

> Looks like it's reporting "path health". The 6e has both sas ports
> connected to redundant controllers in the MD1120. It's strange on
> another server, I also have a PERC H700 connect to a MD1220 with
> redundant links and it does not output the "path health" section.

[snip]

> ID             : 0
> Status         : Ok
> Name           : Logical Connector
> State          : Ready
> Connector Type : SAS Port RAID Mode
> Termination    : Not Applicable
> SCSI Rate      : Not Applicable
>
> Path Health
> Status : Ok
> Name   : Connector 0
> State  : Available
>
> Status : Ok
> Name   : Connector 1
> State  : Available

Yes, so this is the culprit... check_openmanage did not expect this output. It looks like the controller is connected to the enclosure in redundant path mode, according to the OMSA documentation[1]. I really need to see how this looks with SSV format. Can you provide the output from this command:

  omreport storage connector controller=1 -fmt ssv

In case of redundant path mode, the plugin should check the path health and report on it, in addition to the connector health. This functionality must be added to the plugin.

Is it possible for you to check how check_openmanage handles this when checking via SNMP as well?

[1] http://support.euro.dell.com/support/edocs/software/svradmin/6.4/en/CLI/HTML/reportst.htm#wp1077100

Cheers,
--
Trond H. Amundsen
Center for Information Technology Services, University of Oslo
[Nagios-users] Monitoring CA ArcServe
Good morning everyone,

Has anybody ever used this script to monitor CA ArcServe?

http://exchange.nagios.org/directory/Plugins/Backup-and-Recovery/ArcServe/CA-ARCserve-Backup-r12-Number-of-Job-Error-Check/details

I'm having several problems. Neither NRPE_NT nor NSClient++ can return the response to the Nagios server. The problem is in this call:

    Function CMD(ByRef cmdline)
        ' Run the command line and capture everything it writes to stdout
        Set oShell = WScript.CreateObject("WScript.Shell")
        Set oExec = oShell.Exec(cmdline)
        ret = oExec.StdOut.ReadAll()
        Set oExec = Nothing : Set oShell = Nothing
        CMD = ret
    End Function

I have trouble making the call to "Shell" on Windows Server 2008 x64. The error I always get on the server is:

    CHECK_NRPE: Socket timeout after 10 seconds.

I've changed my command_timeout values and used the -t parameter (to change the timeout) on check_nrpe (on the Nagios server), but it always ends with the socket timeout when it makes that blessed Shell call.

Does anyone have any suggestions?

Thanks

Off-topic: I have seen material on writing Nagios checks in Python. How would I do that to monitor Windows?

--
Rafael Henrique da Silva Correia
http://abraseucodigo.blogspot.com
Administrador de Sistemas Linux
Certificado pela LPIC - 101
ID: LPI000160699
[Nagios-users] service notification logged but not done
Dear all

I have been using Nagios for many years (I started with one of the first versions of "netsaint") and have more than 25 years of experience with UNIX, but I now have a strange problem I have never had before.

I am running Nagios Core 3.2.3 on Solaris 10 OS. Hardware is M3000 with SPARC V9 architecture.

My problem is that I sometimes - not always - see a service notification in the log, but it is not actually carried out.

Here is an example, the entry in the log:

[03-09-2011 09:13:25] SERVICE NOTIFICATION: sms_mayer;amazon;DISK/p14amazon;OK;notify-service-by-sms;DISK OK - free space: /p14amazon 4531 MB (6% inode=99%):

Here is the definition for notify-service-by-sms:

    # 'notify-service-by-sms' command definition
    define command{
            command_name    notify-service-by-sms
            command_line    $USER1$/rshsendsms $CONTACTPAGER$ \"Info: $HOSTALIAS$/$SERVICEDESC$ $SERVICEOUTPUT$ \"
            }

As you see, I execute a command named "rshsendsms". And these are the first lines of the shell script:

    :

    # Wed Jan 19 10:12:15 MET 2011 - mayer initial
    # Wed Feb 16 10:11:54 MET 2011 - mayer logging the UID

    # usage:
    # rshsendsms 0043664xxx '"hello world - how are you "'
    # info: both types of apostrophes are important

    export PATH LOG NUMBER TEXT ID UID NOTSENT RUNLOG

    PATH=/usr/bin:$PATH

    LOG=/var/adm/rshsendsms.log
    RUNLOG=/var/adm/rshsendsms_run.log

    date '+%y%m%d %H:%M' >> $RUNLOG

The first thing the script does is write a log entry. (91% of the disk is free.) But in this case I cannot find the entry. The last one is dated 110309 06:39, when I really did receive an SMS. I also switched on process accounting weeks ago, but there is no entry showing that the shell script was executed.

I also switched on the debug facility of "syslog". I can find an entry equivalent to the one in the Nagios log, but there are no other messages indicating that something could be wrong.

On the other hand, I was notified at 06:39 and nothing was changed in the meantime. This is not the first time this problem has happened. Most of the time notification works fine, but sometimes it does not. This is of course a pain, as notification is a central function of Nagios.

Any idea where I can start searching for the error?

Kind regards
Hans
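One way to find out how often this happens is to cross-check the two logs mechanically. The sketch below assumes only the two formats shown in this message: the Nagios SERVICE NOTIFICATION line and the run-log line written by date '+%y%m%d %H:%M'. The function names and the one-minute slack are hypothetical choices, and it does nothing to explain why a run is missing.

```python
import re
from datetime import datetime, timedelta

NOTIFY = re.compile(
    r"^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE NOTIFICATION: "
    r"([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);"
)


def parse_notifications(nagios_log, command):
    """Return the timestamps of notifications logged for the given command."""
    stamps = []
    for line in nagios_log.splitlines():
        match = NOTIFY.match(line.strip())
        if match and match.group(6) == command:
            stamps.append(datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S"))
    return stamps


def parse_run_log(run_log):
    """Parse run-log lines in the '%y%m%d %H:%M' format (minute resolution)."""
    return [
        datetime.strptime(line.strip(), "%y%m%d %H:%M")
        for line in run_log.splitlines()
        if line.strip()
    ]


def missing_runs(notifications, runs, slack=timedelta(minutes=1)):
    """Return notifications with no run-log entry in the same minute or
    within `slack` after it (the slack value is an assumption)."""
    missing = []
    for sent in notifications:
        minute = sent.replace(second=0, microsecond=0)
        if not any(minute <= run <= minute + slack for run in runs):
            missing.append(sent)
    return missing
```

Feeding it the contents of the Nagios log and of /var/adm/rshsendsms_run.log, with "notify-service-by-sms" as the command, lists every logged notification for which the script left no trace, which may show whether the misses cluster at particular times.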