Re: [Nagios-users] Any numbers on sizing a nagios server?
We're using a similar hardware config to the one Jake mentioned below, with Nagios 2.10. We push each Nagios slave server to anywhere between 6000 and 7000 services (though service latency starts to climb quickly around 7000). Most services are checked every 5 minutes, and the master server handles all host checks. We've been able to scale this config up to about 7800 hosts and 5 services in a single datacenter, using 1 Nagios master and 8 Nagios slaves. Your mileage will vary and, as Jake mentioned, your environment may have a large impact on expected performance.

-Aaron Devey

Paulus, Jake wrote:
> Nagios performance is very much specific to your environment. Nagios 3.x
> is also MUCH faster than Nagios 2.x because of parallel host checks (and
> other features). Our performance is summed up below, but your mileage may
> vary:
>
> Primary server:
> Nagios 3.0.3
> Dual, quad-core processors @ 2.4 GHz and 4GB of RAM
> ~650 hosts, 1550 services
> System load averages 0.8, 0.6, 0.5 (CPUs are mostly idle with spikes
> when hundreds of checks get kicked off at once)
>
> Average service check latency 0.3 seconds
> Average host check latency 2.5 seconds
>
> All of our service checks are active, mostly snmpget and snmpbulkget and
> lots of pings, largely checking every 2-5 minutes. Most of our service
> checks are bash and perl scripts (we don't use the embedded Perl
> interpreter). We also collect and parse perfdata for graphing and run
> Cacti and other very small MySQL-driven webapps on this same server. The
> server is definitely a little overkill, but the price was right and it
> was purchased with Nagios 2.x in mind; once again, Nagios 3.x is much
> faster. Our environment is also not "tuned" for performance other than
> to put in sane timeouts for service checks so they don't sit around
> waiting too long.
>
> Thanks,
> -Jake
>
> -----Original Message-----
> From: Edgar Matzinger [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 04, 2008 1:56 PM
> To: Nagios Mailinglist
> Subject: [Nagios-users] Any numbers on sizing a nagios server?
>
> Dear reader,
>
> I've searched the internet (maybe I look in the wrong places) but I
> can't find any numbers on sizing a nagios server. Are there any numbers
> out there amongst you and are you willing to share?
>
> Thanks, regards, Edgar.
> --
> Edgar R. Matzinger, Valid Eindhoven B.V.
> Paradijslaan 36, 5611 KN Eindhoven
> Disclaimer: Any comments, opinions made are mine, etc ...
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Event handler
Drew Weaver wrote:
> <?php
> $a = $argv[0];
> $b = $argv[1];
> $c = $argv[2];
> $d = $argv[3];
> $handle = fopen("output", "a+");
> $content = "$a - $b - $c - $d\n";
> $go = fwrite($handle, "$content");
> ?>

You'll want to specify the full path to the 'output' file. Nagios won't necessarily call the handler from the same working directory that you used from the shell.

-Aaron
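The same idea as a shell event handler, as a minimal sketch: it appends whatever arguments Nagios passes to a log file named by an absolute path, so the result no longer depends on the working directory. The default LOGFILE path, the function name log_event, and the argument-to-macro mapping in the comment are assumptions for illustration. (In the PHP version, note that $argv[0] holds the script's own name, so the values Nagios passes start at $argv[1]; the shell equivalents are $1 onwards.)

```sh
#!/bin/sh
# Minimal event handler sketch: log the arguments Nagios passes, using an
# ABSOLUTE path because Nagios may run the handler from any working directory.
# The default path is an assumption; point LOGFILE wherever the nagios user can write.
LOGFILE="${LOGFILE:-/var/tmp/nagios-event-handler.log}"

log_event() {
    # Assumed mapping, depending on your command_line, e.g.
    # $1=$SERVICESTATE$ $2=$SERVICESTATETYPE$ $3=$SERVICEATTEMPT$ $4=$HOSTNAME$
    printf '%s - %s - %s - %s\n' "$1" "$2" "$3" "$4" >> "$LOGFILE"
}

# Only log when Nagios actually passed arguments.
if [ "$#" -gt 0 ]; then
    log_event "$@"
fi
```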
Re: [Nagios-users] NRPE vs NSCA benchmarking
Maurizio Pinotti wrote:
> NSCA PROS/CONS: the opposite

It's important to note that multiple NSCA results can be sent per connection. This makes NSCA slightly friendlier on load and network traffic when you have a lot of services. However, taking advantage of this benefit will increase the complexity of your check submission script.

-Aaron
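As a rough sketch of what batching can look like: send_nsca reads one result per line on standard input (for service checks, tab-separated host name, service description, return code and plugin output), so building all the lines first and piping them into a single send_nsca run keeps the submission to one connection. The host names, service names, NSCA_HOST and the config path below are placeholders, not values from this thread.

```sh
#!/bin/sh
# Sketch: batch several passive service results into ONE send_nsca connection.
# NSCA_HOST and SEND_NSCA_CFG are assumed values; adjust for your site.
NSCA_HOST="${NSCA_HOST:-nagios.example.com}"
SEND_NSCA_CFG="${SEND_NSCA_CFG:-/etc/nagios/send_nsca.cfg}"

# format_result HOST SERVICE CODE OUTPUT -> one tab-delimited send_nsca line
format_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder results; a real script would gather these from its own checks.
build_batch() {
    format_result web01 HTTP 0 "HTTP OK"
    format_result web01 Disk 1 "DISK WARNING - 88% used"
    format_result db01 MySQL 2 "MySQL CRITICAL - connection refused"
}

submit_batch() {
    send_nsca -H "$NSCA_HOST" -c "$SEND_NSCA_CFG"
}

# All lines go through one pipe, so send_nsca opens a single connection.
if [ "${1:-}" = "--send" ]; then
    build_batch | submit_batch
fi
```

The trade-off Aaron mentions shows up here: the script has to collect results before sending, instead of firing off one send_nsca per check.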
Re: [Nagios-users] Host reboots too fast for check_alive notification
You could have the server fire a script during reboots that submits a check result to Nagios via NSCA. It might be a little more elaborate than what you were looking for, but it will catch a reboot even when a host check misses it, provided the script runs at boot and can reach the NSCA daemon.

-Aaron

Rodrick Brown wrote:
> When one of my hosts reboots I'm never notified about the outage.
> Currently I'm using a custom script S99bootnotify to alert me when a
> host comes online. Is there any way to shorten the polling for
> check_alive? I find it strange that a host could reboot and Nagios not
> detect that outage.
>
> Thanks.
>
> ---
> Rodrick R. Brown
> Director, Systems Engineering
> Ballista Securities, LLC
> 120 Wall St. Suite 2400
> P: 646 307 4709
> C: 347 702 0012
> F: 646 219-5872
> E: rbrown(at)ballistasec.com
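A minimal sketch of such a boot-time script, assuming send_nsca is installed on the host and that a passive service named "Reboot" exists for it on the Nagios side (making that service is_volatile so that every submission notifies is one option). The service name, NSCA_HOST and the config path are assumptions; it could be called from an rc script such as the S99bootnotify mentioned above.

```sh
#!/bin/sh
# Hypothetical boot-time hook: submit a passive WARNING result for an
# assumed "Reboot" service so Nagios hears about every reboot.
NSCA_HOST="${NSCA_HOST:-nagios.example.com}"
SEND_NSCA_CFG="${SEND_NSCA_CFG:-/etc/nagios/send_nsca.cfg}"
SERVICE="${SERVICE:-Reboot}"

# reboot_result HOST TIME -> host<TAB>service<TAB>return code<TAB>output
reboot_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$SERVICE" 1 "Host booted at $2"
}

if [ "${1:-}" = "start" ]; then
    reboot_result "$(hostname)" "$(date '+%Y-%m-%d %H:%M:%S')" \
        | send_nsca -H "$NSCA_HOST" -c "$SEND_NSCA_CFG"
fi
```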
Re: [Nagios-users] Host checks under Nagios 1.x
I had a similar problem to this. I only wanted to know if a not-so-important device had been down for an hour or more. Here's what I ended up doing: I disabled the host check (by having it call an "always-ok" check command that always returns 0). I then added a 'PING' service to the host with a max_check_attempts of 7 and a retry_check_interval of 10 minutes, so the first failure plus six retries 10 minutes apart adds up to roughly an hour before the service reaches a HARD state. The drawback is that I no longer receive 'HOST DOWN' alerts for that host; instead, I receive alerts for a failing 'PING' service.

-Aaron

Andrew Cruse wrote:
> I've got an interesting problem with a particular setup. I'm monitoring a
> number of servers that the main Nagios installation doesn't have direct
> network access to, so I pass all of the host and service checks through an
> NRPE installation that can communicate with both Nagios and the servers
> being monitored. A little tweaking with check timeouts and whatnot and this
> setup works pretty nicely. I've run into a problem where, for some reason,
> the NRPE server periodically stops responding to NRPE requests. Haven't
> gotten to the bottom of that (Connection refused) yet. Service checks are
> able to handle the problem fine as the duration of the NRPE outage is much
> shorter than the time it takes for the services to go into a hard critical
> state. The problem is, once the first service check goes through and goes
> into a soft critical state, that triggers the host checks which also fail
> (host checks go through NRPE as well) and immediately generate a
> notification. I'd like to find a way to make the host checks a little more
> forgiving as well.
>
> A few things I've thought of or tried:
>
> 1. I tried bumping up the host check retries to 30, but since the checks
> immediately fail with "connection refused" it runs through all 30 tries
> within just a few seconds. I also worry about this leading to unneeded load
> on the Nagios server since this is generally going to cause check_nrpe to be
> run 30 times, for each of the ~20 servers in this setup.
>
> 2. Extending the timeout on the check_nrpe commands doesn't help because
> "connection refused" is returned immediately.
>
> 3. Switching to a passive setup is probably the way to go, but for now I am
> trying to avoid all the reconfiguration needed to move in that direction.
>
> Ideally what I'd like to be able to do is have the host checks retry on a
> particular interval (e.g. once per second) rather than instantly after the
> previous one executed. Is there a way to do this?
>
> Incidentally, while typing up this email I was actually able to find the
> root problem with the NRPE setup. NRPE was being called via Xinetd, which
> wasn't configured to allow enough simultaneous connections for a single
> service. Thus when it started getting hammered with NRPE requests as a
> result of the host check configuration it would stop allowing NRPE
> connections for 30 seconds. A quick change to the Xinetd config file seems
> to have solved the problem.
>
> I'm still interested to know how anyone handles the situation where a host
> may be unresponsive to host checks for a period of time yet you only wish to
> fire off a notification after a specific period of time. Would a wrapper
> around the host check be the only way to handle it?
>
> Andrew
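On the wrapper question, a minimal sketch of one: it retries the real check up to ATTEMPTS times with DELAY seconds between tries and passes through the last attempt's output and exit code, so Nagios still sees normal plugin states. The script name retry_wrapper.sh and its argument order are hypothetical. Keep ATTEMPTS × DELAY, plus the check's own run time, below host_check_timeout in nagios.cfg, or Nagios will kill the wrapper before it finishes.

```sh
#!/bin/sh
# Hypothetical host check wrapper: retry a check with a pause between attempts
# instead of letting Nagios retry it back-to-back.
# Usage: retry_wrapper.sh ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_with_delay() {
    attempts=$1
    delay=$2
    shift 2
    rc=0
    output=""
    i=1
    while [ "$i" -le "$attempts" ]; do
        if output=$("$@" 2>&1); then
            rc=0
            break                  # first OK result wins
        else
            rc=$?                  # remember the failing exit code
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"         # wait before the next try, not after the last
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$output"        # Nagios reads the last attempt's output...
    return "$rc"                   # ...and its exit code
}

# Run only when invoked with a check to wrap, e.g. from a command definition:
#   retry_wrapper.sh 5 2 /usr/lib/nagios/plugins/check_nrpe -H somehost
if [ "$#" -ge 3 ]; then
    retry_with_delay "$@"
    exit $?
fi
```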
Re: [Nagios-users] how to get the current temp in a warning message sent
Instead of $DATETIME try $SHORTDATETIME$.

-Aaron

Randy Paries wrote:
>
> Andreas,
> changing $OUTPUT$ to $SERVICEOUTPUT$ worked!!
> thanks
>
> currently it is set to Date/Time: $DATETIME
> and this always is blank
> Thanks
>
> On Nov 19, 2007 3:46 PM, Andreas Ericsson <[EMAIL PROTECTED]> wrote:
> >
> > Randy Paries wrote:
> > > Hello,
> > > I have the following service:
> > >
> > > define service{
> > >         use                     generic-service
> > >         host_name               bart
> > >         service_description     Probe #1 Temperature
> > >         is_volatile             0
> > >         check_period            24x7
> > >         max_check_attempts      2
> > >         normal_check_interval   5
> > >         retry_check_interval    1
> > >         contact_groups          all_admins
> > >         notification_interval   120
> > >         notification_period     24x7
> > >         notification_options    w,u,c,r
> > >         check_command           check_temptraxf!/dev/ttyS0!1!74!78
> > >         }
> > >
> > > when i get a warning i get the message below. Is there a way to
> > > include the current temp in the warning?
> > > thanks for any help
> > >
> > > ***** Nagios *****
> > >
> > > Notification Type: PROBLEM
> > >
> > > Service: Probe #1 Temperature
> > > Host: Bart #1
> > > Address: 192.168.0.214
> > > State: WARNING
> > >
> > > Date/Time: $
> > >
> > > Additional Info:
> > >
> > > $
> >
> > You're using the wrong macros. Try changing $OUTPUT$ to $SERVICEOUTPUT$ in
> > your service notification macro. I don't know which one you want for
> > date/time, but a glance at the documentation will tell you.
> >
> > --
> > Andreas Ericsson                   [EMAIL PROTECTED]
> > OP5 AB                             www.op5.se
> > Tel: +46 8-230225                  Fax: +46 8-230231
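To get the temperature into the message, the plugin's text has to come through $SERVICEOUTPUT$. A minimal sketch of a notification script that receives the macros as arguments, assuming a command_line like the one in the comment. The script path, argument order and subject line are assumptions; only the macro names themselves are standard Nagios macros.

```sh
#!/bin/sh
# Hypothetical notify script; assumed command_line (all on one line in Nagios):
# /usr/local/bin/notify_service.sh "$NOTIFICATIONTYPE$" "$SERVICEDESC$" "$HOSTALIAS$" "$HOSTADDRESS$" "$SERVICESTATE$" "$SHORTDATETIME$" "$SERVICEOUTPUT$" "$CONTACTEMAIL$"

format_body() {
    printf '***** Nagios *****\n\n'
    printf 'Notification Type: %s\n\n' "$1"
    printf 'Service: %s\nHost: %s\nAddress: %s\nState: %s\n\n' "$2" "$3" "$4" "$5"
    printf 'Date/Time: %s\n\n' "$6"
    # $SERVICEOUTPUT$ is the plugin's text, e.g. the probe's current reading.
    printf 'Additional Info:\n\n%s\n' "$7"
}

if [ "$#" -eq 8 ]; then
    format_body "$@" | mail -s "** $1 Service Alert: $3/$2 is $5 **" "$8"
fi
```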
Re: [Nagios-users] Notifications
Forgot to include the list in the CC.

Aaron Devey wrote:
> I don't see any obvious problems with your service definitions. Did you
> find out TrendMicro was down for 6 hours by reviewing the Nagios logs?
> If so, that means Nagios at least saw the service had a problem. If you
> found out it was down by some other means, perhaps you can check the
> Nagios logs to make sure Nagios saw a "critical" or "warning" problem
> with the service.
>
> Also, if you have log_notifications turned on, try examining the logs for
> the time period it was down. If you don't see any attempts to send a
> notification for TrendMicro on lg03, then it's likely a configuration
> problem somewhere.
>
> Finding it is the hard part. :) The first places I would check are the
> service_notification_period, service_notification_options, and
> service_notification_commands for the contacts in the 'mis' group.
> Follow the service_notification_commands to make sure the command they
> point to is set up correctly as well. If there are no problems there,
> I'd make sure there are no service escalations for that service.
>
> If that doesn't help, I have no idea what the problem could be. :)
>
> Good luck,
>
> -Aaron
>
> Jerad Riggin wrote:
>> define service{
>>         name                            generic-service ; Generic service name
>>         active_checks_enabled           1       ; Active service checks are enabled
>>         passive_checks_enabled          1       ; Passive service checks are enabled/accepted
>>         parallelize_check               1       ; Active service checks should be parallelized (Don't disable)
>>         obsess_over_service             1       ; We should obsess over this service (if necessary)
>>         check_freshness                 0       ; Default is to NOT check service 'freshness'
>>         notifications_enabled           1       ; Service notifications are enabled
>>         event_handler_enabled           1       ; Service event handler is enabled
>>         flap_detection_enabled          1       ; Flap detection is enabled
>>         process_perf_data               1       ; Process performance data
>>         retain_status_information       1       ; Retain status information across program restarts
>>         retain_nonstatus_information    1       ; Retain non-status information across program restarts
>>         register                        0       ; DONT REGISTER THIS DEFINITION - NOT A REAL SERVICE, JUST A TEMPLATE!
>>         }
>>
>> define service{
>>         use                     generic-service
>>         name                    windows-service
>>         is_volatile             0
>>         check_period            24x7
>>         max_check_attempts      5
>>         normal_check_interval   3
>>         retry_check_interval    1
>>         notification_interval   15
>>         notification_period     24x7
>>         register                0
>>         }
>>
>> define service{
>>         use                     windows-service
>>         name                    check-trend
>>         notification_options    w,u,c,r
>>         check_command           check_nt!SERVICESTATE!-d SHOWALL -l ofcservice
>>         register                0
>>         }
>>
>> define service{
>>         use                     check-trend
>>         service_description     TrendMicro
>>         contact_groups          mis
>> #       hostgroup_name          windows-clients
>>         host_name               lg03
>>         }
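One way to do the log review described above, as a sketch: list only the SERVICE ALERT and SERVICE NOTIFICATION entries for one host/service pair. If HARD alert lines show up but no notification lines, the problem is most likely on the notification side (contacts, options, commands). Notification lines only appear when log_notifications is enabled. The log path is an assumption; use the log_file setting from your nagios.cfg.

```sh
#!/bin/sh
# Sketch: show what Nagios logged for one host/service pair, to tell
# "Nagios never saw a problem" apart from "Nagios saw it but never notified".
LOG="${LOG:-/usr/local/nagios/var/nagios.log}"

service_history() {
    host=$1
    service=$2
    # SERVICE ALERT: host;service;state;state type;attempt;output
    # SERVICE NOTIFICATION: contact;host;service;state;command;output
    # (host and service are used as regex text; escape dots for exact matches)
    grep -E "SERVICE (ALERT: |NOTIFICATION: [^;]*;)${host};${service};" "$LOG"
}

# Usage: service_history.sh lg03 TrendMicro
if [ "$#" -eq 2 ]; then
    service_history "$1" "$2"
fi
```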
Re: [Nagios-users] Notifications
What are your notification options set to? In 2.9 the default is "none", so if you didn't specify them for that service, it won't alert. If that's not the answer, perhaps you can paste the definitions for your service, contact, and notification command?

-Aaron

Jerad Riggin wrote:
>
> I have a nagios 2.9 install. I have one host with multiple services
> being monitored. On the 16th the host didn't respond to a ping (the
> server rebooted), and recovered within 3 minutes. I received an
> e-mail for both the failure and recovery. I am also monitoring some
> windows services on the same box using NsClient++. It shows on the
> same day that after it recovered the TrendMicro virus process was down
> for 6 hours. I didn't receive an e-mail during this entire time. It
> is set at 5 max attempts, 3 normal check, and 1 retry with a
> notification interval of 15 minutes. It should have at least notified
> once but it didn't. Any ideas?
Re: [Nagios-users] check_nrpe trouble
Hiamal Llanos wrote:
>
> But if I run the command on the terminal window it works happily:
> $ sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H otherhost -c check_load
> OK - load average: 0.00, 0.00, 0.00|load1=0.
>

What does your check_nrpe check command look like? You'll want to verify that its syntax matches the syntax you used above, and that Nagios is running as the 'nagios' user you specified above.
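For the second point, a quick sketch that lists which user the running Nagios daemon belongs to, so it can be compared with the user in the sudo command above. It assumes the daemon's process name is "nagios" (some packages use a different name, e.g. nagios3) and relies only on the standard ps -o user,comm columns.

```sh
#!/bin/sh
# Sketch: print the user(s) owning processes named "nagios".
# Reads "user command" pairs on stdin so it can be fed from ps.
nagios_users() {
    awk '$2 == "nagios" { print $1 }' | sort -u
}

# On the Nagios server: run with --check
if [ "${1:-}" = "--check" ]; then
    ps -eo user=,comm= | nagios_users
fi
```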
Re: [Nagios-users] how to use servicedependency?
If I am reading your question right, the dependency works, but currently you get alerts for sv1.dummy1 AND sj2.router1, and you only want alerts for sj2.router1. If this is the case, you could try setting up sv1.dummy1 so that it doesn't alert. Unfortunately, you might run into problems with getting sj2.router1 to recognize a recovery if sv1.dummy1 recovers first.

You could try a circular dependency (and I'm not even sure if you can do that in Nagios) where sj2.router1 only runs if sv1.dummy1 is failing, and sv1.dummy1 only runs if sj2.router1 is passing. But then you might get a problem where neither check runs because sv1.dummy1 is passing and sj2.router1 is failing.

This is a difficult problem to solve with service dependencies. Basically you want to go critical if both checks fail, but recover if either one passes. Unfortunately, the way your service dependency works, the status of the dependent service is directly tied to the status of the master service, and the dependent service never updates while the master passes. So you really need to determine the status of both checks and alert accordingly, or you need an event handler for the master service to submit an 'OK' status for the dependent service when the master is passing.

The first of those two options is definitely the easiest. It simply consists of a small shell script that runs the first check and, if it fails, returns the status of the second check. Consider a script such as the following:

    #!/bin/bash
    # Build each check as an array so its arguments survive word splitting intact.
    CHECK_ONE=(/path-to-checks/check_ping -H "$1" -t 2 -p 2 -w 500,50% -c 999,99%)
    CHECK_TWO=(/path-to-checks/check_ping -H "$2" -t 2 -p 2 -w 500,50% -c 999,99%)

    # If the first check passes, report OK without running the second one.
    if "${CHECK_ONE[@]}" >/dev/null 2>&1; then
        echo "Check one OK."
        exit 0
    else
        # Otherwise hand over to the second check; its output and exit
        # code become this script's output and exit code.
        exec "${CHECK_TWO[@]}"
    fi

Replace CHECK_ONE and CHECK_TWO with your own check commands, of course. The first would be the equivalent of your "check-link" command; the second would be the equivalent of your "check_nrpe!check_router1" command. Note that in this case I used $1 and $2, so the first argument to the script is the first host to check and the second argument is the second hostname.
You don't have to use arguments and could just hard-code the values into your script, but using arguments makes the script more scalable if your installation grows. The second check is ONLY executed if the first one fails. This way you only need one host, one service, and no dependencies. If you named your check command "check_double", the service would be something like:

    define service {
        use                   service-template
        host_name             sj2
        service_description   sj2.router1
        check_command         check_double!first_hostname!second_hostname
    }

Good luck!

-Aaron

Jeremy C. Reed wrote:
>
> (I posted a couple weeks ago, but only got one response which was different
> than what I think I want to do.)
>
> I am running Nagios 2.9.
>
> I want: if a check_ping fails then I don't want an alert sent to me
> unless a second test (check_nrpe to a remote system that does the same
> check_ping) fails.
>
> I am reading http://nagios.sourceforge.net/docs/2_0/dependencies.html
> (I was looking at 3_0 last time.) And I am looking at
> http://www.linickx.com/blog/archives/271/how-to-monitor-wordpress-with-nagios/
>
> Where are execution_failure_criteria and notification_failure_criteria
> documented for 2.9?
>
> Can someone please provide an example of only sending a problem alert if
> two different check_commands fail and the second check_command is not done
> if the first one is OK?
>
> This is what I have:
>
> define service {
>     use                   service-template
>     host_name             sj2
>     service_description   sj2.router1
>     check_command         check_nrpe!check_router1
> }
>
> # The "dependent" is the object that needs something.
> define servicedependency {
>     dependent_host_name             sj2
>     dependent_service_description   sj2.router1
>     host_name                       sv1
>     service_description             sv1.dummy1
>     # o = fail on an OK state, the dependent service will not be actively
>     # checked if the master service is in OK
>     execution_failure_criteria      o
>     # notification_failure_criteria o
> }
>
> define service {
>     use                   service-template
>     host_name             sv1
>     service_description   sv1.dummy1
>     check_command         check-link
> }
>
> But I am getting two alerts if both don't return OK. I only want one
> alert. Also I am unsure how to use the execution_failure_criteria and
> notification_failure_criteria.
>
> And I do not want my "sj2.router1" to even be checked if the first
> "sv1.dummy1" is successful. But if sv1.dummy1 fails, then I want the
> sj2.router1 check to happen. And if it fails then send my alert.
>
> Jeremy C. Reed
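For the second option mentioned in the reply (an event handler on the master service that submits an 'OK' result for the dependent service), a minimal sketch follows. It writes a PROCESS_SERVICE_CHECK_RESULT line to the external command file. The command file path, the handler name, and the single $SERVICESTATE$ argument are assumptions, and the dependent service must accept passive checks. Because event handlers run on state changes, this fires when the master recovers to OK.

```sh
#!/bin/sh
# Hypothetical event handler for the master service (sv1.dummy1).
# Assumed command_line: master_ok_handler.sh $SERVICESTATE$
# When the master is OK, push a passive OK result for the dependent service
# (sj2.router1), since the dependency stops Nagios from actively checking it.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# passive_ok_line HOST SERVICE TIMESTAMP -> one external command line
passive_ok_line() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;0;%s\n' \
        "$3" "$1" "$2" "OK - master service is OK"
}

if [ "${1:-}" = "OK" ]; then
    passive_ok_line sj2 sj2.router1 "$(date +%s)" >> "$CMD_FILE"
fi
```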
Re: [Nagios-users] Regex Services?
I've never used regex in services. Assuming it's possible in this case, a workaround for the comma might be the following, since \d\d? matches one or two digits just like \d{1,2}:

^(host|gost)\d\d?\.\w+\.\w+$

-Aaron

Kerry Milestone wrote:
> When trying to verify with Nagios, it seems to stop reading the string
> on the first comma it comes across. The Nagios documentation is a
> little light on how to use regex other than using wildcards.
>
> Is what I am trying to do actually possible?
>
> Cheers.
>
> Error: Could not find any host matching '^(host|gost)\d{1'
>
>> kerry,
>>
>> note
>>
>> ^(host|gost)\d{1,2}\.\w+\.$
>>
>> change it to
>> ^(host|gost)\d{1,2}\.\w+\.\w+$
>>
>> and test again.
>>
>> Learner
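A quick way to sanity-check a comma-free pattern outside Nagios, as a sketch: it uses grep -E with POSIX bracket expressions, because \d is not part of POSIX extended regular expressions. Here [0-9][0-9]? is the bracket form of \d\d?, and [[:alnum:]_]+ stands in for \w+. Whether a given Nagios build accepts \d depends on the regex library it uses, so trying the bracket form too may save a round trip. The host names are made-up examples.

```sh
#!/bin/sh
# Comma-free pattern: "host" or "gost", one or two digits, then two dotted labels.
PATTERN='^(host|gost)[0-9][0-9]?\.[[:alnum:]_]+\.[[:alnum:]_]+$'

matches() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

# host1.dc1.example and gost12.dc2.example match; the other two do not.
for h in host1.dc1.example gost12.dc2.example host123.dc1.example host1.example; do
    if matches "$h"; then echo "match:    $h"; else echo "no match: $h"; fi
done
```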
Re: [Nagios-users] Notification on Stalk
The event handler might work, but doesn't it stop executing after the service enters a hard state? (At least, that's how I understood Nagios 2.x to work; perhaps Nagios 3.x differs in this regard.)

A clever workaround might be to use the performance data processing options built into Nagios. By using the 'service_perfdata_command' and 'process_performance_data' Nagios options and the 'process_perf_data' service directive, you could call a script to process the data, log to a database, send emails, etc.

This is not 100% ideal, since Nagios should be handling the notifications/emails. Instead of sending the emails, that script could also submit a 'critical' passive check to a single volatile service if the status is critical AND has changed... but at this point I think I'm making this more complicated than it needs to be.

-Aaron

Patrick Morris wrote:
> Why not use an eventhandler that parses the plugin output?
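A minimal sketch of the script described above, under stated assumptions: it runs as the service_perfdata_command, receives $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$ and $SERVICEOUTPUT$ as arguments, and each host has a volatile passive service named "Stalk Alerts". The state directory, command file path and service name are all hypothetical. It remembers the last output per service and only submits when a CRITICAL service's output changes.

```sh
#!/bin/sh
# Hypothetical service_perfdata_command helper, assumed to be called as:
#   stalk_notify.sh "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICEOUTPUT$"
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"
STATE_DIR="${STATE_DIR:-/var/tmp/stalk_notify}"

# output_changed KEY OUTPUT: succeeds (and stores OUTPUT) if it differs
# from the value stored for KEY on the previous run.
output_changed() {
    mkdir -p "$STATE_DIR"
    file="$STATE_DIR/$(printf '%s' "$1" | tr -c 'A-Za-z0-9' '_')"
    if [ -f "$file" ] && [ "$(cat "$file")" = "$2" ]; then
        return 1
    fi
    printf '%s\n' "$2" > "$file"
    return 0
}

# Submit a passive CRITICAL to the volatile service only on a changed CRITICAL.
if [ "$#" -eq 4 ] && [ "$3" = "CRITICAL" ] && output_changed "$1;$2" "$4"; then
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;Stalk Alerts;2;%s: %s\n' \
        "$(date +%s)" "$1" "$2" "$4" >> "$CMD_FILE"
fi
```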
Re: [Nagios-users] Notification on Stalk
For what it's worth, I have been looking for a solution similar to this as well. What I'd really like to see is an "event_stalking_options" parameter in Nagios, where the event handler is called based on the stalking options.

In your case, the easiest (but probably most annoying) solution might be to set the notification interval to the same value as your check interval. If your check runs every 5 minutes and your notification interval is also 5 minutes, then each notification sent out will carry the latest check results.

-Aaron

Petersen, Mark wrote:
> I've searched high and low for the answer to this. It seems that
> because Nagios just checks exit status, it's not easy to create a
> notification on stalking. I'm wondering if I can define additional
> exit codes as critical (without modifying the source), or if there is
> another solution to this.
>
> For instance, say I'm checking disk space. Warn at 85%, Crit at 90%. I
> also want a notification at 95, 96, 97, 98, 99, 100%. I could easily exit 95
> for 95%, 96 for 96%, etc. I believe this creates an unknown message.
> If I exit at 96, since this is a different exit code (but still unknown)
> would I get another notification? I know, I can test this, but it seems
> clunky and I don't like the unknown status issue for historical
> tracking.
>
> Volatile services with passive checks that only submit on change is
> another option, but this presents issues with needing to do freshness
> checking and wanting to have active checks as much as possible.
>
> Are there any other solutions to this problem? I know from a few
> archive threads there isn't much demand for this, but it seems like
> anytime you turn on stalking this would be a nice option (why wouldn't
> you want to be notified as your array degrades, as per the example for
> stalking?). Looking at the docs I don't see anything in 3.0 that will
> help with this either.
>
> Thanks,
> Mark
Re: [Nagios-users] Using nagios for reporting on non-machine data such as employees?
Kelly,

I admire your determination to use Nagios in many versatile ways. Unfortunately, Nagios is probably not the best fit in this case. This is especially true if you really intend to use passive checks instead of leveraging Nagios' powerful scheduling or notification features. Plus, adding a new employee could be a potential nightmare if many systems are involved. The effort spent getting Nagios services set up and retrofitted the way you need would be much better spent on a more conventional solution.

Your situation is fairly unique, so I am unable to think of any ready-to-go solutions. In the long run it's probably easiest to periodically upload all this data to a database and put some PHP scripts together so you can view reports on that data over a web interface.

-Aaron Devey

Kelly Jones wrote:
> We have various systems that keep track of employee data: when an
> employee was last paid, hours of sick/vacation leave accrued,
> employee's laptop's last IP address (from DHCP server), last time
> employee's laptop was backed up (from backup server), whether employee
> is on-leave/traveling, whether the employee has been receiving email
> (vs employee's mailbox being full, account not setup properly, etc),
> etc.
>
> I realized we could use nagios' "passive service checks" to have the
> various systems upload employee data to our nagios server, but was
> wondering if this was fitting a round peg into a square hole.
>
> Is nagios a good tool for monitoring things that aren't machines? If
> not, what would be a good tool?
>
> One concern: nagios tends to treat data as almost "binary": either
> something is good (green) or bad (red) [yes, yellow + "unknown" also
> exist, but it's still almost binary]. In some cases, we're just
> looking to create an "employee status report" page that has text data
> on the employee (pushed from various servers), without necessarily
> categorizing the data as "good" or "bad".