Re: [Nagios-users] host object declarations
> The only thing I'm saying in that bugreport is
> that Nagios does not and will not complain when the "address" fields of
> hosts are unique.

In fact you also said:

> that suggests that either your namestandard (used for the host_name field)
> sucks, or that you or your co-workers are simply confused when it comes
> to configuring Nagios.

So let's talk best practices.

In practice it did happen that someone copy/pasted a bunch of service definitions (yes, I know you should use templates; call that confusion if you like), forgot to update the address field, and the newly created checks (or so they believed) did nothing more than check the old stuff once more.

According to what you suggest, it is indeed wisest to use DNS names in the host_name field wherever you can, and to use the address field only in special circumstances (in Nagios4). If you define your nomenclature like that, possible confusion may be eliminated ;-)

BTW I have all sorts of things in place that
- check for duplicate addresses
- check for syntax errors all over the place
- check if the plugin specified in command_line actually exists, thus preventing stupid error 127 or the like

If anyone finds that interesting, see below.

bye,
Marki

PS. As a side note: I once created a more or less functional version of a script that allows changing service names. It's not trivial but can be done. Does anyone know of something existing and working that allows changing service names and also host names?

# check that the plugins referenced in command_line exist
ERR=0    # must be initialised, otherwise the final test fails on an empty string
for f in $(grep command_line "$NAGIOS/etc/checkcommands.cfg" | awk '{print $2}' |\
    grep USER1 | sed 's/\$USER1\$\///'); do
    if [ ! -f "$NAGIOS/libexec/$f" ]; then
        echo "[KO] $f does not exist"
        ERR=$(($ERR + 1))
    fi
done
[ "$ERR" -eq 0 ] && echo "[OK] all commands seem to exist"

# check for duplicate addresses (the pipeline now lives in one place only)
DUPS=$(grep -r address "$NAGIOS"/etc/hosts* | grep -v '~' | grep -v svn |\
    grep -v '\.bak' | cut -d: -f2 | tr -d ' \t' | awk -F'address' '{print $2}' |\
    sort | uniq -c | sort -n | grep -v ' 1 ')
if [ -n "$DUPS" ]; then
    echo "[INFO] IP address defined more than once:"
    echo "$DUPS"
else
    echo "[OK] Dup IP check ok"
fi

# syntax check - plugin scripts
OK=1
echo "Please wait..."
for i in $(find "$NAGIOS/libexec" -type f | grep -v '~'); do
    echo "$i" | grep -q '/\.svn/' && continue
    # the regex must stay unquoted: bash >= 3.2 treats a quoted pattern as a literal string
    if [[ "$i" =~ \.sh ]] || [ "$(head -1 "$i")" = '#!/bin/bash' ]; then
        # bash -n rather than sh -n, since these are bash scripts
        if ! bash -n "$i"; then
            OK=0
            echo "[KO] bash syntax error - $i"
        fi
    fi
    if [[ "$i" =~ \.pl ]] || [ "$(head -1 "$i")" = '#!/usr/bin/perl' ]; then
        if ! perl -wc "$i"; then
            OK=0
            echo "[KO] perl syntax error - $i"
        fi
    fi
    if [[ "$i" =~ \.php ]]; then
        if ! php -l "$i" &>/dev/null; then
            OK=0
            echo "[KO] PHP syntax error - $i"
        fi
    fi
done
[ "$OK" -eq "1" ] && echo "[OK] shell scripts - syntax checks ok"

# check command syntax
TMP=$(mktemp)    # avoids a predictable file name in /tmp
# -h keeps grep from prefixing each line with the file name
grep -h command_line "$NAGIOS/etc/misccommands.cfg" "$NAGIOS/etc/checkcommands.cfg" |\
    sed 's/command_line//' > "$TMP"
sh -n "$TMP"
RET=$?
if [ "$RET" -eq 0 ]; then
    echo "[OK] command scripts - syntax checks ok"
else
    echo "[KO] command scripts - error $RET"
fi
rm -f "$TMP"

--
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
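For what it's worth, the duplicate-address check in the script above could probably also be done in a single awk pass instead of a long grep/cut/tr chain. This is only a sketch: dup_addresses is a made-up helper name, and it assumes each address directive sits on its own line with the value as the second word.

```shell
#!/bin/sh
# Sketch only: list "address" values that occur in more than one place.
# dup_addresses is a hypothetical helper; pass the host config files as arguments.
dup_addresses() {
    # Only lines whose first word is exactly "address" are counted, so
    # comments and other directives that merely contain the word are ignored.
    # Prints "count value" for every value seen more than once.
    awk '$1 == "address" { seen[$2]++ }
         END { for (a in seen) if (seen[a] > 1) print seen[a], a }' "$@" | sort
}
```

It would be called as something like dup_addresses "$NAGIOS"/etc/hosts*.cfg (the file glob is an assumption about the layout); empty output means no duplicates.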
[Nagios-users] host object declarations
Hi there,

I'd like to further discuss http://tracker.nagios.org/view.php?id=177 which is about host object declarations.

You suggest using host_name as something that resolves. However, we don't have a (DNS) hostname for each device. Also, the directive description (Nagios documentation) says: "This directive is used to define a short name used to identify the host." It is the description of the "address" directive that actually asks for something usable; again, see http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#host

You say one should switch to using them differently, i.e.

host_name = IP address or DNS hostname
alias = description

because Nagios4 will do it that way (and Icinga already does).

I guess I will then try to update my config that way:
- use a DNS name (if one exists) in the "host_name" field, otherwise use the IP as host_name if there is no reverse lookup for it,
- in that case I use a symbolic name in the "display_name" field,
- and optionally a description in the "alias" field.

Furthermore, for devices that have no IP but should show up as different hosts, define a "virtual" hostname with the IP address of the device's management station (which may be a duplicate).

Anyway, I'd really like to know what everyone thinks about this, and how you do it in a sensible way.
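To make the naming convention described in this message a bit more concrete, here is a rough sketch that prints a host definition following it. render_host is a made-up helper, the example values are invented, and the output is only a fragment: the remaining required directives (max_check_attempts, check_period and so on) are assumed to come from a template that is not shown.

```shell
#!/bin/sh
# Sketch only: print a Nagios host definition following the convention above.
# render_host is a hypothetical helper, not part of Nagios.
# Usage: render_host IP DNS_NAME SYMBOLIC_NAME [DESCRIPTION]
# DNS_NAME may be empty when the device has no DNS entry.
render_host() {
    ip=$1 dns=$2 symbolic=$3 description=$4

    echo "define host {"
    if [ -n "$dns" ]; then
        # A DNS name exists: use it as host_name.
        echo "    host_name     $dns"
    else
        # No DNS name: fall back to the IP and keep a readable label.
        echo "    host_name     $ip"
        echo "    display_name  $symbolic"
    fi
    echo "    address       $ip"
    # The description goes into alias only when one was given.
    [ -n "$description" ] && echo "    alias         $description"
    echo "}"
}
```

For a device without a DNS entry one might call render_host 192.0.2.10 "" ups-rack3 "UPS in rack 3", which uses the IP as host_name and the symbolic name as display_name.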
Re: [Nagios-users] notifications
C. Bensend bennyvision.com> writes:

> >
> > I am using Nagios 3.3.1
> >
> > I have got notifications by SMS working now
> >
> > Is there a way of defining what notifications go to email, what go
> > to SMS and what can go to both.

I personally would also be interested in having SMS alerts notify of a critical state only *once* (even if emails are sent repeatedly because notification_interval is greater than 0). The SMS notifications should obviously still include a recovery message.

I'm not sure how to do this without setting up two completely separate service checks...
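One way this might be approached without a second service check is a small wrapper around the SMS notification command that remembers whether a critical SMS has already gone out. The following is only a sketch under assumptions: sms_gate and its state directory are invented, the wrapper is assumed to be called with the values of the $NOTIFICATIONTYPE$ and $SERVICESTATE$ macros, and the command that actually sends the SMS is not shown.

```shell
#!/bin/sh
# Sketch only: decide whether an SMS should go out for this notification.
# sms_gate and the state directory are hypothetical, not part of Nagios.
# Usage: sms_gate STATE_DIR KEY NOTIFICATION_TYPE SERVICE_STATE
# Returns 0 if an SMS should be sent, 1 if it should be suppressed.
sms_gate() {
    state_dir=$1 key=$2 ntype=$3 state=$4
    # One marker file per host/service pair; assumes KEY contains no "/".
    marker="$state_dir/$key.sms-sent"

    case "$ntype:$state" in
        PROBLEM:CRITICAL)
            # Repeated critical notifications are suppressed.
            [ -e "$marker" ] && return 1
            # First critical notification: remember it and send.
            : > "$marker"
            return 0
            ;;
        RECOVERY:*)
            # Send the recovery only if a critical SMS went out before.
            [ -e "$marker" ] || return 1
            rm -f "$marker"
            return 0
            ;;
        *)
            # Everything else (warnings, acknowledgements, ...) stays email-only.
            return 1
            ;;
    esac
}
```

The wrapper script would then do something like "sms_gate /var/tmp/sms-state host_service PROBLEM CRITICAL && send the SMS", with the path and key being placeholders. Email notifications are not touched by this and keep repeating according to notification_interval.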
Re: [Nagios-users] critical soft state every 3 hours
> I think it's related to your vmware instance. Try moving it to a
> physical machine and see if the problem persists.

Easier said than done...

> Did you have any task/cron running every 30 minutes?

There are some, yes. But they should not create heavy load on the system. In any case, one should then notice it every 30 minutes and not only every 3 hours.

> What is the io wait of that vm?

No problem I guess.
http://img341.imageshack.us/img341/9022/moniiowait.jpg
[Nagios-users] critical soft state every 3 hours
Hi,

we have a problem where all the services checked around 00:01, 03:01, 06:01, ..., i.e. every three hours one minute after the hour, return a critical soft state. Most of the time they go back to normal, but sometimes they also end up in a hard state. You can imagine the rest...

We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with 3GB of RAM and 4 vCPUs. The average load of the machine is about 5.

We did not succeed in reproducing network trouble when doing basic checks around those times from and to other hosts. The VM running Nagios, however, does somehow experience packet loss, even when run on completely different VMware hosts:

Tue Apr 17 21:02:01 CEST 2012
5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
--
5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
--
Wed Apr 18 09:02:01 CEST 2012
5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
--
5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
--
5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
--
Wed Apr 18 12:02:01 CEST 2012
5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
--
Wed Apr 18 15:01:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
--
Wed Apr 18 15:02:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms

Do you think this is related to Nagios? What could that be?

Here are some Nagios metrics:

Services Actively Checked:
<= 1 minute:          0    (0.0%)
<= 5 minutes:         2096 (78.3%)
<= 15 minutes:        2626 (98.1%)
<= 1 hour:            2665 (99.5%)
Since program start:  2666 (99.6%)

Metric                  Min.      Max.       Average
Check Execution Time:   0.00 sec  52.15 sec  1.133 sec
Check Latency:          0.00 sec  3.03 sec   0.183 sec
Percent State Change:   0.00%     64.54%     1.16%

Check Stats:
Type                             Last 1 Min  Last 5 Min  Last 15 Min
Active Scheduled Host Checks     54          282         602
Active On-Demand Host Checks     25          123         405
Parallel Host Checks             56          290         614
Serial Host Checks               0           0           0
Cached Host Checks               23          115         387
Passive Host Checks              0           0           0
Active Scheduled Service Checks  987         4203        12647
Active On-Demand Service Checks  0           0           0
Cached Service Checks            0           0           0
Passive Service Checks           0           0           0
External Commands                0           0           0

Thanks
marki
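A side remark on the ping summaries in this message: every line says "0% packet loss" although packets were clearly lost, presumably because ping reports whole percent only and the loss here stays below 1%. A small sketch for recomputing the figure from the transmitted/received counts (loss_pct is a made-up helper name):

```shell
#!/bin/sh
# Sketch only: recompute packet loss from ping summary lines, with two decimals.
# loss_pct is a hypothetical helper. It reads lines such as
#   5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
# on stdin and prints "sent received lost loss%" for each of them.
loss_pct() {
    awk '/packets transmitted/ {
        sent = $1          # first word: packets sent
        received = $4      # fourth word: packets received
        if (sent > 0)
            printf "%d %d %d %.2f%%\n", sent, received, sent - received,
                   (sent - received) * 100 / sent
    }'
}
```

Feeding it the first summary line above prints "5000 4990 10 0.20%"; the worst one (4978 received) comes out at 0.44%.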
Re: [Nagios-users] one shot check at specific time of day
Andreas Ericsson op5.se> writes:

> >
> > I have problems defining a check so that it only runs once each day (at a
> > specified time).
> First thought; Set the check_interval to less than the checking window.
> Assuming you're using minutes (like most people), the check_interval
> should probably be something like 10, so it fits once but not twice
> within the scheduled window.
>

That seems to help.

Sidenote: "day 1 - -1" doesn't work at all. If used today, it sets next scheduled time to April 1. Haha.
[Nagios-users] one shot check at specific time of day
Hi people,

I have problems defining a check so that it only runs once each day (at a specified time).

Inspired by http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg24221.html I tried

define timeperiod {
    timeperiod_name tp_backupcheck
    alias           mornin checks
    day 1 - -1      08:00-08:15
}

as well as

define timeperiod {
    timeperiod_name tp_backupcheck
    alias           mornin checks
    monday          08:00-08:15
    tuesday         08:00-08:15
    wednesday       08:00-08:15
    thursday        08:00-08:15
    friday          08:00-08:15
    saturday        08:00-08:15
    sunday          08:00-08:15
}

combined with

define service {
    use                 normal
    host_name           bla01
    service_description bla backup
    check_command       check_bla_backup
    max_check_attempts  1
    check_interval      30
    check_period        tp_backupcheck
    ...
}

Now it schedules the check each day for 07:58 or 07:59 or 07:59:58 or the like, and obviously it is never run.

How do I do this properly?

Thanks
Marki
Re: [Nagios-users] monitoring batch jobs
Claudio Kuenzler claudiokuenzler.com> writes:

> In the batchfile you could add a status output to an external file, which is then checked by Nagios. Probably the easiest way to do this.

Now *that* is not really possible, since those are batch jobs on exotic platforms without send_nsca or nrpe or the like.

I'm afraid we have to work with the mails.
[Nagios-users] monitoring batch jobs
Hey there,

is there a recommended way of monitoring batch jobs when the only kind-of log available is the mails sent by each part of the job?

In that case one should also find out if a certain mail was NOT sent, i.e. has not arrived within a certain time period.

While all of this is somehow scriptable, I thought I would ask first whether something exists to build such a solution upon...

Thanks
marki
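As a starting point for the "mail has NOT arrived" part of the question, something along these lines might do. It is only a sketch under assumptions: check_job_mail is a made-up name, the job mails are assumed to be delivered into a Maildir-style directory (one file per message) readable on the Nagios host, and find is assumed to support -mmin (GNU and BSD find do).

```shell
#!/bin/sh
# Sketch only: Nagios-style check that a job mail arrived recently.
# check_job_mail is a hypothetical name, not an existing plugin.
# Usage: check_job_mail MAIL_DIR SUBJECT_PATTERN MAX_AGE_MINUTES
# Return codes follow the plugin convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_job_mail() {
    mail_dir=$1 pattern=$2 max_age=$3

    if [ ! -d "$mail_dir" ]; then
        echo "UNKNOWN - $mail_dir is not a directory"
        return 3
    fi

    # Count message files modified within the last max_age minutes that
    # contain a matching Subject line. Caveat: a body line starting with
    # "Subject:" would match too, and folded headers are not handled.
    recent=$(find "$mail_dir" -type f -mmin "-$max_age" \
        -exec grep -l -i -E "^Subject:.*$pattern" {} + 2>/dev/null |
        wc -l | tr -d ' ')

    if [ "$recent" -gt 0 ]; then
        echo "OK - $recent mail(s) matching '$pattern' in the last $max_age minutes"
        return 0
    fi
    echo "CRITICAL - no mail matching '$pattern' in the last $max_age minutes"
    return 2
}
```

One service per job step, each with its own subject pattern and maximum age, would then go CRITICAL when the corresponding mail fails to show up in time. The directory and patterns are placeholders to be adapted.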