Re: [Nagios-users] host object declarations

2012-10-02 Thread Marki

> The only thing I'm saying in that bugreport is
> that Nagios does not and will not complain when the "address" fields of
> hosts are not unique.

In fact you also said:

> that suggests that either your namestandard (used for the host_name field)
> sucks, or that you or your co-workers are simply confused when it comes
> to configuring Nagios.

So let's talk best practices.

In practice it has happened that someone copy/pasted a bunch of service
definitions (yes, I know you should use templates; call that confusion if
you like), forgot to update the address field, and the newly created checks
(or so they believed) did nothing more than check the old stuff once more.

According to what you suggest, it is indeed wisest to use DNS names in the
host_name field wherever you can, and to use the address field only in special
circumstances (in Nagios 4).

If you define your nomenclature like that, possible confusion may be
eliminated ;-)

BTW I have all sorts of things in place that
- check for duplicate addresses
- check for syntax errors all over the place
- check if the plugin specified in command_line actually exists, thus preventing
  stupid error 127 or the like
If anyone finds that interesting, see below.

bye,
Marki

PS. As a side note: I once created a more or less functional version of a script
that allows changing service names. It's not trivial, but it can be done.
Does anyone know of something existing and working that allows changing
both service names and host names?





# check that every plugin referenced via $USER1$ in a command_line exists
ERR=0
for f in $(grep command_line "$NAGIOS/etc/checkcommands.cfg" | awk '{print $2}' |
    grep USER1 | sed 's/\$USER1\$\///'); do
  if [ ! -f "$NAGIOS/libexec/$f" ]; then
    echo "[KO] $f does not exist"
    ERR=$((ERR + 1))
  fi
done
[ "$ERR" -eq 0 ] && echo "[OK] all commands seem to exist"

# duplicate address check; the pipeline runs once and its result is reused
DUPS=$(grep -r address "$NAGIOS"/etc/hosts* | grep -v '~' | grep -v svn |
  grep -v '\.bak' | cut -d: -f2 | tr -d ' \t' | awk -F'address' '{print $2}' |
  sort | uniq -c | sort -n | awk '$1 > 1')
if [ -n "$DUPS" ]; then
  echo "[INFO] IP address defined more than once:"
  echo "$DUPS"
else
  echo "[OK] Dup IP check ok"
fi

# syntax check - shell, perl and PHP plugins
OK=1
echo "Please wait..."
for i in $(find "$NAGIOS/libexec" -type f | grep -v '~'); do
  echo "$i" | grep -q '/.svn/' && continue
  # the regex must stay unquoted: bash 3.2+ treats a quoted pattern as a literal string
  if [[ "$i" =~ \.sh$ ]] || [ "$(head -1 "$i")" = '#!/bin/bash' ]; then
    if ! bash -n "$i"; then
      OK=0
      echo "[KO] bash syntax error - $i"
    fi
  fi
  if [[ "$i" =~ \.pl$ ]] || [ "$(head -1 "$i")" = '#!/usr/bin/perl' ]; then
    if ! perl -wc "$i"; then
      OK=0
      echo "[KO] perl syntax error - $i"
    fi
  fi
  if [[ "$i" =~ \.php$ ]]; then
    if ! php -l "$i" &>/dev/null; then
      OK=0
      echo "[KO] PHP syntax error - $i"
    fi
  fi
done
[ "$OK" -eq 1 ] && echo "[OK] plugin scripts - syntax checks ok"

# check command syntax
TMP=$(mktemp "/tmp/$(basename "$0").XXXXXX")
# -h suppresses the "file:" prefix that grep adds when given several files
grep -h command_line "$NAGIOS/etc/misccommands.cfg" "$NAGIOS/etc/checkcommands.cfg" |
  sed 's/command_line//' > "$TMP"
sh -n "$TMP"
RET=$?
if [ "$RET" -eq 0 ]; then
  echo "[OK] command scripts - syntax checks ok"
else
  echo "[KO] command scripts - error $RET"
fi
rm -f "$TMP"






--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] host object declarations

2012-10-02 Thread Marki
Hi there,

I'd like to further discuss
http://tracker.nagios.org/view.php?id=177
which is about host object declarations.

You suggest using host_name as something that resolves. However, we don't have a
(DNS) hostname for each device.
Also, the directive description (Nagios documentation) says: "This directive is
used to define a short name used to identify the host."
The description for the "address" directive actually needs something useful,
again, see http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#host

You say one should switch to using them differently, i.e. 
host_name = ip address or DNS hostname
alias = description
because Nagios4 will do it that way (and Icinga already does).

I guess I will then try to update my config that way:
- use a DNS name (if one exists) in the "host_name" field, otherwise use the IP
as host_name if there is no reverse lookup for it,
- in that case I use a symbolic name in the "display_name" field,
- and optionally a description in the "alias" field.
Furthermore for devices that have no IP but should show as different hosts,
define a "virtual" hostname with the IP address of the device's management
station (that may be duplicate).
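
To see how far an existing configuration is from that scheme, one could list every host_name and test whether it resolves. The following is only a rough sketch: the function names are made up for illustration, it assumes one "host_name value" pair per line, and it relies on getent being available (glibc systems), which consults /etc/hosts as well as DNS.

```shell
#!/bin/bash
# Print every distinct host_name value found in the given Nagios object files.
# Assumes the directive and its value sit on one line, separated by whitespace.
list_host_names() {
  awk '$1 == "host_name" { print $2 }' "$@" | sort -u
}

# Report whether a name resolves through the system resolver.
check_resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "[OK] $1 resolves"
  else
    echo "[INFO] $1 does not resolve"
  fi
}

# Example use (the path follows the hosts* layout used in the checks script):
#   for h in $(list_host_names "$NAGIOS"/etc/hosts*); do check_resolves "$h"; done
```

Names reported as not resolving would be the candidates for an IP address in host_name plus a symbolic display_name.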

Anyway I'd really like to know what everyone thinks about this, and how you do
it in a sensible way.




Re: [Nagios-users] notifications

2012-05-03 Thread Marki

C. Bensend writes:

> 
> 
> > I am using Nagios 3.3.1
> >
> > I have got notifications by SMS working now
> >
> > Is there a way of defining what notifications go to email, what go
> > to SMS and what can go to both.

I personally would also find it interesting for SMS alerts only to notify
a critical state *once* (even if emails are sent repeatedly due to
notification_interval greater than 0). The SMS notifications obviously 
should include a recovery message.
Not sure how to do this without setting up two completely separate
service checks...
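
One possible approach, sketched here without having been tested against a live Nagios, is a wrapper around the SMS notification command that looks at the standard macros $NOTIFICATIONTYPE$ and $SERVICENOTIFICATIONNUMBER$ and only lets the first problem notification and the recovery through. The function name and the send_sms command in the comment are hypothetical.

```shell
#!/bin/bash
# Decide whether an SMS should be sent for this notification.
# $1 = value of $NOTIFICATIONTYPE$ (PROBLEM, RECOVERY, ACKNOWLEDGEMENT, ...)
# $2 = value of $SERVICENOTIFICATIONNUMBER$
# Returns 0 (send) for recoveries and for the first problem notification only.
should_send_sms() {
  local type=$1 number=$2
  case "$type" in
    RECOVERY) return 0 ;;
    PROBLEM)  [ "$number" -eq 1 ] ;;
    *)        return 1 ;;
  esac
}

# Example use inside a notification wrapper (send_sms is a placeholder):
#   should_send_sms "$1" "$2" && send_sms "$3" "$4"
```

The email notification command stays untouched, so repeated emails still follow notification_interval. One caveat: if the first notification never reaches the SMS contact (for instance because of its notification_period), the counter is already above 1 by the time a later one does, and no SMS goes out.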




Re: [Nagios-users] critical soft state every 3 hours

2012-04-23 Thread Marki

> I think it's related to your vmware instance. Try moving it to a
> physical machine and see if the problem persists.

Easier said than done...

> Did you have any task/cron running every 30 minutes?

There are some, yes. But they should not create heavy load on the
system. In any case, one would then notice the problem every 30 minutes
and not only every 3 hours.

> What is the io wait of that vm?

No problem I guess.

http://img341.imageshack.us/img341/9022/moniiowait.jpg






[Nagios-users] critical soft state every 3 hours

2012-04-20 Thread Marki
Hi,

we have a problem where all the services checked around 00:01, 03:01, 06:01,
..., i.e. every three hours one minute after the hour, return a critical soft
state. Most of the time they go back to normal, but sometimes they also end
up in a hard state. You can imagine the rest...

We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with
3GB of RAM and 4 vCPUs. The average load of the machine is about 5.

We did not succeed in reproducing network trouble when doing basic checks around
those times from and to other hosts. The VM running Nagios, however, does
experience some packet loss, even when run on completely different VMware hosts:

Tue Apr 17 21:02:01 CEST 2012
5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
–
5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
–
Wed Apr 18 09:02:01 CEST 2012
5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
–
5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
–
5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
–
Wed Apr 18 12:02:01 CEST 2012
5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
–
Wed Apr 18 15:01:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
–
Wed Apr 18 15:02:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
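
The summaries above all print 0% packet loss even though between 1 and 22 of the 5000 packets are missing. A small helper (the name is made up for illustration) computes the loss with two decimals from the transmitted and received counts:

```shell
#!/bin/bash
# Print packet loss in percent with two decimals.
# $1 = packets transmitted, $2 = packets received
loss_pct() {
  LC_ALL=C awk -v tx="$1" -v rx="$2" \
    'BEGIN { printf "%.2f\n", (tx - rx) * 100 / tx }'
}

loss_pct 5000 4990   # prints 0.20
loss_pct 5000 4978   # prints 0.44
```

So the worst sample here lost 0.44% of its packets, which the integer percentage in the ping summary hides.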

Do you think this is related to Nagios? What could that be?

Here are some Nagios metrics:

Services Actively Checked:
<= 1 minute:            0 (0.0%)
<= 5 minutes:        2096 (78.3%)
<= 15 minutes:       2626 (98.1%)
<= 1 hour:           2665 (99.5%)
Since program start: 2666 (99.6%)

Metric                 Min.      Max.       Average
Check Execution Time:  0.00 sec  52.15 sec  1.133 sec
Check Latency:         0.00 sec  3.03 sec   0.183 sec
Percent State Change:  0.00%     64.54%     1.16%

Check Stats:
Type                             Last 1 Min  Last 5 Min  Last 15 Min
Active Scheduled Host Checks             54         282          602
Active On-Demand Host Checks             25         123          405
Parallel Host Checks                     56         290          614
Serial Host Checks                        0           0            0
Cached Host Checks                       23         115          387
Passive Host Checks                       0           0            0
Active Scheduled Service Checks         987        4203        12647
Active On-Demand Service Checks           0           0            0
Cached Service Checks                     0           0            0
Passive Service Checks                    0           0            0
External Commands                         0           0            0



Thanks

marki



Re: [Nagios-users] one shot check at specific time of day

2012-03-21 Thread Marki
Andreas Ericsson writes:

> > 
> > I have problems defining a check so that it only runs once each day (at a
> > specified time).
> First thought; Set the check_interval to less than the checking window.
> Assuming you're using minutes (like most people), the check_interval
> should probably be something like 10, so it fits once but not twice
> within the scheduled window.
> 

That seems to help.

Sidenote: 
 day 1 - -1 
doesn't work at all. If used today, it sets next scheduled time to April 1. 
Haha.




[Nagios-users] one shot check at specific time of day

2012-03-21 Thread Marki
Hi people,

I have problems defining a check so that it only runs once each day (at a
specified time).

Inspired by
http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg24221.html I 
tried
define timeperiod {
    timeperiod_name tp_backupcheck
    alias           mornin checks
    day 1 - -1      08:00-08:15
}
as well as
define timeperiod {
    timeperiod_name tp_backupcheck
    alias           mornin checks
    monday          08:00-08:15
    tuesday         08:00-08:15
    wednesday       08:00-08:15
    thursday        08:00-08:15
    friday          08:00-08:15
    saturday        08:00-08:15
    sunday          08:00-08:15
}

combined with

define service {
    use                 normal
    host_name           bla01
    service_description bla backup
    check_command       check_bla_backup
    max_check_attempts  1
    check_interval      30
    check_period        tp_backupcheck
    ...
}

Now it schedules the check each day for 07:58 or 07:59 or 07:59:58 or the like,
and obviously it is never run.

How do I do this properly?

Thanks

Marki




Re: [Nagios-users] monitoring batch jobs

2012-03-09 Thread Marki
Claudio Kuenzler writes:
> In the batchfile you could add a status output to an external file, which is
> then checked by Nagios. Probably the easiest way to do this.

Now *that* is not really possible, since those are batch jobs on exotic platforms
without send_nsca, NRPE or the like. I'm afraid we have to work with the mails.



[Nagios-users] monitoring batch jobs

2012-03-09 Thread Marki
Hey there,

is there a recommended way of monitoring batch jobs when the only kind-of log
available are mails sent by each part of the job?

In that case one should also find out if a certain mail was NOT sent, i.e. has
not arrived in a certain timeperiod.

While all of this is somehow scriptable, I thought I would ask first if there
exists something to build such a solution upon...
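
To make the "mail did NOT arrive" case concrete, here is a rough sketch of such a check. It assumes the job mails are delivered into a directory with one file per message (for example the new/ folder of a Maildir) and that find supports -mmin, which GNU and BSD find do but POSIX does not require. The function name and its arguments are made up for illustration; the return values follow the usual plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Was a mail whose Subject matches a pattern delivered within the last N minutes?
# $1 = directory holding one file per mail (assumed layout)
# $2 = extended regex to look for in the Subject header
# $3 = maximum age in minutes
check_mail_arrived() {
  local dir=$1 pattern=$2 max_age=$3 hits
  if [ ! -d "$dir" ]; then
    echo "UNKNOWN - $dir is not a directory"
    return 3
  fi
  # count recent files that have a matching Subject line
  hits=$(find "$dir" -type f -mmin "-$max_age" \
           -exec grep -l -i -E "^Subject:.*$pattern" {} + | wc -l | tr -d ' ')
  if [ "$hits" -gt 0 ]; then
    echo "OK - $hits matching mail(s) in the last $max_age minutes"
    return 0
  fi
  echo "CRITICAL - no mail matching '$pattern' in the last $max_age minutes"
  return 2
}

# Example use (path and pattern are placeholders):
#   check_mail_arrived /var/mail/batch/new "JOB42 finished" 90; exit $?
```

Scheduled shortly after the time each job step should have finished, one such check per expected mail would turn a missing mail into a CRITICAL state.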

Thanks

marki

