Re: [Nagios-users] Nagiosgraph and load graphing

2010-04-08 Thread Tobias Klausmann
Hi! 

On Wed, 07 Apr 2010, Paras pradhan wrote:
> How do I change it to match the output of w and top, i.e.
> instead of 350m, 440m, 200m, graph it as 0.03, 0.04, 0.02?

This is not a matter of NagiosGraph itself, but of its rrdtool
backend. To create the legend on the graph, it will use a line
like this:

rrdtool graph [...] 'GPRINT:B:LAST:Cur %6.1lf %sOct' [...]

That GPRINT part creates the legend for a certain data source (in
this case 'B'). The thing responsible for the SI prefix (i.e. m,
k, G and so on) is the "%s". So to get rid of that you'd have to
make NG not include it when creating graphs.
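
To see what the "%s" does to the numbers, here is a rough Python sketch of
the scaling. It is an approximation written for illustration, not rrdtool's
actual code, and the function names si_scaled and plain are made up. A load
value of 0.35 is shown with the SI prefix as "350.0 m" and without it as
"0.35".

```python
import math

# SI prefixes keyed by power of 1000 (a simplified subset).
SI_PREFIXES = {-4: "p", -3: "n", -2: "u", -1: "m", 0: "",
               1: "k", 2: "M", 3: "G", 4: "T"}


def si_scaled(value, fmt="%6.1f"):
    """Roughly what a '%6.1lf %s' legend does: scale, then add a prefix."""
    if value == 0:
        return (fmt % 0.0) + " "
    exp = int(math.floor(math.log10(abs(value)) / 3))
    exp = max(-4, min(4, exp))  # stay inside the prefix table
    return (fmt % (value / 1000.0 ** exp)) + " " + SI_PREFIXES[exp]


def plain(value, fmt="%6.2f"):
    """What a legend format without '%s' would show: no scaling at all."""
    return fmt % value


if __name__ == "__main__":
    for load in (0.35, 0.44, 0.20):
        print(si_scaled(load), "vs", plain(load))
```

So the 350m in the graph legend is the same value as 0.35, just scaled by
1000 and tagged with "m" (milli).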

That may take quite some effort; I don't know NG very well, so
I can't say whether it's easy to do.

For more information on RRDTool itself, I recommend its man pages
or better yet, its homepage http://oss.oetiker.ch/rrdtool/

HTH,
Tobias
-- 
printk(CARDNAME": Bad Craziness - sent packet while busy.\n" );
linux-2.6.6/drivers/net/smc9194.c

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] PNP swap template HOWTO

2009-05-06 Thread Tobias Klausmann
Hi! 

On Wed, 06 May 2009, Jim Avery wrote:
> 2009/5/6 Tobias Klausmann :
> > I did something else: I patched PNP so that it removes the
> > check_nrpe! prefix if it's there, then does processing as usual.
> > I've sent this (trivial, four-line) patch to the PNP maintainer
> > but never got an answer :/
> 
> Where would you stop though?  Would you also remove check_nt and
> check_snmp prefixes?  I appreciate you may find the patch useful and
> forgive me if I've misunderstood, but I'm not convinced it's
> necessary.

It is most useful to me, no doubt. Still I would have expected
that the original author noticed this issue. I suspect 90% of
Nagios users have *way* more usefully graphable NRPE/NSCP/SNMP
checks than "plain" ones.

Regards,
Tobias

-- 
The only problem with troubleshooting is that sometimes,
trouble shoots back.



Re: [Nagios-users] PNP swap template HOWTO

2009-05-06 Thread Tobias Klausmann
Hi! 

On Wed, 06 May 2009, Jim Avery wrote:
> It's not quite as simple as that though, because if you set up
> a check_nrpe.php template which makes your swap graphs look
> lovely, it might make all the other checks you run using
> check_nrpe look awful! You might need to consider setting up a
> separate command definition in Nagios just for your swap
> checks.  Make it the same as check_nrpe but call it something a
> little different, for example "check_nrpe-swap". Of course
> you'll need also to change your swap service definition to use
> this command.  You can then rename your check_nrpe.pnp template
> to check_nrpe-swap.pnp and it won't mess up all your other nrpe
> checks.

I did something else: I patched PNP so that it removes the
check_nrpe! prefix if it's there, then does processing as usual.
I've sent this (trivial, four-line) patch to the PNP maintainer
but never got an answer :/
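
The patch itself is not shown in this thread. Purely as an illustration of
the idea (in Python rather than PNP's own code, with a made-up function
name), stripping the prefix before choosing a template could look like
this:

```python
def template_name(check_command, prefix="check_nrpe!"):
    """Pick a template name from a Nagios check command string.

    If the command starts with the NRPE wrapper prefix, drop it so that
    the wrapped check (e.g. check_swap) decides which template is used.
    """
    if check_command.startswith(prefix):
        check_command = check_command[len(prefix):]
    # Everything after the first '!' is arguments, not the command name.
    return check_command.split("!", 1)[0]


if __name__ == "__main__":
    print(template_name("check_nrpe!check_swap"))       # check_swap
    print(template_name("check_ping!100,20%!500,60%"))  # check_ping
```

Commands without the prefix pass through unchanged, so existing templates
keep working.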

Regards,
Tobias

-- 
The only problem with troubleshooting is that sometimes,
trouble shoots back.



Re: [Nagios-users] Service latency suddenly through the roof

2009-03-23 Thread Tobias Klausmann
Hi! 

On Mon, 23 Mar 2009, Deborah Martin wrote:
> Should I expect latency to be a lot lower with the new version
> of Nagios ? I'm currently looking at the logs produced so far
> for the new version to see what the latency levels are like. 

Definitely. Nagios 3 does not schedule host checks differently
from service checks (i.e. it does not halt everything else when a
host check is run, as 2.x does). I've had great success migrating
to 3.x when I had severe latency problems (several minutes of
latency were the norm).

Regards,
Tobias
-- 
panic("mother...");
linux-2.2.16/drivers/block/cpqarray.c



Re: [Nagios-users] cool Nagios + DNX tutorial

2008-11-09 Thread Tobias Klausmann
Hi! 

On Sun, 09 Nov 2008, Andreas Ericsson wrote:
> Tobias Klausmann wrote:
> > Apart from that I *really* like it, since it makes a distributed
> > setup feasible (NSCA et al fall very short on that front).
> 
> Try pnsca. It removes the fork()-bomb related performance problems with
> nsca.

The problem I/we have with NSCA is not that of performance. We
provide Nagios services to a large-ish user base (>100 users).
As a result, a single user interface for those users is a must.
With NSCA, said UI is degraded: you can't reschedule a check or
disable checking entirely from the central machine, since there
is no backchannel to the checking nodes. Also, the UI is very
misleading with purely passive checks on the central machines: it
looks like all checks are disabled *entirely* in the web
interface (this could be fixed by hacking the CGIs).

DNX's approach of "dishing out" checks to waiting nodes solves
these problems. Unfortunately, it quickly rendered the central
machine absolutely useless with huge load and memory consumption
last time I checked.

Regards,
Tobias

-- 
Aibohphobia: n. Fear of Palindromes



Re: [Nagios-users] cool Nagios + DNX tutorial

2008-11-08 Thread Tobias Klausmann
Hi! 

On Fri, 07 Nov 2008, Roger wrote:
> My buddy Pat at Petta Tech just put up a great DNX + Nagios tutorial
> 
> http://nagioswiki.com/wiki/index.php/Nagios_%2B_DNX
> 
> He did a good job of documenting how he worked out the various kinks. He's
> running about 5000 checks on 800 hosts, and the servers that have this
> running are HP DL360 G5s, dual quad cores, and 16Gb of RAM

He touches upon one thing that's a real showstopper for me with
DNX: it's pretty much unusable on 64-bit systems/installations.

Apart from that I *really* like it, since it makes a distributed
setup feasible (NSCA et al fall very short on that front).

Are the DNX devs reading along here? If yes: are there plans to
make it work on amd64? I'm willing to do some testing if that's
needed.

Regards,
Tobias



-- 
Yesterday upon the stair,
I saw a man who wasn't there.
He wasn't there again today,
I think he's from the CIA.



Re: [Nagios-users] How do I receive alerts for only some of services I am a contact for?

2008-07-04 Thread Tobias Klausmann
Hi! 

On Fri, 04 Jul 2008, Matthew Jurgens wrote:
> I'm wondering if anyone can think of a better way to configure the 
> following scenario:
>
> Assume I have 2 services being monitored, service1 and service2.
> User1 wants to be able to see both services through the CGI interface and 
> hence is defined as a contact for both of them.
> User1 wants notifications for service1 but NEVER for service2.
>
> The only way I can think of setting this up is as follows:
>
>   1. Turn off all notifications for User1. This is the
>  username/password the person logs into the CGI with.
>   2. Create another user called User1_notifications. No one ever logs
>  in as this user but it has the same contact details eg email etc
>  as User1. User1_notifications has notifications enabled.
>   3. Setup User1_notifications as a contact for service1 only.
>
> Hence when service1 alerts User1 will never get alerted (notifications are 
> off), User1_notifications will get alerted (and its really the same person 
> as User1).
> When service2 alerts, User1 will never get alerted (notifications are off) 
> and User1_notifications will also not get alerted (as they are not even a 
> contact).
>
> Its a little bit ugly having to declare the contact twice, but at the 
> moment its the only way I can think of achieving this.
>
> Any ideas appreciated.

We have solved this in the following way:

- Every user gets a normal account for the web interface (usually
  named firstname.lastname). This has a non-delivering
  notification script and notification_options is set to "n".

- For every delivery method (mail, sms, phone call, ...)
  desired, a user gets an appropriate contact
  (firstname.lastname-mail etc) which is added to the relevant
  services, naturally with the corresponding notification scripts
  and options. 

This works surprisingly well. Factor in inheritance (i.e. re-use
the normal contact everyone has in the notification contacts) and
the configuration even stays manageable with a larger number of
users. We currently have 253 users and 189 contact groups and
this system works well for us.
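
As a sketch of that scheme, the following Python snippet generates the two
kinds of contact for one person. The command names (notify-none,
host-notify-by-mail, service-notify-by-mail) and the option values are
assumptions for illustration, and the output is a fragment, not a complete
contact definition. Note that in the object config the "n" goes into
host_notification_options and service_notification_options.

```python
def contact_definition(name, host_opts, svc_opts, host_cmd, svc_cmd,
                       email=None):
    """Return one 'define contact' block as text (fragment only)."""
    fields = [
        ("contact_name", name),
        ("host_notification_options", host_opts),
        ("service_notification_options", svc_opts),
        ("host_notification_commands", host_cmd),
        ("service_notification_commands", svc_cmd),
    ]
    if email:
        fields.append(("email", email))
    body = "\n".join("    %-32s%s" % pair for pair in fields)
    return "define contact {\n%s\n}\n" % body


def contacts_for(person, email):
    """Web-only contact plus a mail contact for one person."""
    # Web interface account: never notified (options "n").
    web = contact_definition(person, "n", "n",
                             "notify-none", "notify-none")
    # Delivery contact: this is the one added to services.
    mail = contact_definition(person + "-mail", "d,u,r", "w,u,c,r",
                              "host-notify-by-mail",
                              "service-notify-by-mail", email)
    return [web, mail]


if __name__ == "__main__":
    print("\n".join(contacts_for("jane.doe", "jane.doe@example.org")))
```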

Regards,
Tobias

-- 
HARDFAIL("Not enough magic.");
linux-2.4.0-test2/drivers/block/nbd.c



Re: [Nagios-users] Multiple interfaces, multiple parents

2008-05-26 Thread Tobias Klausmann
Hi! 

On Tue, 27 May 2008, Hugo van der Kooij wrote:
>> Unfortunately, Nagios connects multiple parent hosts with
>> logical AND, which means that the host only turns UNREACHABLE
>> when *both* switches are gone.
> 
> The funny thing with redundant paths is that they are in fact
> redundant. So if you lose one you still have connectivity and
> nothing becomes unreachable.
> 
> So where is the flaw in this design in Nagios? If breaking a
> single link will result in something becoming unreachable
> then you do not have true redundancy.

Oh, the machines themselves aren't connected redundantly, only
the switches are. Usually the second link is used for
connections to database machines and the like. As such, if either
link goes down, the machine goes down (from a can-do-its-job
perspective). Yes, I can still reach the machine (on convoluted
paths since the backend nets are not routed).

What I'm after is making a failing switch visible to the admins
of such machines (and ideally everybody else), so I'm aiming for
the distinction of DOWN vs. UNREACHABLE.

I don't really want to make all the switches visible to all
admins via the web interface (and I doubt they all are able to
tell which switch is "theirs" just by the names).

I don't see this behaviour as a *flaw* in Nagios, it's just
unfortunate for me that it works this way and I had hoped there
was a way around it.

Regards,
Tobias



[Nagios-users] Multiple interfaces, multiple parents

2008-05-26 Thread Tobias Klausmann
Hey everybody,

I've hit a snag when configuring parents for hosts. First a
little bit about our setup.

Most of our hosts have only one connected ethernet interface (if
you don't count the management cards). Still, we have quite a
handful of hosts (over 100) that have two interfaces.

Up until now, we configured the "management address" as the
address for Nagios to use. That is, Nagios uses the address the
admins use when SSH'ing to the box. If there are additional IPs
that need monitoring, they're configured as services on the host
they're bound to.

We're now starting to integrate monitoring of our switch
setup into Nagios. We have a satellite switch per rack which in
turn is connected redundantly to a pair of central switches.

Now what we'd like is to have these rack-switches as parents to
the hosts which are connected to them. Here, we run into a
problem: those hosts that have two interfaces would have two
parents, *both* of which should yield an "UNREACHABLE" message if
*either* of them goes belly-up. 

Unfortunately, Nagios connects multiple parent hosts with logical
AND, which means that the host only turns UNREACHABLE when *both*
switches are gone. 
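
To make the difference concrete, here is a small Python sketch of the two
rules. It is a simplification of the behaviour described above, not Nagios
code, and the function names are made up.

```python
def nagios_state(host_check_ok, parents_ok):
    """Behaviour as described: UNREACHABLE only if *all* parents are down."""
    if host_check_ok:
        return "UP"
    if parents_ok and not any(parents_ok):
        return "UNREACHABLE"
    return "DOWN"


def desired_state(host_check_ok, parents_ok):
    """What I'd like: UNREACHABLE as soon as *any* parent is down."""
    if host_check_ok:
        return "UP"
    if parents_ok and not all(parents_ok):
        return "UNREACHABLE"
    return "DOWN"


if __name__ == "__main__":
    # Host check fails, one of two rack switches is down.
    print(nagios_state(False, [True, False]))   # DOWN
    print(desired_state(False, [True, False]))  # UNREACHABLE
```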

Is there any way out of this? Dependencies seem to work the same
way, near as I can tell. How do you handle such a setup?

Regards & TIA,
Tobias

-- 
printk(KERN_DEBUG "%s: Flex. T...\n", DRV_NAME);
linux-2.6.6/drivers/net/wan/dscc4.c



Re: [Nagios-users] Strange notification problem

2008-05-05 Thread Tobias Klausmann
Hi! 

On Mon, 05 May 2008, Ilya Meylikhov wrote:
> I've solved the problem - the CRITICAL state output of some
> services on this host had more than 160 symbols - gnokii was
> unable to send an SMS that is more than 160 symbols. Btw maybe
> anyone knows how to make gnokii send sms which contains more
> than 160 symbols?

The 160-character limit is inherent to SMS (over GSM), as
described here:

http://en.wikipedia.org/wiki/Short_message_service#Message_size

So there is very little you can do about it. You could split the
messages in two, but I advise against it.
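
One way to live with the limit is to shorten the text before it is handed
to the SMS tool. The following Python sketch shows the idea; the function
name and the "..." marker are assumptions, and how the result is passed on
to gnokii is left out.

```python
SMS_LIMIT = 160  # characters in a single SMS


def fit_sms(text, limit=SMS_LIMIT, marker="..."):
    """Cut text down to at most `limit` characters, marking the cut."""
    if len(text) <= limit:
        return text
    return text[:limit - len(marker)] + marker


if __name__ == "__main__":
    long_output = "CRITICAL - " + "x" * 300
    print(len(fit_sms(long_output)))
```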

Regards,
Tobias

-- 
"You don't *run* programs on Ultrix." - Mark Moraes
"Right, you chase them." - Rayan Zachariassen



[Nagios-users] Heads up: Comments might bite you; was Re: Possible bug in 3.0rc3

2008-02-29 Thread Tobias Klausmann
Hi! 

Actually, this was both my and Nagios' fault. You see, faced with
a config block like this:

# foo=bar\
foo=baz

Nagios will see... nothing. The trailing \ in the first line
joins up the second and then both disappear since it's now one
long comment.
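
The effect is easy to reproduce outside Nagios. This Python sketch is not
the Nagios parser, just the two possible orderings with made-up function
names: joining continued lines before dropping comments swallows the
second line, while dropping comments first keeps it.

```python
def join_then_strip(text):
    """Join backslash-continued lines first, then drop comment lines."""
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]  # continuation: glue onto the next line
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)
    return [l for l in logical
            if l.strip() and not l.lstrip().startswith("#")]


def strip_then_join(text):
    """Drop comment lines first, then join continued lines."""
    kept = [l for l in text.splitlines()
            if not l.lstrip().startswith("#")]
    return join_then_strip("\n".join(kept))


SAMPLE = "# foo=bar\\\nfoo=baz"

if __name__ == "__main__":
    print(join_then_strip(SAMPLE))  # []
    print(strip_then_join(SAMPLE))  # ['foo=baz']
```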

Logic-wise I'd say it's 50/50 bug/feature. I personally think
that a # should have precedence over \-at-eol. Usability-wise
it's definitely a bug and I've said as much on -devel.

Still, a nasty trap to fall into. Beware when you next edit your
configs.

Regards,
Tobias

PS: It seems this behaviour was introduced somewhere between b7
and rc3, probably when adding line continuation support.

-- 
"Have you any idea how successful censorship is on TV?
 Don't know the answer?  Hm.  Successful, isn't it?" -Max Headroom



[Nagios-users] Possible bug in 3.0rc3

2008-02-28 Thread Tobias Klausmann
Hi! 

I think I may have found a bug in the latest rc. I'm not sure if
it's my own fault, so I'll ask here first.

Apparently, this line:

service_perfdata_file_template=$HOSTNAME$\t$SERVICEDESC$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\t$TIMET$

in my nagios.cfg is ignored. I can change it to whatever I
want, Nagios always uses the template as specified in
xdata/xpddefault.h 

I've tried fixing it there and recompiling and voila, it works.
While this proves that I'm not overwriting my own setting
somewhere in my config, it does not constitute a proper fix, of
course.
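
For reference, a line written with the template above carries five
tab-separated fields. A minimal Python sketch of reading such a line back
(the function name and the sample values are invented for illustration):

```python
# Field order follows the macros in the template line.
FIELDS = ("hostname", "servicedesc", "serviceoutput",
          "serviceperfdata", "timet")


def parse_perfdata_line(line):
    """Split one perfdata line into a dict keyed by macro name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELDS), len(values)))
    record = dict(zip(FIELDS, values))
    record["timet"] = int(record["timet"])  # $TIMET$ is a Unix timestamp
    return record


if __name__ == "__main__":
    sample = "web01\tHTTP\tHTTP OK\ttime=0.01s\t1204200000\n"
    print(parse_perfdata_line(sample))
```

If a consumer of the perfdata file sees a different number of fields, that
is a quick hint that some other template is in effect.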

Any ideas/comments/me-toos?

Regards,
Tobias

-- 
printk(KERN_ERR "i82092aa: Oops, you did something we didn't think of.\n");
linux-2.6.19/drivers/pcmcia/i82092.c



[Nagios-users] Docs at least partially wrong

2008-02-07 Thread Tobias Klausmann
Hi! 

We've been using the $ADDRESSN$ macro in our notification
commands while we were using 2.x. Since we upgraded, some
notifications fail - since said macro does not seem to be
expanded anymore.

The docs are inconsistent in this regard. On one hand,

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#contact

says: 

addressx: Address directives are used to define additional
"addresses" for the contact. These addresses can be anything -
cell phone numbers, instant messaging addresses, etc. Depending
on how you configure your notification commands, they can be used
to send out an alert to the contact. Up to six addresses can be
defined using these directives (address1 through address6). The
$CONTACTADDRESSx$ macro will contain this value.

But then,

http://nagios.sourceforge.net/docs/3_0/macrolist.html

does not list said macro. At the very least, one of the pages is
wrong. I see no obvious reason not to expand those
macros/attributes anymore, so I suspect a bug.

Any ideas?

Regards,
Tobias

PS: Nagios 3.0b7.
-- 
panic("mother...");
linux-2.2.16/drivers/block/cpqarray.c



Re: [Nagios-users] Nagiosgrapher limits?

2007-11-21 Thread Tobias Klausmann
Hi! 

On Wed, 21 Nov 2007, Palle L Jensen wrote:
> Are there any limitations to Nagiosgrapher, like how many
> hosts/services it is capable of producing graphs for?
> 
> As soon as I get to a certain limit of hosts/services (past
> around 34 hosts and/or around 100 services), the graphs act
> "weird": the icon graphs become white and if you click them to
> see the actual graph, it shows no data there. Has anyone run
> into this scenario before? We are monitoring 36 hosts and 103
> services, where 100 services are being graphed.

It's probably not a sheer amount-of-objects problem. We currently
have close to 480 hosts and well over 5000 services and
NagiosGrapher works. We have about 4500 rrds that are being
tended to, so I doubt the problem you're seeing is caused by the
amount of objects you create graphs for.

Your problem is most probably caused in the RRDG CGIs, not during
the data collection, so I'd start debugging there.

Regards,
Tobias
-- 
In the future, everyone will be anonymous for 15 minutes.



Re: [Nagios-users] NRPE Service Dependencies

2007-08-25 Thread Tobias Klausmann
Hi! 

On Sat, 25 Aug 2007, Anand Capur wrote:
> Is there a way to configure nagios, so if NRPE is down or not responding we
> only get 1 notification per box, and not a notification for every service on
> the box?

We've set things up so that NRPE itself (we used a dummy check
inside NRPE, but you could use check_tcp on its port, too) is
monitored. Then we set up servicedependencies for all NRPE-checked
services of one host to the NRPE service of the same host. Works just
dandy.
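
A sketch of how such dependencies could be generated, in Python. The
service names and the failure criteria are assumptions for illustration;
adjust them to your own setup.

```python
def nrpe_dependencies(host, nrpe_service, dependent_services):
    """Emit one servicedependency block per NRPE-checked service."""
    blocks = []
    for svc in dependent_services:
        blocks.append(
            "define servicedependency {\n"
            "    host_name                       %s\n"
            "    service_description             %s\n"
            "    dependent_host_name             %s\n"
            "    dependent_service_description   %s\n"
            # Keep running the checks, but suppress notifications
            # while the NRPE service itself is not OK.
            "    execution_failure_criteria      n\n"
            "    notification_failure_criteria   w,u,c\n"
            "}\n" % (host, nrpe_service, host, svc))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(nrpe_dependencies("web01", "NRPE", ["Load", "Swap", "Disk /"]))
```

With this in place, a dead NRPE daemon should produce one notification
(for the NRPE service) instead of one per dependent service.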

Regards,
Tobias
-- 
In the future, everyone will be anonymous for 15 minutes.



Re: [Nagios-users] snmp over internet "best practice"

2007-08-25 Thread Tobias Klausmann
Hi! 

On Fri, 24 Aug 2007, Russell Adams wrote:
> The argument was always SNMP (inferring v1), versus NRPE. I've been an
> advocate of using SNMP because there was little client software to
> maintain.

I prefer NRPE over SNMP (no matter what version) for two simple
reasons:

1) Code complexity. An SNMPd is a hell of a lot more complex than
the NRPE daemon. As we always forbid param passing to NRPE, the
plugins aren't really exposed to the client.

2) Vectors. An SNMPd has code in place to change stuff on the
machine it runs on. No matter how tight your security setup is,
the code is there and a slipup in security might leave you
vulnerable. NRPE just execs stuff which has been preconfigured.
Barring a nasty buffer overflow, you have no "write" access to
the machine - and then, a buffer overflow might happen to an
SNMPd, too.

That said, the only disadvantage of NRPE (security-wise) I can
see is that probably more people look at and dissect snmp daemons
than NRPE. But NRPE is smaller, so that may compensate.

Just my EUR0.02,

Tobias

PS: As for the "should SNMP travel across insecure nets" question,
I'll defer to those more knowledgeable in SNMP. I'm lucky: I don't
have to check remote machines.
-- 
In the future, everyone will be anonymous for 15 minutes.



[Nagios-users] Distributed setups

2007-07-02 Thread Tobias Klausmann

Apologies for the other mail with the wrong setup. Too little
coffee on my part.

Hi! 

We're currently looking at creating a distributed setup using
NSCA. One thing that I've found no mention of is how the host and
service commands are forwarded.

Even if the central machine does all the notifications (as we're
planning), completely dis/enabling service/host checks would have
to be distributed from the central machine to the checking
machines. 

Or is the usual setup to let the users access the web interface
of each "checker machine"? Then how do people know which checks
are run from which machine?

If the central machine only does the "webservice job", i.e.
notifications are handled on the checking machines, how are
scheduled downtimes, acknowledgements etc handled?

I see how one could write a script or somesuch that distributes
this stuff, but I'd rather not reinvent the wheel.

Regards,
Tobias



[Nagios-users] Notifications - best common practice

2007-05-31 Thread Tobias Klausmann
Hi! 

The mails by shacky and Janet Post got me to thinking about a
thread regarding "best common practice" when it comes to
user accounts.

We currently have 370 hosts and 3740 services. They can be
sorted into some 60 groups of related servers. 

In total, we have about 70 real people who manage different
sets of servers.

As for the structuring of notification and web site access, we do
the following:

- Every person gets three different contact objects: one for web
  site access named firstname.lastname, one for email notification
  (same, with -email postfix) and one for sms/pager notification
  (-sms postfix). All of them have different notify-by scripts
  (or, in the case of the first, notification_options set to "n").

- Every set of people that is responsible for a set of machines
  and services is organized into a group, which is expanded into
  three contactgroups by the above scheme. So, for example, there
  is DNSAdmins, DNSAdmins-email and DNSAdmins-sms. This has the
  advantage of notifying some of the DNSAdmins by SMS, some by
  email etc. Also, we can let people use the web interface
  without having to notify them, yet they don't see *all*
  hostgroups and services. 
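
The three-way expansion can be sketched in a few lines of Python. This is
an illustration of the naming scheme only; the function name and the
sample people are made up.

```python
def expand_group(name, members, email_members=(), sms_members=()):
    """Expand one team into web, -email and -sms contactgroups.

    `members` may use the web interface; the other two lists say who
    is actually notified, and by which method.
    """
    return {
        name: list(members),
        name + "-email": [m + "-email" for m in email_members],
        name + "-sms": [m + "-sms" for m in sms_members],
    }


if __name__ == "__main__":
    groups = expand_group(
        "DNSAdmins",
        members=["jane.doe", "john.roe"],
        email_members=["jane.doe", "john.roe"],
        sms_members=["jane.doe"],
    )
    for group, contacts in sorted(groups.items()):
        print(group, "->", ", ".join(contacts))
```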

For the different hosts and services, said hostgroups are used.
If a contact is completely unused (for example, one of the admins
never gets an SMS, even though he might be able to view several
sets of machines), it is commented out but left in the config.

This kind of setup, while immensely flexible, leads to quite some
bloat in the config. We currently have 183 contacts and 109
contact_groups - not counting the commented-out stuff.
  
I don't know of any better way of accomplishing what we need, but
I'm sure others have found different and/or better ways to do it.

So, how do you manage your user accounts?

Regards, 
Tobias



Re: [Nagios-users] front end tools for Nagios

2007-05-17 Thread Tobias Klausmann
Hi! 

On Thu, 17 May 2007, [EMAIL PROTECTED] wrote:
> Hari Sekhon wrote: 
>> Hugo van der Kooij wrote:
>> > On Wed, 16 May 2007, RR wrote:
>> >> I'm relatively new to Nagios and am looking for cool front end tools 
>> >> to managing the config files.
>> >>
>> >> Of these, which ones do other users recommend?
>> >> 
>> > vi. ;-)
>> >
>> > But I never understood the need for 'cool'. I want my tools to be fit 
>> > for the job and I do not care what they look like that much.
>> >   
>> adding my 2 cents...
>> 
>> I agree with Hugo, I also use vi(m).
>> 
>> front end configuration tools?
>> more fluff and less understanding, why bother?
>> 
> I feel it is much better to understand the layout and structure to the
> config files than to hide it all behind a gui type front end.
> 
> You learn a lot more about how nagios hangs together by making config
> files, running pre-flight checks and getting errors. You then correct
> the errors, normally a simple typo or an omitted name in another config
> file. This all builds a great understanding of Nagios.
> 
> And well who needs cool, vi is brilliant and in its on way is "cool" as
> you get it on most flavours of unix/linux. 
> 
> Well that's my six shillings worth

Same here, vim. But, and I cannot stress this enough, keep your
config in some kind of version control system. I personally use
Subversion, but there are lots of others out there (CVS, RCS,
Bitkeeper, git, ...). This is especially true if more than one
person edits the config.

Regards,
Tobias

-- 
In the future, everyone will be anonymous for 15 minutes.



Re: [Nagios-users] Monitor DB Server without outside IP Address

2007-03-01 Thread Tobias Klausmann
Hi! 

On Thu, 01 Mar 2007, Marc Powell wrote:
> > From: [EMAIL PROTECTED] [mailto:nagios-users-
> > [EMAIL PROTECTED] On Behalf Of Patrick Morris
> > Sent: Thursday, March 01, 2007 4:07 PM
> > To: James Pells
> > Cc: nagios-users@lists.sourceforge.net
> > Subject: Re: [Nagios-users] Monitor DB Server without outside IP Address
> > 
> > There are reasons people don't make their DB boxes available from the
> > internet.  A couple better solutions might be:
> > 
> > 1. Monitor the box from inside the network it's on, or
> > 2. Have the DB submit passive checks to your monitoring box.
> 
> Depending on what you are actually interested in monitoring some other
> options might be...
> 
> 3. create a small script on your web server that performs checks on your
> DB and outputs a web page than you can check with check_http.
> 4. check_nrpe -> webserver calls check_nrpe -> DB server (untested but
> should work)

Note though that longer NRPE cascades (>=3 hops) can be
notoriously difficult to debug. Worse, if you don't have proper
dependencies (between your NRPE checks and a check for NRPE
itself), you can very easily spam your cell into oblivion ;)

Also, some DB machines deliberately have no default route (or one
not letting them leave their nets). So passive sending is not
possible, either.

So, from a security standpoint, I'd recommend the "webpage does
the checks and is checked by check_http" approach.
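
A minimal sketch of that approach, in Python. The check_db argument
is a stand-in for whatever query actually proves the database is
healthy; it is an assumption for this example, not part of any real
library. The function only builds the one-line page body.

```python
# Sketch of the "web page does the check" idea. check_db is a hypothetical
# callable supplied by you; it should raise an exception if the database
# is not healthy.
def render_status(check_db):
    """Run the check and return a one-line body for check_http to match."""
    try:
        check_db()
    except Exception as exc:  # any failure counts as CRITICAL here
        return "DB CRITICAL: %s" % exc
    return "DB OK"


if __name__ == "__main__":
    print(render_status(lambda: None))
```

check_http can then be told to expect the string "DB OK" in the page
content (its -s/--string option), so the DB machine itself never has
to be reachable from the monitoring box.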

Regards,
Tobias




Re: [Nagios-users] cpu usage / memory usage

2007-02-13 Thread Tobias Klausmann
Hi! 

On Tue, 13 Feb 2007, Niels Hamaker wrote:
> the check_mem plugin is a simple and effective plugin to check memory. You 
> can find it on nagiosexchange.org.
> We generally don't check CPU usage, just load, but it depends on your 
> setup, the applications your running, which of the two is the most 
> informative. It would be useful to look into that.

The check_mem.pl script that comes with Nagios 2.x (contrib/) is
broken: it uses the wrong columns of vmstat output to determine
free and used memory. Actually, what exactly constitutes free (or
usable?) memory probably is debatable.
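
One way to avoid picking the wrong column is to look the fields up
by header name rather than by position. The following Python sketch
does that; the sample text assumes procps-style vmstat output on
Linux, and the "usable" definition is just one of several defensible
ones.

```python
# Look vmstat columns up by header name instead of by position, so a
# changed column order cannot silently pick the wrong field.
# The sample layout assumes procps-style vmstat output on Linux.
def parse_vmstat(text):
    """Return a dict mapping column name -> int from the last data line."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[-2].split()   # line with column names (r, b, swpd, ...)
    values = lines[-1].split()   # most recent sample
    return dict(zip(header, (int(v) for v in values)))


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 120000  30000 450000    0    0     1     2   10   20  1  0 99  0
"""

if __name__ == "__main__":
    mem = parse_vmstat(SAMPLE)
    strictly_free = mem["free"]
    # One possible definition of "usable": free + buffers + cache.
    usable = mem["free"] + mem["buff"] + mem["cache"]
    print(strictly_free, usable)
```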

Regards,
Tobias




Re: [Nagios-users] Nagios memory Leaks

2007-01-24 Thread Tobias Klausmann
Hi! 

On Wed, 24 Jan 2007, John Longland wrote:
> I have been reading with interest about these memory leaks.
> I see you mention 2.5 & 2.6.
> Does this happen with 2.4 as well ??

I can't really tell: back when I used 2.4 I wasn't aware of this
problem and hadn't added so many machines/services yet. I figure
the problem *probably* exists. On the other hand: if it hasn't
bitten you yet, why worry? Just keep an eye on your nagiostats
output. Any Nagios admin should IMO do that anyway, regardless of
known problems.

Regards,
Tobias

PS: Think about adjusting your quoting style.

-- 
Never touch a burning system.



Re: [Nagios-users] Memory leaks

2007-01-24 Thread Tobias Klausmann
Hi! 

On Wed, 24 Jan 2007, Andreas Ericsson wrote:
> > Activating the embedded Perl interpreter and -cache will increase
> > the amount of lost memory to about 5-6M per hour. In this case,
> > however, sometimes the memory usage snaps back, i.e. some of the
> > lost memory is collected. I've not yet found out what triggers
> > the reclaim. Still, over the course of hours, more and more
> > memory is lost. Still, it's roughly linear memory loss.
> 
> Yes. Embedded perl is known to be leaky. It's also mentioned in various
> documents around the web.

Well, I think I can live without the embedded interpreter, the
machine is beefy enough.

> > Unfortunately, performance degradation is not just on the memory
> > used front. With increased memory usage, check latency increases.
> > In the worst case, this can mean that latency increases by 120s in
> > about six hours. This has the net effect that for our case, we
> > have to restart Nagios every two hours. 
> 
> The latency increase should only happen when the machine starts swapping.
> For large networks with the access-patch thingie that could happen fairly
> quickly though, I imagine.

No, it's definitely not swapping (as the graphs show). My
conclusions about the reasons for the degradation were drawn with
exactly that in mind.


> > For the case of 2.5 and 2.6 without the permissions patch, it's
> > a lot less bad, but still bad enough to require restarting Nagios
> > at least every eight hours. 
> > 
> > Without all the fancy stuff, we get to restarting Nagios every 24
> > hours, as described above.
> 
> That seems a bit obsessive. Are you doing anything unusual with the system?
> We have several (well over a hundred) installations where Nagios has been up
> and running for several months without requiring a restart.

Well, the system is a standalone Nagios server which is only
that, no other services. I'll take a very close look at all the
cronjobs etc. that might cause additional friction, but I doubt
they're causing any trouble.

> > For vanilla Nagios, at least it's clear that in whatever way
> > memory is wasted, it also slows Nagios down - a possibility would
> > be a linked list that is walked and gets appended over and over.
> > But I guess those with knowledge of the inner workings of Nagios
> > have more clue about this than I do.
> 
> Anyone wanting to look into it should probably take a look at the
> event scheduling queue.

Thanks, I'll ask our resident C guru to take a close look at it.

Regards,
Tobias

-- 
Never touch a burning system.



[Nagios-users] Memory leaks

2007-01-23 Thread Tobias Klausmann
Hi! 

(First off: if this should also go to nagios-devel, just yell at
 me.)

Nagios 2.6 and 2.5 have memory leaks. They are not so big that
your machine will be swapping within hours, but they degrade
performance in other ways.

First off, their approximate extent.

2.5 and 2.6 without perl cache have the smallest memory leaks. A
fairly busy Nagios server (hardware quoted below) with about 3000
services on about 330 hosts will degrade from 330M used (that's
*not* Nagios alone) to 368M used in about 16 hours, or about 2.4
MB per hour. Memory usage on the very same machine stays flat if
Nagios is not running, so it's definitely Nagios itself.
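
For reference, the arithmetic behind that figure, as a small Python
sketch (the function name is only for illustration):

```python
# Back-of-the-envelope leak rate from the figures quoted above:
# 330M used at the start, 368M used about 16 hours later.
def leak_rate_mb_per_hour(start_mb, end_mb, hours):
    """Average growth in MB per hour over the observation window."""
    return float(end_mb - start_mb) / hours


if __name__ == "__main__":
    rate = leak_rate_mb_per_hour(330, 368, 16)
    print("%.1f MB/hour" % rate)
```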

Activating the embedded Perl interpreter and -cache will increase
the amount of lost memory to about 5-6M per hour. In this case,
however, sometimes the memory usage snaps back, i.e. some of the
lost memory is collected. I've not yet found out what triggers
the reclaim. Still, over the course of hours, more and more
memory is lost. Still, it's roughly linear memory loss.

And finally, there's the advanced permission patch. With that
patch, memory leaking skyrockets to about 15M/hour.

Now all of this could be alleviated by simply restarting Nagios
every night. It's not actually a bugfix but merely treating
the symptoms, but still, it's pragmatic.

Unfortunately, performance degradation is not just on the memory
used front. With increased memory usage, check latency increases.
In the worst case, this can mean that latency increases by 120s in
about six hours. This has the net effect that for our case, we
have to restart Nagios every two hours. 

For the case of 2.5 and 2.6 without the permissions patch, it's
a lot less bad, but still bad enough to require restarting Nagios
at least every eight hours. 

Without all the fancy stuff, we get to restarting Nagios every 24
hours, as described above.

Further observations: with the permission patch, latency
degradation is directly correlated with the number of
notifications. The more notifications, the quicker things get nasty.

For vanilla Nagios, at least it's clear that in whatever way
memory is wasted, it also slows Nagios down - a possibility would
be a linked list that is walked and gets appended over and over.
But I guess those with knowledge of the inner workings of Nagios
have more clue about this than I do.
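
To make that guess concrete, here is a Python sketch of the pattern
I have in mind. It is purely an illustration of the hypothesis and
not based on actual Nagios code: appending to a singly linked list
by walking from the head costs one step per existing node, so total
work grows quadratically while memory grows only linearly.

```python
# Illustration of the guess above, not of actual Nagios code: appending
# to a singly linked list by walking from the head costs one step per
# existing node, so the total work grows quadratically.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def append_by_walking(head, value):
    """Append value at the tail; return (head, number of nodes walked)."""
    node = Node(value)
    if head is None:
        return node, 0
    steps = 1
    cur = head
    while cur.next is not None:
        cur = cur.next
        steps += 1
    cur.next = node
    return head, steps


def total_steps(n):
    """Total nodes walked while appending n items one after another."""
    head, total = None, 0
    for i in range(n):
        head, steps = append_by_walking(head, i)
        total += steps
    return total


if __name__ == "__main__":
    # Doubling the list length roughly quadruples the work.
    print(total_steps(100), total_steps(200))
```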

The question that remains is whether this can (and will) be tackled
before 3.0 is released. A related question is whether Nagios 3 will be
prone to the same problem.

Any thoughts, ideas etc. are appreciated.

Regards,
Tobias

PS: On a whim, I tried running Nagios through/in Valgrind but
honestly got knocked over by the amount of info Valgrind spewed
at me.

PPS: Our setup uses only active service checks, notifications by
mail (some of it to SMS gateways etc). All host checks are active
yet only are executed if needed (the usual way Nagios works). All
host checks are using ping.  All plugins have a hard timeout of
10s.

PPPS: Hardware specs of the machine I tested with:
Dual dualcore Opteron 2.2GHz (Model 2214)
2GBytes of RAM
(if there's anything else relevant, drop me a line)



Re: [Nagios-users] Completely stumped

2007-01-22 Thread Tobias Klausmann
Hi! 

On Fri, 19 Jan 2007, Andreas Ericsson wrote:
> Was this by any chance coupled with a big fat spike of memory usage
> on the Nagios server? I assume you do monitor memory usage, right?

I've checked it every now and then and found nothing unusual.
While switching around 2.5/2.6 with and without patches, I've
added a graph for memory usage and noticed that there probably is
some sort of memory leak at least in 2.6 (without patch).

Here's the graph so far:

http://eric.schwarzvogel.de/~klausman/nagios-perf-4/memusage-week.png

That initial drop is a restart I did by hand.

I'm currently testing 2.5 sans patches to see if this leak was
introduced with 2.5 -> 2.6.

> > My C-fu is weak, so I ask those more versed in it to take a look
> > at the patch. I'll also hand it to our local C guru, but he's
> > quite swamped in work, so that may take some time.
> 
> Comments inline.

I'll take a look at them today/tomorrow.

> [...] 
> All in all, I'd advice against using this patch, or at least try without it
> first thing you do in case you run into problems.

Hm. I was afraid you'd say that. Do you think it's worthwhile
trying to "rescue" the patch?

Regards,
Tobias

-- 
Never touch a burning system.



Re: [Nagios-users] Completely stumped

2007-01-19 Thread Tobias Klausmann
Hi! 

On Fri, 19 Jan 2007, Tobias Klausmann wrote:
> As far as I can tell, backdating from my own packages (2.6 with
> said patch) to distro packages (Gentoo, Nagios v2.5) fixed the
> problem. The new machine has run close to 12 hours without even
> remotely acting up. I've now updated to 2.6, still sans patch.
> I'll post in a few hours or sooner if I find out anything new.

So far, a vanilla 2.6 works just as advertised. This is proof
enough (to me at least) that the advanced permissions patch is
buggy. Somewhere in there, a data structure is not freed
correctly, I suspect. Could those who initially created the patch
take a look? I'd be most grateful for that (since I really like
its "advertised" functionality).

Regards,
Tobias



Re: [Nagios-users] Completely stumped

2007-01-19 Thread Tobias Klausmann
Hi! 

On Thu, 18 Jan 2007, Andreas Ericsson wrote:
> > The *only* thing I've left to try is removing the multiuser patch
> > we talked about at the end of last year. If that does it, at
> > least I have an idea *where* in the code my problem lies. I'll
> > try that route tonight.
> > 
> 
> Which patch was this? I didn't find it in the december archives.

It's the advanced permission patch as created by Altinity and
Alex Burger. The thread has the subject "Advanced permissions".
First mail is by me and has the message id
[EMAIL PROTECTED]

As far as I can tell, backdating from my own packages (2.6 with
said patch) to distro packages (Gentoo, Nagios v2.5) fixed the
problem. The new machine has run close to 12 hours without even
remotely acting up. I've now updated to 2.6, still sans patch.
I'll post in a few hours or sooner if I find out anything new.

Also, I have a suspicion about the reasons.

Originally I didn't suspect the patch to be at fault: I only use
it to regulate who can see what and who can use commands in the
web interface. How that should affect the usual checks was beyond
me. Added to that, we hadn't deployed any production config with
that feature in use - yet the production machine acted up.

The other day I realized that the patch goes beyond what I was
using: it also modifies notification behaviour. Looking back, I
seem to recall that the degradation of check latency was coupled
to the amount of notifications being sent. Unrelated to the
Nagios troubles, we had some issues yesterday and Wednesday (with
quite a few notifications being sent) and voila, the curves
skyrocketed.

My C-fu is weak, so I ask those more versed in it to take a look
at the patch. I'll also hand it to our local C guru, but he's
quite swamped in work, so that may take some time.

Regards (and thanks!),
Tobias

PS: The patch I use is attached.
-- 
Never touch a burning system.
diff -ur nagios-2.5.org/base/notifications.c nagios-2.5/base/notifications.c
--- nagios-2.5.org/base/notifications.c 2006-04-07 18:24:13.0 -0400
+++ nagios-2.5/base/notifications.c 2006-11-05 22:23:57.0 -0500
@@ -832,7 +832,7 @@
/* find all contacts for this service */

for(temp_contact=contact_list;temp_contact!=NULL;temp_contact=temp_contact->next){

-   if(is_contact_for_service(svc,temp_contact)==TRUE)
+   if(is_contact_for_service_perm(svc,temp_contact,'n')==TRUE)
add_notification(temp_contact);
}
}
@@ -1572,7 +1572,7 @@
/* get all contacts for this host */

for(temp_contact=contact_list;temp_contact!=NULL;temp_contact=temp_contact->next){
 
-   if(is_contact_for_host(hst,temp_contact)==TRUE)
+   if(is_contact_for_host_perm(hst,temp_contact,'n')==TRUE)
add_notification(temp_contact);
}
}
diff -ur nagios-2.5.org/cgi/cgiauth.c nagios-2.5/cgi/cgiauth.c
--- nagios-2.5.org/cgi/cgiauth.c2006-10-08 19:35:18.0 -0400
+++ nagios-2.5/cgi/cgiauth.c2006-11-05 22:55:28.0 -0500
@@ -218,7 +218,7 @@
temp_contact=find_contact(authinfo->username);
 
/* see if this user is a contact for the host */
-   if(is_contact_for_host(hst,temp_contact)==TRUE)
+   if(is_contact_for_host_perm(hst,temp_contact,'r')==TRUE)
return TRUE;
 
/* see if this user is an escalated contact for the host */
@@ -295,14 +295,14 @@
return FALSE;
 
/* if this user is authorized for this host, they are for all services on it as well... */
-   if(is_authorized_for_host(temp_host,authinfo)==TRUE)
-   return TRUE;
+   /* if(is_authorized_for_host(temp_host,authinfo)==TRUE)
+   return TRUE;*/
 
/* find the contact */
temp_contact=find_contact(authinfo->username);
 
/* see if this user is a contact for the service */
-   if(is_contact_for_service(svc,temp_contact)==TRUE)
+   if(is_contact_for_service_perm(svc,temp_contact,'r')==TRUE)
return TRUE;
 
/* see if this user is an escalated contact for the service */
@@ -419,16 +419,16 @@
if(temp_contact && temp_contact->can_submit_commands==FALSE)
return FALSE;
 
-   /* see if this user is a contact for the host */
-   if(is_contact_for_host(temp_host,temp_contact)==TRUE)
+   /* see if this user is a contact for the host with permissions */
+   if(is_contact_for_host_perm(temp_host,temp_contact,'x')==TRUE)
return TRUE;
 
/* see if this user is an escalated contact for the host */
if(is_escalated_contact_for_host(temp_host,temp_contact)==TRUE)
return TRUE;
 
-   /* this us

[Nagios-users] Completely stumped

2007-01-18 Thread Tobias Klausmann
Hi!

The other day, we got our beefier machine. I had hoped my latency
problems (ever increasing check latencies) would go away or at
least turn irrelevant with that. They didn't.

More precisely: we have migrated to a four-core Opteron 2.2GHz
with 2GBs of RAM and a quite fast I/O Subsystem. 

We have 331 / 2940 hosts/services which are all checked actively.

Still, after less than an hour, our check latency skyrockets well
over 120s. Unacceptable.

I've tested a whole slew of stuff in order to find out what the
hell is wrong. I've played with concurrency settings and just
about any performance tip save distributing the setup.

Nothing worked.

Not a single metric on the machine itself (interrupt rate,
context switches or anything else the *stat utilities show me)
tells me it's the machine's fault.

I'm out of ideas (and to be frank, a bit desperate).

What the hell can I do?

The *only* thing I've left to try is removing the multiuser patch
we talked about at the end of last year. If that does it, at
least I have an idea *where* in the code my problem lies. I'll
try that route tonight.

Until then, I'm happy to hear your theories.

Regards,
Tobias
(exhausted)






-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2007-01-09 Thread Tobias Klausmann
Hi! 

On Tue, 02 Jan 2007, Daniel Meyer wrote:
> Program Running Time: 10d 21h 22m 42s
> 
> So, for almost eleven days nagios runs smoothly now, no more
> latency problems. I'll try it again with EPN (but still without
> perlcache) now.

I've finally gotten around to recompiling Nagios without EPN and
without the Perlcache. As you can see on these graphs:

http://eric.schwarzvogel.de/~klausman/nagios-perf-3/

(especially
http://eric.schwarzvogel.de/~klausman/nagios-perf-3/latencies.png
)

It didn't quite help (much). While the curve now has a flatter
slope and even goes down in spots, it still seems to increase
steadily on the whole. Even if it stayed at the level we saw
last night (~100s check latency) I wouldn't be too happy. With a
300s check interval, 100s latency is just too much (IMHO).

What's left is enabling Perlcache again (yet keeping EPN off).
I'm not terribly hopeful that that will help, but I'm running out
of ideas quickly.

Also note that switching *off* EPN/PC led to *less* CPU usage. 
Strange, isn't it?

Regards,
Tobias
-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-26 Thread Tobias Klausmann
Hi! 

On Mon, 25 Dec 2006, Robert Hajime Lanning wrote:
> > I think the two issues are independent (or at most correlated).
> > If switching off EPN/perlcache fixes the issues for me, too, I'd
> > guess it's either the embedded Perl or the cache. Finding out
> > which is a matter of simple experimentation. I hope :)
> >
> 
> Does any of your checks have arguments that change?

No, I don't think so. If there's no implicit carry-over in a
plugin, we don't do that at all.

> I have a few that use the output of the last check to see
> differences in accumulators and the like.  And I see that
> the caching code caches a parsed version of the arguments.
> This caching has no expirations just appending the new
> argument list.

That might explain memory consumption, though one has to wonder
if linear increase is fast enough to explain it. If the
arguments get *doubled* every time, though...

> I am trying to comment out the caching of arguments and have
> the arguments parsed each time.

Good luck.

> > Merry christmas to the lot of you, btw.
> >
> > Regards,
> > Tobias
> > (away from work and Nagios 'til January 8th)
> 
> Merry Christmas, and I am too much a geek to leave this be,
> until January. :)  (Have to tinker...)

Oh, I do have my own private projects I can tinker with :)

Regards,
Tobias
-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-25 Thread Tobias Klausmann
Hi! 

On Mon, 25 Dec 2006, Robert Hajime Lanning wrote:

> 
> 
> > Just rechecked. After 72 hours nagios still runs perfectly
> > with an average service check latency of 0.3 seconds, max.
> > 0.9 seconds.
> >
> > Memory usage is perfectly "flat" now, with epn and perlcache
> > it went from 140 mb (whole system) to about 900 mb within 24h.
> >
> > The average system load is a bit _lower_ than before, but some
> > peaks higher than with epn/perlcache.
> >
> > I'll try pure epn without perlcache first thing in january.

(pardon my butting in here) I'll do that, too.

> The main reason for me to use ePN with perlcache, is to get
> around the huge load of loading all the MIBs for each SNMP
> query.  (Since 90% of my services are SNMP queries.)  I was
> looking for a way to load the MIB tree once, and found I could
> do it in p1.pl.
> 
> For traps, I run snmptrapd (from net-snmp) and have just recently
> found it has a memory leak.  Over the course of 20 days, it grew
> from 5MB to 140MB.  It runs snmptthandler, which is actually a C
> program (I ported the Perl version to reduce the load during trap
> floods).
> 
> snmptt has a big memory leak.  I restart it every 6 hours.
> 
> This seems to be pointing to the net-snmp libraries.

I'm not using a single SNMP check, and I have the very same
problem: so I'd say no.

> Though, I don't get why it would really effect the nagios master
> process.  Since all the calls to the SNMP module are run in a
> subprocess, other than the initialization that I put into p1.pl.
> Unless p1.pl is executed more than once.
> 
> Back when I had about 200 service checks, my load was about 1.5.
> Then I enabled ePN with perlcache and stuck in the "use SNMP"
> with the preload of the MIBs.  Load went down to 0.3.  But, as
> I added services, most SNMP, this issue showed up.

I think the two issues are independent (or at most correlated).
If switching off EPN/perlcache fixes the issues for me, too, I'd
guess it's either the embedded Perl or the cache. Finding out
which is a matter of simple experimentation. I hope :)

Merry christmas to the lot of you, btw. 

Regards,
Tobias
(away from work and Nagios 'til January 8th)

-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-21 Thread Tobias Klausmann
Hi! 

On Thu, 21 Dec 2006, Daniel Meyer wrote:
> > I have the suspicion that our check latency might converge on 419
> > seconds - but I'd rather not test it, we'd be well beyond the
> > 300s-interval most of our checks are designed for.
> 
> Why do you think of exactly 419 seconds?
> 
> And btw, if our problems are related the latency wont stop at that number 
> :)

Because that's the new average check latency as reported by -s.
Yes, I'm out on a limb there.

Regards,
Tobias

-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-21 Thread Tobias Klausmann
Hi! 

On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> >>> SERVICE SCHEDULING INFORMATION
> >>> ---
> >>> Total services: 2836
> >>> Total scheduled services:   2836
> >>> Service inter-check delay method:   SMART
> >>> Average service check interval: 2225.56 sec
> >> This is, as you point out below, quite odd. What's your _longest_ 
> >> normal_check_interval for services?
> > 
> > The longest check_interval is 86400 seconds. It's a SSL cert
> > freshness check. I figured it wasn't necesseary to check that
> > more often than once a day. I also have check_intervals of 3, 5,
> > 15, 20, 30 and 1440 seconds. The latter is also a cert freshness
> > check which is lower because the customer wanted it to be that
> > short.
> > 
> 
> Try changing the really long intervals to something shorter or 
> commenting them out completely and see what happens. Checking a 
> certificate is not a particularly heavy operation so it doesn't matter 
> much if you run it ever 5 minutes. On the server side it just gets 
> handed out from cache, so it's not heave there either.

Actually, I was horribly wrong with that statement up there.

As it turned out, the check_interval was set to 86400. From that
I jumped to the conclusion "ah, one day" - familiar numbers do
that to you. But the base unit of check_interval isn't 1s, it's 1
minute. So the check_interval was 60 days. Fortunately, it was
only one such check which we quickly eliminated before producing
the second set of graphs I mentioned elsewhere in the thread.

Now, the longest check_interval truly is one day, 1440 minutes.
The average service check interval reported by -s is now 419
seconds. Still not terribly short, but it proves that the
86400-minute-monster was to blame for the 2200+ seconds.
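
To make the unit mix-up concrete, here is a minimal Python sketch. It
assumes the default interval_length of 60 seconds in nagios.cfg, and it
assumes the reported average is a plain arithmetic mean over all
services (which is only how Andreas described it, "or something along
those lines"). The function names are made up for illustration.

```python
# Sketch only: interval_length=60 is the Nagios default; the mean-based
# averaging is an assumption taken from the description in this thread.
INTERVAL_LENGTH = 60  # seconds per check_interval unit


def interval_seconds(check_interval, interval_length=INTERVAL_LENGTH):
    """Convert a check_interval given in config units to seconds."""
    return check_interval * interval_length


def share_of_mean(check_interval, total_services):
    """Seconds that one service contributes to a plain arithmetic mean."""
    return interval_seconds(check_interval) / total_services


intended = interval_seconds(1440)   # what was meant: one day
actual = interval_seconds(86400)    # what was configured
print(intended / 86400, "day(s) intended")
print(actual / 86400, "day(s) configured")
print(round(share_of_mean(86400, 2836), 2), "s added to the mean by that one check")
```

Under those assumptions the single misconfigured service accounts for
roughly 1800 of the 2200+ seconds, which is in the same ballpark as the
drop to 419 seconds after removing it.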

Changing those once-a-day checks to 5 minutes is an option, but
I'd rather wait a little to give everybody on the list some time
to look at the graphs and come up with nifty ideas.

I have the suspicion that our check latency might converge on 419
seconds - but I'd rather not test it, we'd be well beyond the
300s-interval most of our checks are designed for. 

> > Oops, forgot to mention that. Yes, a server farm is being rebuilt
> > currently. As I didn't want all the host check timeouts to make
> > matters much, much, worse, I disabled them entirely.
> > 
> 
> Ah, that explains it then. It shouldn't matter, but unless the 
> experiment I suggested above turns up anything useful, would you mind 
> commenting them out and testing that?

I'll do that if removing the day-spaced-checks doesn't help.


Regards & Thanks,
Tobias
-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-21 Thread Tobias Klausmann
Hi! 

On Thu, 21 Dec 2006, Daniel Meyer wrote:
> - it is not triggered by any other software on the server
>(nagios and apache are the only things running there)

ACK.

> - its not triggered by hourly, daily or weekly cronjobs

With a lot of guessing and estimating, I can make a case for a
slight "plateau" right after the hour, with an increase in the
second half of the hour. Might be completely bogus, though.

> - the big service check latency goes away instantly after a restart
>of nagios

ACK.

> - the latency skyrockets after "some time", its not like "six hours
>after the restart" or something like that

Well, not so much skyrocketing as steadily creeping up. See the
images I reference below.

> - service check execution time does NOT change at all, it stays on
>the same level all the time

NACK. For me, it starts out at some low-two-digit ms time, then
creeps up to 165.000ms (yes, exactly that value). As far as I can
tell, it stays there forever.

> - changing from a dummy host check to "adaptive" host checks back and
>forth doesn't make a difference

We didn't try that.

> - i see memory usage rise proportional to the latency, but there is
>way enough free memory left (this morning it was 150 seconds latency
>but still 790 Megs free ram, plus one gig cached)

Same (with slightly different figures) here.

> - load on the system rises a little but not much

It's measurable, but definitely not maxed out. Same goes for CPU
utilization (which is something different).

> - network usage goes down (well there are less checks done due to the
>latency, so no surprise here)

We haven't checked that but as network traffic (both volume and
packet rate) wasn't near any limit, we didn't feel it was
necessary.

Here are a few graphs we created for yesterday and the day before
that:

http://eric.schwarzvogel.de/~klausman/nagios-perf-1/

and here are the pics of today and yesterday afternoon:

http://eric.schwarzvogel.de/~klausman/nagios-perf-2/

For all graphs, check frequency was every 2 minutes. For the
older set, a SNAFU on my part when setting up the RRDs resulted
in reduced resolution. That was fixed with the second set.

"Queue size" is calculated the following way: look at all objects
in the state file (retention.dat, saved every 20s). Every object
with a check time in the past counts as one queue entry.
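
For reference, the counting described above can be sketched roughly
like this in Python. This is not the script actually used; it assumes
the state file consists of "host {" / "service {" blocks holding
key=value lines, and that the scheduled check time is stored as
next_check=<epoch seconds>. Both are assumptions about the file layout.

```python
import time


def queue_size(state_text, now=None):
    """Count host/service objects whose next_check lies in the past.

    Assumed layout: 'host {' or 'service {' opens a block, '}' closes
    it, and the block contains a line 'next_check=<epoch seconds>'.
    """
    now = time.time() if now is None else now
    overdue = 0
    kind = None
    for raw in state_text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            kind = line[:-1].strip()      # block type, e.g. "service"
        elif line == "}":
            kind = None
        elif kind in ("host", "service") and line.startswith("next_check="):
            next_check = int(line.split("=", 1)[1])
            # next_check=0 is treated as "not scheduled" and skipped
            if 0 < next_check < now:
                overdue += 1
    return overdue
```

Since the file is only a periodic snapshot, a count taken this way can
jump around between samples.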

"Slots"/"Checks completed" is what nagiostats reports as the number
of checks completed in the four timeframes.

Things I noted:

Queue size oscillates wildly. This might be due to my
methodology. Still, one can read a trend from that curve.

Check execution time converges at 106ms. On the spot. I have no
idea why.

Load average and CPU idleness indicate that we don't have a host
performance problem (I also looked over but did not plot stuff
like interrupt rate and context switches, nothing overly high,
there).

For the older graphs, check latency doesn't budge at all for
some time (or it's too little to see it). For the newer graph,
the initial rise is rather steep, then the increase slows down a bit.
Still, over the course of hours, it seems linear and shows no
sign of converging.

If anybody is interested in the RRD files used to generate the
graphs, drop me a line.

The picture all of this paints is rather inconclusive. We've
found an oddity in our config I'll relate in another mail (a
check interval of 86400 minutes, that's two months). We have
eliminated that for the newer graphs, however.

In conclusion, I'm at a loss as to why this slow deterioration of
check performance happens. 

A colleague of mine is looking at the Nagios scheduling code (he
thinks the description of the algorithm in the docs is rather
strange). He hasn't reported back yet, though.

All in all, every hint is appreciated.

Regards,
Tobias




-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-19 Thread Tobias Klausmann
Hi! 

On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> >>> ---
> >>> Total services: 2836
> >>> Total scheduled services:   2836
> >>> Service inter-check delay method:   SMART
> >>> Average service check interval: 2225.56 sec
> >> This is, as you point out below, quite odd. What's your _longest_ 
> >> normal_check_interval for services?
> > 
> > The longest check_interval is 86400 seconds. It's a SSL cert
> > freshness check. I figured it wasn't necesseary to check that
> > more often than once a day. I also have check_intervals of 3, 5,
> > 15, 20, 30 and 1440 seconds. The latter is also a cert freshness
> > check which is lower because the customer wanted it to be that
> > short.
> 
> Try changing the really long intervals to something shorter or 
> commenting them out completely and see what happens. Checking a 
> certificate is not a particularly heavy operation so it doesn't matter 
> much if you run it ever 5 minutes. On the server side it just gets 
> handed out from cache, so it's not heave there either.
> 
> If you have the various normal_check_interval's specified in templates, 
> try setting them all to 5 minutes and let Nagios run over-night. If this 
> interferes with some fragile services on the network (webservers whose 
> sessions don't expire, fe), disable active checks for those services 
> during the testing period.
> 
> (yes, this might seem braindead, but I really need to know if this bug 
> is still in Nagios).

I'll do that this afternoon, I'd just like to wait a little more
regarding the changes my kernel/cpu-update brings (or doesn't).

> >>> *Or* it is indicative of a misconfiguration on my
> >>> part. If the latter is the case, I'd be eager, nay ecstatic to
> >>> hear what I did wrong. Here are a few of the config vars that
> >>> might influence this:
> >> There has been a slight thinko in Nagios. I don't know if it's still 
> >> there in recent CVS versions. The thinko is that it (used to?) calculate 
> >> average service check interval by adding up all normal_check_interval 
> >> values and dividing it by the number of services configured (or 
> >> something along those lines), which leads to long latencies. This 
> >> normally didn't make those latencies increase though. Humm...
> > 
> > Well, the numbers sure do get whacky after a restart: first it
> > skyrockets for about five minutes, then plummets to 1s. From
> > there it works its way up the way I described.
> 
> Are the first checks of things being scheduled with unreasonably long 
> delays? Fe, a check with 3 minute normal_check_interval being scheduled 
> an hour or so into the future.

Usually, yes. As I use state retention, I don't believe in the
initial numbers all that much. After about 5-10 minutes one can
usually make out a trend. Not this time, though. Here's hoping
that it keeps oscillating around the 8-9 seconds I currently see.

> >>> Total Services:   2836
> >>> Services Checked: 2836
> >>> Services Scheduled:   2758
> >>> Active Service Checks:2836
> >>> Passive Service Checks:   0
> >> All services aren't being scheduled, but you have no passive service 
> >> checks. Have you disabled checks of 78 services?
> > 
> > Oops, forgot to mention that. Yes, a server farm is being rebuilt
> > currently. As I didn't want all the host check timeouts to make
> > matters much, much, worse, I disabled them entirely.
> 
> Ah, that explains it then. It shouldn't matter, but unless the 
> experiment I suggested above turns up anything useful, would you mind 
> commenting them out and testing that?

I was planning to do that tomorrow for the very same reasons.

> >>> Hardware is a dual-2.8GHz Xeon, 2G RAM and a 100 FDX interface.
> >>> LoadAvg is around 1.6, sometimes gets to 1.9. CPUs are both
> >>> around 40% idle most of the time. I see about 300 context
> >>> switches and 500 interrupts per second. The network load is
> >>> neglible, ditto the packet rate.
> >>>
> >>> The way these figures look I don't see a performance problem per
> >>> se, but maybe I have overlooked a metric that descirbes the
> >>> "usual" bottleneck of installations.
> >>>
> >> Are the CPU's 64 bit ones running in 32-bit emulation mode? For intel 
> >> cpu's, that causes up to 60% performance loss (yes, it really is that bad).
> > 
> > Sheesh. Yes, it is a 32-bit installation. I only ever bothered
> > with 64-bit installs on Opteron hardware. I might look into
> > migrating to 64 bits, then.
> > 
> 
> So the CPU's are 64-bits? Humm... 64-bit mode would boost available 
> resources quite a bit, but as you just enabled HT you should now have 3 
> extra CPU's (Xeon's are dualcore AFAIR) which will probably set you safe 
> for a while.

A colleague just told me that this particular batch wasn't
available in 64 bits. So no, they're 32-bit. Well, one thing to
test out of the way :-/

> >> I'm puzzled. Please let me know

Re: [Nagios-users] Performance issues, too

2006-12-19 Thread Tobias Klausmann
Hi! 

On Tue, 19 Dec 2006, Daniel Meyer wrote:
> >> You could lower this to 2 seconds. I've done so on any number of
> >> installations and it has no negative impact what so ever, but seems to
> >> make Nagios a bit more responsive.
> >
> > I'll give that a try.
> 
> I've tried that but had some failing checks when i did that. Very 
> strange...

I'm still waiting to see how the kernel change will work out.

> > I also noticed that HT was disabled on the machine. I've changed
> > that (and added support for it to the kernel) when I did the
> > kernel upgrade today. I'll keep an eye on check latency.
> 
> I have HT enabled, no effect on the nagios latency problems.

I've now setup a little script that puts host and service check
latency in an RRD file every five minutes. So far, the curve
looks very inconclusive.
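
A rough Python sketch of such a collector, not the actual script: it
pulls the last of the three figures on the "Active ... Latency" lines
of the nagiostats output (read here as min / max / avg, which is an
assumption) and builds an "rrdtool update" command line. The RRD path
and the order of its two data sources (host, then service) are
hypothetical.

```python
import re

# Matches e.g. "Active Service Latency:   0.006 / 10.237 / 0.906 sec"
LATENCY_RE = re.compile(
    r"Active (Host|Service) Latency:\s*([\d.]+) / ([\d.]+) / ([\d.]+) sec"
)


def parse_latencies(nagiostats_output):
    """Return {'host': avg, 'service': avg} from nagiostats text output."""
    result = {}
    for kind, _min, _max, avg in LATENCY_RE.findall(nagiostats_output):
        result[kind.lower()] = float(avg)
    return result


def rrd_update_args(rrd_path, latencies):
    """Build an 'rrdtool update' argument list; 'N' means 'now'."""
    return ["rrdtool", "update", rrd_path,
            "N:{host}:{service}".format(**latencies)]
```

Run from cron every five minutes, the resulting list could be handed
to subprocess.run(); the RRD's step would have to match that interval.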

Regards,
Tobias
-- 
Never touch a burning system.



Re: [Nagios-users] Questions about scheduling

2006-12-19 Thread Tobias Klausmann
Hi! 

On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> > - How does the scheduling queue work? From the docs it seems the
> >   whole queue is held up as soon as a host check is necessary. 
> >   As far as I know, Nagios parallelizes checks, so my question
> >   is if the current checking thread is held up only or if all of
> >   the checks are stopped immediately?
> 
> All the checks are stopped immediately. This is to prevent sending 
> service notifications when hosts go down, so it's sort of inevitable. 
> Nagios 3.0 will support asynchronous host checks, but we're not there yet.

Good, good, at least there's a solution on the horizon.

> > - If the whole set of workers is stopped, this would mean that a
> >   failing check would result in an immeidate host check which in
> >   turn holds up all the queues until it is complete. Does it
> >   really work this way?
> 
> Yes, for reasons stated above. It gets slightly worse if you have a 
> largely linear network (many hosts only have one child), since it also 
> has to check parent hosts until it finds the "closest" possible "up" to 
> determine where a possible network outage is happening.

I have nearly no parent/child relationships in my setup, as I
don't monitor network equipment, just hosts (and their services).
As such, I only have service deps (more deps than services,
actually).

> > - How much performance overhead do service dependencies generate?
> >   I have quite a few NRPE checks and all of them depend on the
> >   NRPE dummy check I always define. Does this stall checking by
> >   any considerable amount?
> 
> Not much, and not really, respectively :). You might want to enable 
> "soft_service_dependencies" though. The dependencies only add a couple 
> of internal checks along the lines of
> 
> if (service->state & dependent_state->notification_failure_criteria)
>   /* don't send a notification */
>   ;

Ok, I'll refrain from dropping them, then :)
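
The kind of test Andreas quotes is a plain bitmask check. As a hedged
illustration in Python (the flag values and names below are invented
for the example and are not Nagios's internal constants):

```python
from enum import IntFlag


class State(IntFlag):
    # Hypothetical bit values, chosen only for this illustration
    OK = 1
    WARNING = 2
    UNKNOWN = 4
    CRITICAL = 8


def notification_suppressed(master_state, failure_criteria):
    """True if the master service's state is one of the failure criteria."""
    return bool(master_state & failure_criteria)


criteria = State.CRITICAL | State.UNKNOWN
print(notification_suppressed(State.CRITICAL, criteria))
print(notification_suppressed(State.OK, criteria))
```

One AND per dependency is cheap, which fits the "not much" answer
above.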

Thanks,
Tobias

PS: Sorry for hijacking the thread, I missed deleting the
in-reply-to-line. Thinking how easy that mistake was, I'll
probably be a little less annoyed next time I see someone do it
:)

-- 
Never touch a burning system.



Re: [Nagios-users] Performance issues, too

2006-12-19 Thread Tobias Klausmann
Hi! 

On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> Thanks for an excellently detailed problem report, missing only the 
> Nagios version and system type/version info. I've got some comments and 
> followup questions. See below.

I'm running 2.6 now but I had the troubles with 2.5 initially.
OS is a Gentoo Linux, Kernel 2.6.15.5 initially, upgraded to
2.6.19 today.

> > ---
> > Total hosts: 330
> > Total scheduled hosts:   0
> 
> No scheduled host-checks. That's good, cause they interfere with normal 
> operations in Nagios.

I've read as much. In my separate mail I had a few questions
about it, let's keep them (and the answers there ;)

> > Host inter-check delay method:   SMART
> > Average host check interval: 0.00 sec
> > Host inter-check delay:  0.00 sec
> > Max host check spread:   10 min
> > First scheduled check:   N/A
> > Last scheduled check:N/A
> > 
> > 
> > SERVICE SCHEDULING INFORMATION
> > ---
> > Total services: 2836
> > Total scheduled services:   2836
> > Service inter-check delay method:   SMART
> > Average service check interval: 2225.56 sec
> 
> This is, as you point out below, quite odd. What's your _longest_ 
> normal_check_interval for services?

The longest check_interval is 86400 seconds. It's a SSL cert
freshness check. I figured it wasn't necessary to check that
more often than once a day. I also have check_intervals of 3, 5,
15, 20, 30 and 1440 seconds. The latter is also a cert freshness
check which is lower because the customer wanted it to be that
short.

> > CHECK PROCESSING INFORMATION
> > 
> > Service check reaper interval:  5 sec
> 
> You could lower this to 2 seconds. I've done so on any number of 
> installations and it has no negative impact what so ever, but seems to 
> make Nagios a bit more responsive.

I'll give that a try.

> > Max concurrent service checks:  Unlimited
> 
> I assume you aren't running in to hardware limits on this machine. 
> What's the normal load when you're running nagios? If it's > NUM_CPUS 
> then you most likely don't have beefy enough hardware. That's hardly 
> ever the case though, so don't bother looking into it unless all else fails.
> 
> Nvm, question answered below. Hardware resources should be no problem 
> what so ever.

I also noticed that HT was disabled on the machine. I've changed
that (and added support for it to the kernel) when I did the
kernel upgrade today. I'll keep an eye on check latency.

> > *Or* it is indicative of a misconfiguration on my
> > part. If the latter is the case, I'd be eager, nay ecstatic to
> > hear what I did wrong. Here are a few of the config vars that
> > might influence this:
> 
> There has been a slight thinko in Nagios. I don't know if it's still 
> there in recent CVS versions. The thinko is that it (used to?) calculate 
> average service check interval by adding up all normal_check_interval 
> values and dividing it by the number of services configured (or 
> something along those lines), which leads to long latencies. This 
> normally didn't make those latencies increase though. Humm...

Well, the numbers sure do get whacky after a restart: first it
skyrockets for about five minutes, then plummets to 1s. From
there it works its way up the way I described.

> > Total Services:   2836
> > Services Checked: 2836
> > Services Scheduled:   2758
> > Active Service Checks:2836
> > Passive Service Checks:   0
> 
> All services aren't being scheduled, but you have no passive service 
> checks. Have you disabled checks of 78 services?

Oops, forgot to mention that. Yes, a server farm is being rebuilt
currently. As I didn't want all the host check timeouts to make
matters much, much, worse, I disabled them entirely.

> > Hardware is a dual-2.8GHz Xeon, 2G RAM and a 100 FDX interface.
> > LoadAvg is around 1.6, sometimes gets to 1.9. CPUs are both
> > around 40% idle most of the time. I see about 300 context
> > switches and 500 interrupts per second. The network load is
> > neglible, ditto the packet rate.
> > 
> > The way these figures look I don't see a performance problem per
> > se, but maybe I have overlooked a metric that descirbes the
> > "usual" bottleneck of installations.
> > 
> 
> Are the CPU's 64 bit ones running in 32-bit emulation mode? For intel 
> cpu's, that causes up to 60% performance loss (yes, it really is that bad).

Sheesh. Yes, it is a 32-bit installation. I only ever bothered
with 64-bit installs on Opteron hardware. I might look into
migrating to 64 bits, then.

> I'm puzzled. Please let me know if you find the answer to this problem. 
> I'll help you debug it as best I can, but please continue posting 
> on-list. Thanks.

Sure. I'll first check if the "processor upgrade" and kernel
update helped anything, then t

[Nagios-users] Questions about scheduling

2006-12-19 Thread Tobias Klausmann
Hi! 

I have a few questions about scheduling in Nagios.

- How does the scheduling queue work? From the docs it seems the
  whole queue is held up as soon as a host check is necessary. 
  As far as I know, Nagios parallelizes checks, so my question
  is if the current checking thread is held up only or if all of
  the checks are stopped immediately?

- If the whole set of workers is stopped, this would mean that a
  failing check would result in an immediate host check which in
  turn holds up all the queues until it is complete. Does it
  really work this way?

- How much performance overhead do service dependencies generate?
  I have quite a few NRPE checks and all of them depend on the
  NRPE dummy check I always define. Does this stall checking by
  any considerable amount?

Thanks for any help,
Tobias




[Nagios-users] Performance issues, too

2006-12-19 Thread Tobias Klausmann
Hi! 

Recently I have run into the very same performance issues 
as Daniel Meyer (or so it seems). However, I'm not quite sure
about it. Here's the gist of it.

Currently, service check latency slowly creeps up. As it is now,
it starts out at a little over 1s and after about 12 hours it's
in the area of about 90s. It keeps climbing after that. 

Here's the output of nagios -s:
Nagios 2.6
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 11-27-2006
License: GPL

Warning: Contact group 'Singles-Truppe' is not used in any
host/service definitions or host/service escalations!
Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---
Total hosts: 330
Total scheduled hosts:   0
Host inter-check delay method:   SMART
Average host check interval: 0.00 sec
Host inter-check delay:  0.00 sec
Max host check spread:   10 min
First scheduled check:   N/A
Last scheduled check:N/A


SERVICE SCHEDULING INFORMATION
---
Total services: 2836
Total scheduled services:   2836
Service inter-check delay method:   SMART
Average service check interval: 2225.56 sec
Inter-check delay:  0.21 sec
Interleave factor method:   SMART
Average services per host:  8.59
Service interleave factor:  9
Max service check spread:   10 min
First scheduled check:  Tue Dec 19 11:21:45 2006
Last scheduled check:   Tue Dec 19 11:31:47 2006


CHECK PROCESSING INFORMATION

Service check reaper interval:  5 sec
Max concurrent service checks:  Unlimited


PERFORMANCE SUGGESTIONS
---
I have no suggestions - things look okay.

This all looks peachy - I think. What I don't get is this line:

Average service check interval: 2225.56 sec

It seems to me that this is either a skewed value, stemming from
my history of looong latencies (at one point we were beyond
9000 seconds). *Or* it is indicative of a misconfiguration on my
part. If the latter is the case, I'd be eager, nay ecstatic to
hear what I did wrong. Here are a few of the config vars that
might influence this:

sleep_time=0.25
service_reaper_frequency=5
max_concurrent_checks=0
max_host_check_spread=10
host_inter_check_delay_method=s
service_interleave_factor=s
command_check_interval=1
obsess_over_services=0
aggregate_status_updates=1
status_update_interval=20

Also, here's the output from nagiostats:
Nagios Stats 2.6
Copyright (c) 2003-2005 Ethan Galstad (www.nagios.org)
Last Modified: 11-27-2006
License: GPL

CURRENT STATUS DATA

Status File:  /var/nagios/status.dat
Status File Age:  0d 0h 0m 3s
Status File Version:  2.6

Program Running Time: 0d 1h 59m 5s

Total Services:   2836
Services Checked: 2836
Services Scheduled:   2758
Active Service Checks:2836
Passive Service Checks:   0
Total Service State Change:   0.000 / 12.370 / 0.007 %
Active Service Latency:   0.006 / 10.237 / 0.906 sec
Active Service Execution Time:0.047 / 10.159 / 0.180 sec
Active Service State Change:  0.000 / 12.370 / 0.007 %
Active Services Last 1/5/15/60 min:   477 / 2678 / 2745 / 2754
Passive Service State Change: 0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:  0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:2814 / 6 / 0 / 16
Services Flapping:0
Services In Downtime: 0

Total Hosts:  330
Hosts Checked:330
Hosts Scheduled:  0
Active Host Checks:   330
Passive Host Checks:  0
Total Host State Change:  0.000 / 0.000 / 0.000 %
Active Host Latency:  0.000 / 1.000 / 0.888 sec
Active Host Execution Time:   0.030 / 4.059 / 0.112 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:  0 / 12 / 12 / 12
Passive Host State Change:0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach:329 / 1 / 0
Hosts Flapping:   0
Hosts In Downtime:0

Hardware is a dual-2.8GHz Xeon, 2G RAM and a 100 FDX interface.
LoadAvg is around 1.6, sometimes gets to 1.9. CPUs are both
around 40% idle most of the time. I see about 300 context
switches and 500 interrupts per second. The network load is
negligible, ditto the packet rate.

The way these figures look I don't see a performance problem per
se, but maybe I have

Re: [Nagios-users] Advanced permissions/user properties

2006-12-07 Thread Tobias Klausmann
Hi! 

On Tue, 05 Dec 2006, Tobias Klausmann wrote:
> Thus, I'll probably patch NG to just ignore the perms. 
> 
> I'll post the patch here (if it's not too ugly ;))

See the attached file. Have fun.

Regards,
Tobias

-- 
Never touch a burning system.
--- NagiosGrapher.pm.orig   2006-12-07 15:15:40.0 +0100
+++ NagiosGrapher.pm        2006-12-07 15:15:54.0 +0100
@@ -181,12 +181,15 @@
 sub AuthCheck {
    my $rw=0;
    my ($cfg,$host,$user)=@_;
+   my $cgroups="";
 
 read_nagios_cfg($cfg) if(!$nagios_loaded_cfg);
 
 if($nagiosversion==2) {
    # Nagios Version 2.? Perform test
-   $rw=1 if(parse_nagios_cfg('contactgroup',parse_nagios_cfg('host',$host,'contact_groups'),'members') =~ m/$user(,|$)/);
+   $cgroups = parse_nagios_cfg('host',$host,'contact_groups');
+   $cgroups =~ s/:[^,]*//g;
+   $rw=1 if(parse_nagios_cfg('contactgroup',$cgroups,'members') =~ m/$user(,|$)/);
 } else {
    # Nagios Version 1.X isn't supported yet so everything is allowd 
    $rw=1;

Re: [Nagios-users] Advanced permissions/user properties

2006-12-05 Thread Tobias Klausmann
Hi! 

On Mon, 06 Nov 2006, Tobias Klausmann wrote:
> > For backwards compatibility, the default would be rwxn.
> > 
> > So, the engineers would have: nrx, customer: nr and helpdesk r.
> > 
> > Attached is an updated patch.
> 
> I'll try to get a peek at it this week.

Well, it took a little longer due to other things acting up
elsewhere.

As far as I can tell, it works perfectly - except...

We use NagiosGrapher to turn the perfdata into nice little colorful
thingies for the app developers. Unfortunately, NG seems to not
work well with the patch: It thinks "foo:r" is the group of the
same name, not the group "foo".

This will probably be easy to fix in a generic way (i.e. drop
everything after the last ":") and a little more complicated if
the perms should be parsed (though I have no idea what perms
would be needed for the graph: n or x only would prevent the user
from seeing the host/service (and thus the extinfo link) in the first
place. And preventing "r" users from seeing the graphs doesn't
make much sense, IMO).

Thus, I'll probably patch NG to just ignore the perms. 
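
The generic fix amounts to one substitution over the comma-separated
contact_groups value. A small Python sketch of the idea (NG itself is
Perl; the function name and the sample group names are made up):

```python
import re


def strip_perms(contact_groups):
    """Remove ':perms' suffixes from a comma-separated group list."""
    # Drop everything from a colon up to (not including) the next comma
    return re.sub(r":[^,]*", "", contact_groups)


print(strip_perms("foo:r,bar:nrx,baz"))
```

Groups without a permission suffix pass through unchanged, so configs
that don't use the permissions patch are unaffected.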

I'll post the patch here (if it's not too ugly ;))

> Thanks, again (all of you).

I can only re-iterate that.

Regards,
Tobias

-- 
I love the smell of burning bridges



Re: [Nagios-users] How do distributed setups work? (longish)

2006-11-23 Thread Tobias Klausmann
Hi! 

First off, thanks for your quick reply.

On Wed, 22 Nov 2006, Patrick Morris wrote:
> > 1) Documentation for NSCA is - mildly put - lacking. As far
> > as I can tell, send-NSCA expects data tab-separated on stdin.
> > It would've been nice to actually see an example for getting
> > host and service data into it. Am I supposed to do something
> > like "printf $X$\t$Y$\t$Z$|send_nsca -H ..." for the OCSP
> > command?
> 
> There are examples in the default distribution that show how to
> do this.  Take a look in (on my platform, at least)
> /usr/lib/nagios/plugins/eventhandlers/submit_nsca_result.  Your
> location and filename may vary.

Ah, with a little bit of searching (slocate be praised) I dug it
up (it's named /submit_check_result_via_nsca here). Looks like
it's what I would've come up with :)
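
For the archives, the record fed to send_nsca on stdin can be sketched
like this in Python. The field order (host, service description,
return code, plugin output) is my understanding of what send_nsca
expects for service results and should be checked against its own
help text; the host and service names are invented.

```python
def nsca_service_line(host, service, return_code, output):
    """Build one tab-separated, newline-terminated service result line."""
    fields = [host, service, str(return_code), output]
    # Tabs or newlines inside a field would break the record layout
    clean = [f.replace("\t", " ").replace("\n", " ") for f in fields]
    return "\t".join(clean) + "\n"


print(repr(nsca_service_line("web1", "HTTP", 0, "HTTP OK - 200")))
```

The resulting string would be written to the stdin of send_nsca, for
example from the OCSP command on each checker.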

> > 2) How does the information that a check should be disabled
> > get from the central machine to the checkers? I've found no
> > "usual" way of doing it?  Would it be necessary to setup some
> > distribution via SSH to the checkers?
> 
> There are several different ways you could handle this, but
> you'd need to find *some* way to get the information out to the
> machines doing the checking.  For my prurposes, just turning
> off notifications has been enough, and that's something you can
> do at the central box only if that's where your notifications
> come from.

Of course. Disabling checks entirely surely is a seldom
occurrence. I guess it only matters if the checks break something
on the checked machine - or they cause congestion for some
reason. As I was planning on handling notification centrally,
it's enough for me.

> > 3) All machines setup to be check passively (i.e. by a
> > checker) are displayed as "disabled" in the web front end.
> > This is very counter-intuitive (they *are* checked, after
> > all). 
> 
> Yes, that's how they look when active checks are disabled.

Which in turn will confuse the hell out of my users. Is there no
way around this? I thought of replacing all my check_commands on
the central machine with /bin/true, but that would result in a
lot of flapping, so it's no solution. 

I guess the only way would be to hack the CGIs. Oh well.

> > 4) There would have to be some mechanism of config
> > distribution.  Both the central machine and the checker need
> > to agree on which services there are. Otherwise, some checks
> > would never be executed or the central machine would ignore
> > the submitted results.
> 
> People do this different ways, but I push the same config to
> all my boxes (keeps synching easy), and have several service
> templates (for example "location_a_service",
> "location_b_service", etc.) defined on each box.  At location
> A, a "location_a_service" is defined as active.  At location b,
> it's defined as disabled.  At the central box, it's defined as
> passive.
> 
> This allows me to keep per-box-configuration differences
> minimal, and still allows configs to be kept in sync with
> minimal effort (in my case, I rsync the whole damn thing over
> to every machine, except the one small per-server config file).

I was thinking along similar lines, but having a "template
config" which gets filtered into the different configs for
checkers and central machine. How I'd exactly solve that I don't
know yet. But your approach sounds like an interesting
alternative.
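
A rough sketch of the "template config" idea, assuming the template marks role-dependent values with the placeholders @ACTIVE@ and @PASSIVE@. Both placeholders and the function name are made up for this example; active_checks_enabled and passive_checks_enabled are the regular Nagios service directives they would be attached to.

```shell
# Sketch only: render a role-specific config from a shared template.
# Reads the template on stdin, writes the rendered config to stdout.
# The @ACTIVE@/@PASSIVE@ placeholders are hypothetical.
render_config() {
    case $1 in
        checker) active=1 passive=0 ;;   # checkers run the checks themselves
        central) active=0 passive=1 ;;   # central box only accepts results
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
    sed -e "s/@ACTIVE@/$active/g" -e "s/@PASSIVE@/$passive/g"
}

# Example: render_config central < services.cfg.in > services.cfg
```

The same template can then be pushed everywhere, with only the role argument differing per machine.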

Thanks again,
Tobias

-- 
Never touch a burning system.



[Nagios-users] How do distributed setups work? (longish)

2006-11-22 Thread Tobias Klausmann
Hi all,

I'm having a conceptual/logical/mindset problem which I hope you
can help me with. It's a bit long, but the question/problem I
have is complex, so please bear with me.

What I dream of:

I have a central machine which is the interface to the users.
Using the/a web interface, the users can do exactly the same
stuff they can do with a single-host installation: acknowledge
problems, schedule downtimes, disable checks etc. It's the
CGI-side of current affairs, so to speak.

However, there are also N dedicated checking machines (I call
them "checkers") which work the same way as the Nagios core does
(i.e.  without the CGIs). There's no Apache running and none of
the Users really know about them (except that they had to poke
holes for them into their firewalls). These machines ideally only
do automatic scheduling of checks and execute the checks
themselves. As for the return values, info strings and perfdata
returned by the plugins, they simply pass them on to the central
machine described above.

This way, I can scale the entire setup if the/a checking machine
runs out of CPU/memory when scheduling checks. Also, I can build
dedicated checkers inside DMZs and the like.

As for notification, this could possibly be done by the checkers
directly, but then, acknowledgments and disabled notifications
(which are entered centrally) would have to find their way to the
checkers. I think handling notification centrally would be
better. Even if the central machine is overloaded with
notifications, it could be delegated to a dedicated machine that
is used as a smart host.

As far as the marketing goes ;) I had the impression that Nagios
and friends can do this kind of setup. However when I tried to
set up something like this, I ran into numerous problems.

1) Documentation for NSCA is - mildly put - lacking. As far as I
can tell, send_nsca expects data tab-separated on stdin. It
would've been nice to actually see an example for getting host
and service data into it. Am I supposed to do something like
"printf $X$\t$Y$\t$Z$|send_nsca -H ..." for the OCSP command?

2) How does the information that a check should be disabled get
from the central machine to the checkers? I've found no "usual"
way of doing it.  Would it be necessary to set up some
distribution via SSH to the checkers?

3) All machines set up to be checked passively (i.e. by a checker)
are displayed as "disabled" in the web front end. This is very
counter-intuitive (they *are* checked, after all). 

4) There would have to be some mechanism of config distribution.
Both the central machine and the checker need to agree on which
services there are. Otherwise, some checks would never be
executed or the central machine would ignore the submitted
results.

The only solution I have thought of so far which *might* work is
running NRPEs on the checkers which get used by the central
machine. This would mean that the checkers only have an NRPE and
the Nagios plugins.  For host internal checks, I'd have an "NRPE
cascade" or NRPE using check_snmp. This has the downside that the
central machine might run into congestion problems when
scheduling.

Another "solution" would be to have multiple, completely separate Nagios
installations for different (sets of) projects. I'm very wary of
this.  I'm part of the team that is responsible for the whole
enchilada, i.e.  we need to have monitoring access to all of
those projects.  Having to log into N web front ends for a
"quick" overview is not really an option.  One might be able to
work with reverse proxying and/or custom-tailored CGIs here, but
I'd rather not.

So my question to the "big boys" out there: how exactly is a
distributed setup *supposed* to work?

Thanks for your time!

Regards,
Tobias



Re: [Nagios-users] check_cciss plugin for monitoring RAID arrays on HP servers

2006-11-17 Thread Tobias Klausmann
Hi! 

On Fri, 17 Nov 2006, Yogesh Hasabnis wrote:
> Yes, I had used arrayprobe from the command-line. But being a layman, I was
> not sure how to define a checkcommand using arrayprobe. Anyway, I will also
> give it a try.

We use this:

nrpe.conf:
command[check_array]=/usr/bin/sudo /usr/bin/arrayprobe

Note that sudo is needed since NRPE runs as its own user and the
utility needs to run as root.

nagios config:
define service {
    host_name
    service_description     check_array
    check_command           check_nrpe!check_array
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    check_period            24x7
    notification_interval   300
    notification_period     24x7
    notification_options    w,c,r
    contact_groups
}

Hope this helps,
Tobias

-- 
Never touch a burning system.



Re: [Nagios-users] check_cciss plugin for monitoring RAID arrays on HP servers

2006-11-17 Thread Tobias Klausmann
Hi! 

On Fri, 17 Nov 2006, Sim wrote:
> > It works for both IDA and CCISS devices and already returns the
> > retvals the way Nagios wants them.
> 
> Hi!
> 
> Can you post an example of the output?
> 
> Have you tried all cases? (Rebuilding, Interim Recovery, Fail, etc.)

Ok state looks like this:
# arrayprobe   
OK Arrayprobe All controllers ok
# echo $?
0

So far I've only seen Interim Recovery Mode which is a warning.
One could argue about that I guess, I found it to be ok for my
use case.

Failure is critical, which is definitely what I'd expect.
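
For reference, the return value convention Nagios expects from a plugin is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A small helper to translate an exit status when testing by hand might look like this; the function name is made up, and treating every other status as UNKNOWN is an assumption of this sketch.

```shell
# Sketch only: translate a plugin exit status into the Nagios state name.
nagios_state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; other values are lumped in here
    esac
}

# Example when testing by hand:
#   arrayprobe; nagios_state_name $?
```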

What I like most about it is its simplicity, and yet it's useful
outside of Nagios, too. Getting at the log of the controller is
nice, too:

# arrayprobe -r
Retrieving logical drive information from controller
/dev/cciss/c0d0
Number of logical volumes (00 00 00 08) : 1
Controller /dev/cciss/c0d0 reports 1 logical drives
Logical drive 0 found on controller /dev/cciss/c0d0
Event code 5/0/0
at 8-26-2004 01:37:22
with message: State change, logical drive 0
logical drive 0, changed from state 2 to 0
state 2: Logical drive is not configured
state 0: Logical drive is ok

Event code 5/2/0
at 9-5-2005 11:09:26
with message: Parity/consistency initialization complete, logical
drive 0

Event code 4/0/0
at 11-14-2006 17:55:33
with message: Physical drive failure: SCSI port 1 ID 1
physical drive 1 has failed with failurecode 5.
this drive is part of a logical drive

Event code 5/0/0
at 11-14-2006 17:55:33
with message: State change, logical drive 0
logical drive 0, changed from state 0 to 3
state 0: Logical drive is ok
state 3: Logical drive is using interim recovery mode

Event code 1/0/0
at 11-17-2006 13:01:33
with message: Hot-plug drive removed: SCSI port 1 ID 1

Event code 1/0/1
at 11-17-2006 13:01:52
with message: Hot-plug drive inserted: SCSI port 1 ID 1

Event code 5/0/0
at 11-17-2006 13:01:52
with message: State change, logical drive 0
logical drive 0, changed from state 3 to 4
state 3: Logical drive is using interim recovery mode
state 4: Logical drive is ready for recovery operation

Event code 5/0/0
at 11-17-2006 13:01:52
with message: State change, logical drive 0
logical drive 0, changed from state 4 to 5
state 4: Logical drive is ready for recovery operation
state 5: Logical drive is is currently recovering

Event code 5/0/0
at 11-17-2006 13:52:29
with message: State change, logical drive 0
logical drive 0, changed from state 5 to 0
state 5: Logical drive is is currently recovering
state 0: Logical drive is ok

Event code 0/0/0
with message: No events to report.

failed to open device /dev/ida/c0d0: No such device or address
Logical drive 0 on controller /dev/cciss/c0d0 has state 0
OK Arrayprobe All controllers ok

Regards,
Tobias
> 
> Thanks
> 
-- 
Never touch a burning system.



Re: [Nagios-users] check_cciss plugin for monitoring RAID arrays on HP servers

2006-11-17 Thread Tobias Klausmann
Hi! 

On Fri, 17 Nov 2006, Thomas Hager wrote:
> > ./check_cciss-1.5: line 90: ./utils.sh: No such file or directory
> the plugin calls utils.sh (which comes with the nagios-plugins package)
> and needs it in the same directory you installed check_cciss. so, you
> got three choices:
> 
> a) copy check_cciss-1.5 to the directory where all the nagios plugins
> are installed (including utils.sh)
> b) copy utils.sh to the directory you're testing check_cciss-1.5
> c) edit the check_cciss script and adjust the path to utils.sh
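
Option c) can be made a little more robust by searching a few candidate directories instead of hard-coding a single path. A minimal sketch follows; the function name and the directories in the usage comment are assumptions, so adjust them to your installation.

```shell
# Sketch only: print the first readable utils.sh found in the given
# directories; return non-zero if none of them contains one.
find_utils_sh() {
    for dir in "$@"; do
        if [ -r "$dir/utils.sh" ]; then
            printf '%s\n' "$dir/utils.sh"
            return 0
        fi
    done
    return 1
}

# Hypothetical use near the top of the plugin script:
#   utils=$(find_utils_sh "$(dirname "$0")" /usr/lib/nagios/plugins) || exit 3
#   . "$utils"
```

Exiting with 3 when the file is missing reports UNKNOWN to Nagios rather than a misleading OK or CRITICAL.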

The other day I saw another tool which is completely OSS (GPL-2,
IIRC) and independent of hpacucli. It works just fine for me and
it's even included in our fave distro (Debian).

It works for both IDA and CCISS devices and already returns the
retvals the way Nagios wants them.

http://www.strocamp.net/opensource/

The tool is called "arrayprobe"

HTH,
Tobias

-- 
Never touch a burning system.



Re: [Nagios-users] Advanced permissions/user properties

2006-11-06 Thread Tobias Klausmann
Hi! 

First off: thanks for all your work, I didn't quite expect so
much (and such constructive/worthwhile) feedback.

On Sun, 05 Nov 2006, Alex Burger wrote:
> How about:
> 
> r: View in web interface
> 
> x: Submit commands for this host/service
> 
> w: Not really needed yet.  Maybe some of the other programs that allow 
> you to modify the configuration files could use w to allow a user to 
> modify the host / service.
> 
> n: Notify if contact has a pager or email defined

I think one could make a case for x being everything that
concerns the current state of an object, i.e. mainly
acknowledgement(s). The w flag could be used for en/disabling
(semi)permanent stuff, like disabling active checks. 
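
To make the flag idea concrete, here is a tiny sketch of how a permission string such as "nrx" could be tested for a single flag. It only illustrates the scheme discussed in this thread and is not code from the patch; the function name is made up.

```shell
# Sketch only: succeed if the permission string ($1) contains the flag ($2).
has_perm() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# With the roles from the quoted mail:
#   has_perm nrx x   -> succeeds (engineers may submit commands)
#   has_perm nr  x   -> fails    (customers may only view and be notified)
#   has_perm r   n   -> fails    (helpdesk is view-only)
```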

On the other hand, many actions (like schedule downtime) would
fall into a grey area, so maybe using x for all of them and
"keeping" w for later is better.

> I also changed it so that you will only see a service if you are a 
> contact for it.  I think this is the same change that Ton mentioned in 
> his last email.  I did this to test the 'r' permission.

This was the default in our installation (by way of not having an
asterisk in the corresponding line(s) in the main config file).

> 
> For backwards compatibility, the default would be rwxn.
> 
> So, the engineers would have: nrx, customer: nr and helpdesk r.
> 
> Attached is an updated patch.

I'll try to get a peek at it this week.

Thanks, again (all of you).

Regards,
Tobias




Re: [Nagios-users] Advanced permissions/user properties

2006-11-03 Thread Tobias Klausmann
Hi! 

On Thu, 02 Nov 2006, Alex Burger wrote:
> I have expanded on the Altinity patch by adding a 'can_submit_commands' 
> and 'can_submit_commands_strict' option to contact groups.  The 
> limitation of having a can_submit_commands option on the user is that 
> it's an all or nothing option.  A user is either view-only for all 
> devices, or not.
> 
> I will be using can_submit_commands_strict for people who need to be 
> able to submit commands for the servers and services they manage, but 
> also be able to only view some other servers and devices.  I don't want 
> the users to be able to view ALL devices.
> 
> *can_submit_commands_strict:*  You grant users full access to all or 
> some systems, but want to restrict them from issuing commands for a few 
> devices.
> 
> If a device has multiple contact groups defined and any one of them 
> denies submit commands with can_submit_commands_strict 0, then the user 
> is denied even if the user belongs to a group that permits it.
> 
> *can_submit_commands:*  You grant users read/only access to all systems, 
> but you want to allow the user to issue commands for a few devices.
> 
> With can_submit_commands, if a device has multiple contact groups 
> defined and any one of them allows submit commands, the user can submit 
> commands.  If there was only one contact group listed and it had 
> can_submit_commands set to 0, the user would not be able to submit commands.
> 
> Is this what you are looking for?

I'm not quite sure :)

Actually I'm not sure I understand the functionality you added
correctly. I'll explain what I think I've understood:

The new attribute (..._strict) belongs to contact_groups. If it
is set to 1 on a contactgroup, everything behaves as normal.

If it is set to 0, then no user who's associated with a
hostgroup that is also associated with this contactgroup may
issue commands for that particular host(group).

As this sounds more than counter-intuitive, I strongly suspect
I've misunderstood something.
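
For comparison, the quoted description can be read as two different ways of combining the per-contactgroup values for one device: with can_submit_commands any group that allows wins, with can_submit_commands_strict any group that denies wins. The sketch below encodes only that reading; the function names are made up and this is not the patch's actual code.

```shell
# Sketch only. Arguments are the 0/1 values of the option on every
# contact group attached to the device.

# can_submit_commands: allowed as soon as one group allows it.
allowed_nonstrict() {
    for v in "$@"; do
        [ "$v" = 1 ] && return 0
    done
    return 1
}

# can_submit_commands_strict: denied as soon as one group denies it.
allowed_strict() {
    for v in "$@"; do
        [ "$v" = 0 ] && return 1
    done
    return 0
}
```

With the values 1 and 0 on two groups, allowed_nonstrict succeeds while allowed_strict fails, which is the difference the two options are meant to express under this reading.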

Please enlighten me. :)

Regards,
Tobias


-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Advanced permissions/user properties

2006-10-31 Thread Tobias Klausmann
Hi! 

On Tue, 31 Oct 2006, Az wrote:
> > The altinity people have created a patch for the "view some,
> > change none" scenario[0]. Unfortunately, what I'd need is a
> > mechanism for the "view some, change a few" scenario I outlined
> > above.
> Is that to say that "view _all_, change some" wouldn't work for you? 
> That's how we're working at present, out-of-the-box. While restricting 
> viewing might reduce mental clutter for those concerned*, I can't see 
> why being able to see everything is a big deal (unless you're displaying 
> some super sensitive information in Nagios which is a totally different 
> topic).

Well it's not super sensitive but when we started deploying
Nagios we were very happy to not clutter the webpages for everybody
(how we Nagios admins cope is another story ;)). We're currently
looking into hacking cmd.cgi to 

a) log all issued commands with username and ip
and 
b) do some permission checking
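
As an illustration of a), this is roughly the kind of log line we have in mind. REMOTE_USER and REMOTE_ADDR are the standard CGI environment variables set by the web server; the function name and the line format are assumptions, and since cmd.cgi itself is written in C this shell version is only a sketch of the idea (e.g. for a wrapper script).

```shell
# Sketch only: build one audit log line for a submitted command.
# Arguments: timestamp, command name.
format_command_log() {
    printf '[%s] user=%s ip=%s command=%s\n' \
        "$1" "${REMOTE_USER:-unknown}" "${REMOTE_ADDR:-unknown}" "$2"
}

# Hypothetical use (log path is an assumption):
#   format_command_log "$(date +%s)" ACKNOWLEDGE_SVC_PROBLEM >> /var/log/nagios/cmd-audit.log
```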

Unfortunately b) will leave us with yet another orthogonal user
account type. Currently, most users have four accounts:
first.last, first.last-sms, first.last-email, first.last-phone.
The first account allows them to log onto the website and view
stuff, the other three are used to configure the respective
notification types (the first account has notifications disabled
entirely). This has the advantage that not everybody that wants
to see a host has to get any of the used notification types.
Unfortunately, you can not easily pull apart "view" and "disable
stuff" this way, hence my initial question.

The Nagios permission system is a bit lacking in this respect. As
far as I can tell from Ethan's presentation about 3.0, that won't
change (much) with the next version. It's not like I have the
perfect way to specify such fine-grained perms in my head,
either. 

> *Our service centre can see everything, but we used service groups to 
> provide "rolled up" views to keep it simple for them. All our engineers 
> can see everything so when they spot an issue with someone else's kit 
> that impacts their own, they can find out what the issue is and who to 
> poke in the eye with a burnt stick to get it fixed. We found that trying 
> to hide stuff just blinkered people and made them only responsive to 
> their own issues ("not my server. not my problem.") which results in 
> poor customer service at the end of the day.

While I agree with you mostly, we also offer Nagios monitoring to
our customers. This in turn means that we have to separate them
in some way (it wouldn't be cool if all customers saw each
other's hosts and services). This is a hard requirement that I
can't move a single inch on (rather: my boss won't let me). 

We're now looking at separating our own Nagios and that for the
customers. That way, we'd get the "have N accounts for everybody"
to be a little more manageable. For our internal stuff, we'd go
the route you described (everybody seeing everything). Seeing as
about 90% of what we monitor is our own stuff, that would make
quite a difference.

Regards,
Tobias

-- 
Never touch a burning system.



[Nagios-users] Advanced permissions/user properties

2006-10-31 Thread Tobias Klausmann
Hi!

I've got a problem that I don't know how to solve best in Nagios. I
think other people have run into the same problem (I know that
someone has run into a /similar/ problem).

I'm running 2.5 on a mid-sized installation (~300 hosts, ~2500
services). Thing is, our projects/(host|service)groups vary
wildly in who is responsible for them. Unfortunately, all these
projects are also heavily intertwined in their dependencies.

Say we have a web mail project A. This consists of several
web servers, databases and the like. It is heavily dependent on
the LDAP project B and the mail server project C. While B and C
have the same group of admins, project A is managed by an
entirely different group of people.

As such, we have configured Nagios so that the group that is
responsible for project A can only see the machines of project A and
the "B-and-C-people" can only see B and C.

Everything is peachy.

Except. Sometimes, project A may look like it's broken (pages
time out etc). But in reality, there's a spam attack and the
project B (the LDAP infrastructure) is so slow it simply grinds
to a halt. In this case it would obviously be nice if the people
from project A could see that project B is slow. Yet they should
not be able to change the notification options/acknowledgements
etc etc of projects B or C. 

The altinity people have created a patch for the "view some,
change none" scenario[0]. Unfortunately, what I'd need is a
mechanism for the "view some, change a few" scenario I outlined
above.

How do others deal with this kind of problem? I'm sure we're not
the only ones who've run into it.

Currently, our best guess would be to create
pseudo-accounts (like john.foo and john.foo-admin) and hack the
CGI(s) to only allow the commands from -admin accounts which are
in the notification list (with notification options set to "n").
We already do this to let people see machines they don't get
mailed/paged/called about.

Regards,
Tobias

[0] http://altinity.blogs.com/dotorg/2006/02/a_view_some_cha.html



[Nagios-users] Debugging plugins

2006-10-25 Thread Tobias Klausmann
Hi! 

While debugging it would sometimes be nice to be able to run a
plugin/check from the commandline in exactly the same fashion as
Nagios does.

What I'm after is a way of running (for example):

simulate -c /etc/nagios/nagios.conf -p 'check_http!80!http://somewhere.com'

The macros (like $HOSTADDRESS$) might come from the environment
or additional cmdline options to simulate. This program should
then show all the intermediate steps of unquoting, macro
expansion etc. just like it happens in Nagios and finally
show the actual command executed (i.e. what's sent to exec()).

That way, finding out you've got a stray % somewhere or missed
something else would be much easier. Also, you'd be able to test drive
check_commands before using them in your configuration.
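
To show what I mean, here is a very rough shell sketch of just the argument substitution step. It takes a command_line, a check_command string in the "name!arg1!arg2" form and a host address, and replaces $HOSTADDRESS$ and $ARGn$. The function names are made up, it knows nothing about the real config files or other macros, and unlike Nagios it leaves macros it has no value for untouched.

```shell
# Sketch only: literal search-and-replace without regex surprises.
# Arguments: string, search text, replacement text.
replace_all() {
    rest=$1
    out=
    while :; do
        case $rest in
            *"$2"*)
                out=$out${rest%%"$2"*}$3   # text before the match + replacement
                rest=${rest#*"$2"}         # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$rest"
}

# Arguments: command_line, check_command ("name!arg1!arg2"), host address.
expand_check_command() {
    line=$(replace_all "$1" '$HOSTADDRESS$' "$3")
    args=${2#*!}
    [ "$args" = "$2" ] && args=            # no "!" means no arguments
    n=1
    while [ -n "$args" ]; do
        arg=${args%%!*}
        line=$(replace_all "$line" "\$ARG$n\$" "$arg")
        case $args in
            *!*) args=${args#*!} ;;
            *) args= ;;
        esac
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}

# Example (the command_line is an assumed definition of check_http):
#   expand_check_command '$USER1$/check_http -H $HOSTADDRESS$ -p $ARG1$ -u $ARG2$' \
#       'check_http!80!http://somewhere.com' 192.0.2.10
```

A real tool would of course have to read the command definitions and resource macros from the configuration, which is exactly the part I can't write myself.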

Is there any such beast? If not, is anyone willing to code it?
I'm no expert at C and I hardly know Nagios' internals, so I
can't write it myself (or rather: I'd shoot off several of my
appendages while doing so).

Regards,
Tobias

-- 
Never touch a burning system.



Re: [Nagios-users] timeouts and performance info

2006-08-30 Thread Tobias Klausmann
Hi! 

On Wed, 30 Aug 2006, Marc Powell wrote:
> > Active Service Checks:
> > <= 1 minute:          81 (4.6%)
> > <= 5 minutes:         1719 (97.4%)
> > <= 15 minutes:        1727 (97.9%)
> > <= 1 hour:            1727 (97.9%)
> > Since program start:  1727 (97.9%)
> 
> This seems mostly normal for a 5 minute check_interval. The small
> difference between the 5 and 15 minute counts is normal as checks may be
> just starting to execute or still in progress at the 5 minute mark. It
> does appear that you have some number of services that are not scheduled
> for execution or are executing at really long intervals. Look at Service
> Detail and sort by last check. Re-examine your configuration for those
> services that do not appear to be scheduled properly.

I have a few services that are disabled entirely (don't check
actively, don't accept passive checks). Would they count in the
above statistic? They seem to fit in with the missing 2.1%
(100-97.9). Also, I saw a few checks that were last run about ~20
minutes ago. Those are log checks via NRPE that complete within
<1s (no noticeable delay) if run directly on the machine (as user
nagios of course). It seems acceptable (and I neither know why it
would take 20 minutes nor how to find out why), so I'm willing to
let it slide ;).

> Looks pretty good to me. The high max check latency number may have been
> a one-off event. If that number regularly changes and is always very
> high then you might want to verify that you're not starving nagios for
> checks by running /path/to/nagios/bin/nagios -s
> /path/to/nagios/etc/nagios and make sure you meet or exceed its
> recommended values.

I guessed as much for the one-off event. It doesn't change, so I
feel somewhat safe. As for the recommended values (-s), Nagios
says it's okay the way it is.

> > Active Hosts Checks:
> > <= 1 minute:          0 (0.0%)
> > <= 5 minutes:         3 (1.2%)
> > <= 15 minutes:        3 (1.2%)
> > <= 1 hour:            4 (1.6%)
> > Since program start:  27 (10.8%)
> > 
> > and
> > 
> > Check Execution Time:   0.02 sec   10.05 sec   0.208 sec
> > Check Latency:          0.00 sec   17.48 sec   0.204 sec
> > Percent State Change:   0.00%      0.00%       0.00%
> 
> These look normal and expected. You've had 27 service failures since
> program start necessitating host checks.

That is in line with what I'd expect.

> > Am I the only one seeing a discrepancy here?
> 
> The only discrepancy I see is likely due to configuration. You probably
> have check intervals or timeperiods misconfigured for ~30 services.

About that number of services are disabled entirely right now, so
if they count into the statistic, it explains the figures.
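
As a quick sanity check on the numbers: if 1727 services make up 97.9% of all services, the total and the number of services never checked can be estimated as below. The function name is made up and the percentage shown by Nagios is rounded, so the result is an estimate.

```shell
# Sketch only: estimate how many services were never checked, given the
# number of checked services and the percentage they represent.
estimate_unchecked() {
    awk -v n="$1" -v p="$2" 'BEGIN {
        total = n * 100 / p            # 1727 * 100 / 97.9 is about 1764
        printf "%d\n", total - n + 0.5 # round to the nearest integer
    }'
}

estimate_unchecked 1727 97.9   # prints 37
```

That is in the same ballpark as the number of services I have disabled.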

> > The only way I can make sense of this is that the "<= 15 minutes"
> > means "time from being scheduled to actually starting the
> > plugin". In that case I wonder what makes it take so long, the
> 
> Check Latency is that number. On average nagios is able to run your
> checks within 3.043 seconds of when they are scheduled to run. The
> number you are referring to is just a simple count of the number of
> plugins that have been run in that time interval.

So it means "in the last N minutes, this many services completed"
and *not* "this many services needed N minutes to complete (from
being started to delivering the retval)"? That would be an eye
opener for me :)

Regards & Thanks,
Tobias
-- 
You don't need eyes to see, you need vision.



[Nagios-users] timeouts and performance info

2006-08-30 Thread Tobias Klausmann
Hi!

I have the following values in my nagios.cfg:

service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

As far as I know, those values are in seconds. What I wonder is
why I still have Service and Host Checks that take longer than
fifteen minutes to complete. This shouldn't be the case the way I
understand it. Here's my current perf info:

Active Service Checks:
<= 1 minute:          81 (4.6%)
<= 5 minutes:         1719 (97.4%)
<= 15 minutes:        1727 (97.9%)
<= 1 hour:            1727 (97.9%)
Since program start:  1727 (97.9%)

and 

Check Execution Time:   0.00 sec   12.92 sec    0.275 sec
Check Latency:          0.00 sec   204.30 sec   3.043 sec
Percent State Change:   0.00%      15.46%       0.02%

Active Hosts Checks:
<= 1 minute:          0 (0.0%)
<= 5 minutes:         3 (1.2%)
<= 15 minutes:        3 (1.2%)
<= 1 hour:            4 (1.6%)
Since program start:  27 (10.8%)

and

Check Execution Time:   0.02 sec   10.05 sec   0.208 sec
Check Latency:          0.00 sec   17.48 sec   0.204 sec
Percent State Change:   0.00%      0.00%       0.00%

Am I the only one seeing a discrepancy here?

The only way I can make sense of this is that the "<= 15 minutes"
means "time from being scheduled to actually starting the
plugin". In that case I wonder what makes it take so long, the
machine should be beefy enough (dual PIV Xeon 2.8GHz, 2G of RAM).

Any hints/thoughts are appreciated.

Regards, 
Tobias



Re: [Nagios-users] Using SNMP as an alternative to NRPE

2006-07-13 Thread Tobias Klausmann
Hi! 

On Thu, 13 Jul 2006, Thomas Sluyter wrote:
> Why is it that we insist on using NRPE for this? Of course it's very  
> practical that there's such a thing as the NRPE daemon and the  
> check_nrpe command. It does indeed make things easier for a lot of  
> people who lack deep technical insight.

Yet it's a step away from the KISS principle.

> But what is to keep the expert users from using the SNMP daemon for  
> this practice?

SNMP *can* be a security nightmare. The problem is that the
protocol allows *writing* to the machine, i.e. config changes. The
danger in an unsecured NRPE is much lower: even if we assume that
both the SNMPd and NRPE have no security problems in their code, a
slightly wrong config can allow an attacker to compromise an SNMP
machine. That's nigh impossible on an NRPE machine. Also, an NRPE
config is much less complex than that of an SNMPd.

> There's a bunch of factors that have pushed us away from NRPE and  
> towards SNMP:
> * The SNMP daemon is installed by default on all of our systems.  
> AFAIK it's also part of the default install of just about every OS  
> installation (with the possible exception of Windows).

It isn't installed on *any* of the >1k machines I herd. Not by
active choice. It simply isn't installed because we don't need
it. It's not part of the default install of the Distros and OSs
we use.

> * We are currently already using the SNMP daemon to gather  
> performance info for MRTG and we will be using the SNMP daemon to  
> send traps to Nagios.

That is an entirely different story. I can understand that people
use SNMPds on host machines because SNMP is the way to go for
Ciscos or other network equipment. But we're quite happy with the
way NRPE and NagiosGrapher work together with RRDTool.

Our network guys (who run a nationwide backbone and thus have
their own monitoring solution) use SNMP for their stuff. 

> * Not using NRPE means one less configuration file to maintain, one  
> less port to open up in firewalls and one less binary to patch and  
> upgrade.

Not for us: SNMP isn't an "it's there anyway" resource. Hence, we
opted for the smaller, less complex solution, NRPE.

> Do any of you know of any practical objections to using SNMP as a  
> substitute for NRPE? It might be that we're missing something here,  
> but to us it looks like a very good choice.

Complexity. Both in daemon code and configuration. And that the
SNMP protocol spec allows for writing to a host.

Regards,
Tobias

-- 
You don't need eyes to see, you need vision.




[Nagios-users] Threads and counting them

2006-05-31 Thread Tobias Klausmann
Hi! 

I'm monitoring several processes that do not fork but use threads.
Unfortunately, check_procs can't be coerced into counting threads
individually. While I could hack a shell script together using
ps a -L, I'd rather use a stock plugin for this. Is there any way
to make check_procs count the threads? Or maybe another plugin
that can do what I need? A patch against check_procs would also
be nice, since I'm not very keen on hacking C myself (I know my
limits).
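
For illustration, the "hack a script together" route could look roughly
like the sketch below. It is not a stock plugin. It assumes a procps-style
ps that supports the nlwp (number of threads) output field, and the names
count_threads, status_for and main, as well as the warn/crit argument
order, are made up for this example.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_threads(ps_output, command):
    """Sum the thread counts of all processes whose command name matches.

    ps_output is expected to hold lines of the form "<nlwp> <comm>",
    as printed by `ps -e -o nlwp=,comm=` (assumed procps behaviour).
    """
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip empty or malformed lines
        nlwp, comm = parts
        if nlwp.isdigit() and comm.strip() == command:
            total += int(nlwp)
    return total


def status_for(count, warn, crit):
    """Map a thread count to an exit code using upper thresholds."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


def main(argv):
    """Hypothetical usage: check_threads.py <command> <warn> <crit>"""
    if len(argv) != 4:
        print("UNKNOWN - usage: %s <command> <warn> <crit>" % argv[0])
        return UNKNOWN
    command, warn, crit = argv[1], int(argv[2]), int(argv[3])
    output = subprocess.check_output(["ps", "-e", "-o", "nlwp=,comm="])
    count = count_threads(output.decode(), command)
    code = status_for(count, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[code]
    # The part after the pipe is performance data for graphing tools.
    print("THREADS %s - %d threads for %s | threads=%d"
          % (label, count, command, count))
    return code
```

To use it as a check, end the script with sys.exit(main(sys.argv)) (after
importing sys) so that Nagios sees the exit code.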

Regards,
Tobias


-- 
You don't need eyes to see, you need vision.




Re: [Nagios-users] Alerts - Verbal notification - via phone

2006-05-11 Thread Tobias Klausmann
Hi! 

On Thu, 11 May 2006, Stringham, Steven wrote:
> Has anybody out there configured Nagios to alert via telephone?
> I don't mean like a SMS message or the like. I figure I am more
> likely to answer my home phone ringing at 0 dark hundred than
> my cell phone's little email beeps. Also, that would let me
> alert others that are not actively watching the cellphone,
> blackberry, etc.

Yes, we run a setup like this for precisely the same reasons.

> I want the system to call my home phone and tell me: "This is
> your friendly office Nagios system. Host X is down - service Y
> is not running. The time is now 02:43. Get your butt out of bed
> and fix it!"

Ours is not quite as... colorful :)

> I am thinking an interface with something like Asterisk might
> work. But, not having set up Asterisk before, I am not quite
> sure where to go. So, I plan on working on getting into it.

Here's where my knowledge ends, unfortunately. The system we use
is run by an entirely different company (inside the same
corporation) and our interface consists of sending well-formatted
mails. On top of that, they use commercial software, and as it's
used for quite powerful applications (like menu systems and
informational phone systems in the area of several hundred
concurrent users), it is probably prohibitively expensive for
your use case. But read on for another idea.

> I have found one other application that did something like this, and
> when I demod it, the voice quality really stunk, and I could not
> configure what it said, etc. And, I did not like the monitor aspect of
> it as well as I like Nagios.
> 
> Just wondering if I am reinventing the wheel, or does someone has a
> spare in the garage?

I guess one area where you might find *nearly* what you need
might be software answering machines, especially in the ISDN
world. I privately use CAPISuite for my answering machine.

I think coupling that with (say) Festival/mbrola etc. might just
work for you. I don't know about their speech quality though. 
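
As a rough sketch of the text-to-speech half of that idea (the telephony
half, i.e. actually placing the call, is not covered): the script below
builds a sentence from the environment macros Nagios exports to
notification commands and pipes it to Festival. It assumes environment
macros are enabled in Nagios and that a festival binary accepting --tts
is on the PATH; build_message and speak are names invented for this
example.

```python
import subprocess


def build_message(env):
    """Turn Nagios environment macros into one spoken sentence.

    env is a mapping such as os.environ. Only the macros
    NAGIOS_HOSTNAME, NAGIOS_HOSTSTATE, NAGIOS_SERVICEDESC and
    NAGIOS_SERVICESTATE are looked at.
    """
    host = env.get("NAGIOS_HOSTNAME", "unknown host")
    service = env.get("NAGIOS_SERVICEDESC")
    if service:
        state = env.get("NAGIOS_SERVICESTATE", "UNKNOWN")
        return "Nagios alert. Service %s on host %s is %s." % (
            service, host, state.lower())
    state = env.get("NAGIOS_HOSTSTATE", "UNKNOWN")
    return "Nagios alert. Host %s is %s." % (host, state.lower())


def speak(text):
    """Read the text aloud; assumes `festival --tts` reads from stdin."""
    subprocess.run(["festival", "--tts"], input=text.encode(), check=True)
```

A notification command could then call speak(build_message(os.environ)),
with the audio routed into whatever places the phone call.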

Hope this helps a bit,
Tobias
-- 
You don't need eyes to see, you need vision.

