[Nagios-users] Uptime info from the command line

2009-05-27 Thread Ståle Askerød Johansen

Hi list.

We are running Nagios version 3. At the moment we are putting together
a simple interface for management to view key data for a small
number of web services. For this we want to use some data
from Nagios and graphs from Munin.

Is it possible to extract the following data from a running Nagios
using the command line? Maybe from the CGIs directly:

1) Uptime percentage for a given service for a given time period, taking
scheduled downtime into account.
2) Is the service in a state of scheduled downtime at the moment,
or not?
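
To make item 1 concrete, here is a minimal sketch of one way such a figure could be computed. It assumes that "taking scheduled downtime into account" means leaving scheduled downtime out of the measured period entirely, which is only one possible policy. The function names and the (start, end) interval representation in seconds are made up for illustration and do not come from Nagios.

```python
def _clip(intervals, start, end):
    """Keep only the parts of each (s, e) interval that lie inside [start, end]."""
    clipped = []
    for s, e in intervals:
        s, e = max(s, start), min(e, end)
        if e > s:
            clipped.append((s, e))
    return clipped


def _merge(intervals):
    """Merge overlapping or touching intervals so no time is counted twice."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def _length(intervals):
    return sum(e - s for s, e in intervals)


def _overlap(a, b):
    """Total time shared between two lists of merged intervals."""
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def uptime_percent(period, ok_intervals, downtime_intervals):
    """Share of the non-downtime part of `period` during which the service was OK.

    Assumption: scheduled downtime is excluded from both the numerator and
    the denominator. Returns None if the whole period was scheduled downtime.
    """
    start, end = period
    ok = _merge(_clip(ok_intervals, start, end))
    down = _merge(_clip(downtime_intervals, start, end))
    counted = (end - start) - _length(down)
    if counted <= 0:
        return None
    ok_counted = _length(ok) - _overlap(ok, down)
    return 100.0 * ok_counted / counted


# Hypothetical data: an outage from t=600 to t=700 that falls entirely
# inside a scheduled downtime window from t=550 to t=750.
example = uptime_percent((0, 1000), [(0, 600), (700, 1000)], [(550, 750)])
```

With these made-up numbers the outage is fully covered by the downtime window, so it does not count against the service.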

This can probably be done with lynx/wget/curl and awk, but I'd rather
depend on as little as possible.
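
For item 2, one route with few dependencies might be to read the status file that Nagios writes, instead of going through the CGIs. The sketch below assumes the Nagios 3 status file layout of "servicestatus {" blocks containing key=value lines, one of which is scheduled_downtime_depth. The function name, host name, and service name are hypothetical, and the location of the status file depends on the local configuration.

```python
def in_scheduled_downtime(status_text, host, service):
    """Return True/False for the given service, or None if it is not found.

    Assumes status-file style text: blocks opened by "<type> {", closed by
    "}", with key=value lines in between.
    """
    block = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            # Start of a new block; remember its type (e.g. "servicestatus").
            block = {"_type": line[:-1].strip()}
        elif line == "}" and block is not None:
            if (block["_type"] == "servicestatus"
                    and block.get("host_name") == host
                    and block.get("service_description") == service):
                return int(block.get("scheduled_downtime_depth", "0")) > 0
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value
    return None


# Hypothetical excerpt in the assumed format.
SAMPLE = """
servicestatus {
    host_name=web1
    service_description=HTTP
    current_state=0
    scheduled_downtime_depth=1
    }

servicestatus {
    host_name=web2
    service_description=HTTP
    current_state=2
    scheduled_downtime_depth=0
    }
"""

web1_in_downtime = in_scheduled_downtime(SAMPLE, "web1", "HTTP")
```

In real use the text would come from reading the status file, for example with open(path).read(), where path is whatever status_file is set to locally.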

Running avail.cgi gives

«Error: Could not read object configuration data!»

(embedded in HTML, as I would expect)

-- 
Ståle Johansen, University of Oslo, Norway.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Increasing latency - over the top - a cry for help

2008-05-20 Thread Ståle Askerød Johansen


Hello, oh thou sweet fountain of problem-solving knowledge :-)

Here at the University of Oslo we are running Nagios to monitor
roughly 10k services, of which ~9500 are active. We also monitor
~700 hosts. We are running Nagios 3.0.1 on a Dell 2850 with 4 GB of RAM
and 4 CPU cores. We upgraded from 2.9 roughly a month ago.

We have the following problem, and are turning here for help after
fumbling in the dark for some time: the latency of both host checks
and service checks increases over time.

After a stop/start of nagios, we see the following pattern:

1) The service latency starts off at ~2.8 seconds, which we are happy
with. It increases by about 1 ms per minute, as a rough estimate.
2) The nagios process starts off at about 17m resident, as shown in "top".
3) The "system" part of CPU usage starts off at ~30%.

However.

4) The "system" part of CPU usage increases over a period of approx.
six hours, until it reaches a threshold of some kind at ~290%. At the
same time, the system load increases to four or five.
5) At this point, the latency of both host and service checks
starts increasing much faster, until the next stop/start of nagios.
The service latency reaches 160 seconds (!) after ~9 hours.
6) At
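
As a quick sanity check on the figures above, this small sketch compares what the early drift of about 1 ms per minute would give after nine hours with the ~160 seconds actually seen. The function name is made up, and the inputs are only the rough estimates quoted in the list.

```python
def projected_latency(start_s, drift_ms_per_min, hours):
    """Latency in seconds if the early drift simply continued linearly."""
    return start_s + (drift_ms_per_min / 1000.0) * hours * 60


# Rough estimates from the list above.
linear_after_9h = projected_latency(2.8, 1.0, 9)  # seconds
observed_after_9h = 160.0                         # seconds

# How many times larger the observed value is than the linear projection.
ratio = observed_after_9h / linear_after_9h
```

The linear projection stays within a few seconds, so the slow early drift cannot account for the later figure by itself. Something changes around the point where the CPU usage hits its ceiling.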

The question is what causes this. We started using mrtg for graphing
some time after we noticed a problem, so we are not quite sure when
this started.

Our setup is actually quite simple.

o no flap-detection
o no environment macros
o no dependencies


We have tried the following, with no real effect:

o use_large_installation_tweaks=1 (with various sub-tweaks)
o varying max_concurrent_checks
o putting the check results on a tmpfs filesystem

So.

We are very grateful for any ideas.

I have gathered some useful data at http://folk.uio.no/staalej/nagios/

-- 
Ståle Johansen, soon in despair.
