[Nagios-users] Uptime info from the command line
Hi list.

We are running Nagios version 3. At the moment we are putting together a simple interface for management to view key data for a small number of web services. For this we want to use some data from Nagios and graphs from Munin.

Is it possible to extract the following data from a running Nagios using the command line, maybe from the CGIs directly?

1) Uptime percentage for a given service over a given time period, taking scheduled downtime into account.
2) Whether the service is currently in scheduled downtime or not.

This can probably be done with lynx/wget/curl and awk, but I'd rather depend on as little as possible. Running avail.cgi gives «Error: Could not read object configuration data!» (embedded in HTML, as I would expect).

-- 
Ståle Johansen, University of Oslo, Norway.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
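On question 2 in the message above: one route that avoids both the CGIs and HTTP is to read the status file that Nagios 3 writes. The sketch below is only an illustration. It assumes the usual `servicestatus { key=value }` block layout and the `scheduled_downtime_depth` field. The host names, the sample text, and the function names are made up for the example, and the real file location (set by status_file in nagios.cfg) is site-specific.

```python
def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'type { key=value ... }' block."""
    block_type = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            # A block opens with a line such as "servicestatus {".
            if line.endswith("{"):
                block_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            # Split on the first "=" only; values may contain "=" themselves.
            key, _, value = line.partition("=")
            fields[key] = value


def in_scheduled_downtime(text, host, service):
    """Return True/False for a known service, or None if it is not found."""
    for block_type, fields in parse_status_blocks(text):
        if (block_type == "servicestatus"
                and fields.get("host_name") == host
                and fields.get("service_description") == service):
            return int(fields.get("scheduled_downtime_depth", "0")) > 0
    return None


# Hypothetical sample in the assumed status-file layout.
SAMPLE = """
servicestatus {
	host_name=web01
	service_description=HTTP
	scheduled_downtime_depth=1
	}
servicestatus {
	host_name=web02
	service_description=HTTP
	scheduled_downtime_depth=0
	}
"""

if __name__ == "__main__":
    for host in ("web01", "web02"):
        print(host, in_scheduled_downtime(SAMPLE, host, "HTTP"))
```

On a live system the text would come from reading the status file instead of SAMPLE. A depth greater than zero means the service is inside at least one scheduled downtime window.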
[Nagios-users] Increasing latency - over the top - a cry for help
Hello, oh thou sweet fountain of problem-solving knowledge :-)

Here at the University of Oslo we are running Nagios to monitor roughly 10k services, of which ~9500 are active. We also monitor ~700 hosts. We are running Nagios 3.0.1 on a Dell 2850 with 4 GB of RAM and 4 cores. We upgraded from 2.9 roughly a month ago.

We have the following problem, and are turning here for help after fumbling in darkness for some time: the latency of both host checks and service checks increases over time. After a stop/start of Nagios, we see the following pattern:

1) The service latency starts off at ~2.8 seconds, which we are happy with. It increases by about 1 ms per minute, as a rough estimate.
2) The nagios process starts off at about 17m resident, as shown in "top".
3) The "system" part of CPU usage starts off at ~30%. However:
4) The "system" part of CPU usage increases over a period of approx. six hours, until it reaches a threshold of some kind at ~290%. At the same time, the system load increases to four or five.
5) At this point, the latency of both host and service checks starts increasing much faster, until another stop/start of Nagios. The service latency reaches 160 seconds (!) after ~9 hours.
6) At

The question is what causes this. We started using mrtg for graphing some time after we noticed the problem, so we are not quite sure when it started.

Our setup is actually quite simple:

o no flap detection
o no environment macros
o no dependencies

We have tried the following, with no real effect:

o use_large_installation_tweaks=1 (with various sub-tweaks)
o playing with max_concurrent_checks
o checkresults on a tmpfs filesystem

So. We are very grateful for any ideas. I have gathered some useful data at http://folk.uio.no/staalej/nagios/

-- 
Ståle Johansen, soon in despair.
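A quick back-of-the-envelope check of the latency figures in the message above. It assumes linear growth within each phase and that the slow phase lasts the full six hours. Both are simplifications for illustration, not something the message states, and the constant and function names are made up here.

```python
# Figures quoted in the message (all approximate).
START_LATENCY_S = 2.8          # service latency right after a restart
SLOW_RATE_S_PER_MIN = 0.001    # "about 1 ms per minute"
SLOW_PHASE_MIN = 6 * 60        # system CPU climbs for roughly six hours
FINAL_LATENCY_S = 160.0        # latency reached after roughly nine hours
TOTAL_MIN = 9 * 60


def latency_at_end_of_slow_phase():
    """Latency after six hours if the slow drift were the only effect."""
    return START_LATENCY_S + SLOW_RATE_S_PER_MIN * SLOW_PHASE_MIN


def implied_fast_rate():
    """Average growth (s/min) needed to reach 160 s in the last three hours."""
    remaining_min = TOTAL_MIN - SLOW_PHASE_MIN
    return (FINAL_LATENCY_S - latency_at_end_of_slow_phase()) / remaining_min


if __name__ == "__main__":
    print(f"latency after slow phase: {latency_at_end_of_slow_phase():.2f} s")
    print(f"implied fast-phase rate:  {implied_fast_rate():.2f} s/min")
    print(f"speed-up vs slow phase:   {implied_fast_rate() / SLOW_RATE_S_PER_MIN:.0f}x")
```

Under these assumptions the slow drift alone adds well under half a second in six hours, so nearly all of the 160 seconds accumulates in the final three hours, at a rate several hundred times the initial one. That fits the description of a threshold being crossed, as opposed to a steady leak.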