[Nagios-users] Memory leaks

2007-01-23 Thread Tobias Klausmann
Hi! 

(First off: if this should also go to nagios-devel, just yell at
 me.)

Nagios 2.6 and 2.5 have memory leaks. They are not that big that
within hours your machine will be swapping, but they degrade
performance in other ways.

First off, their approximate extent.

2.5 and 2.6 without perl cache have the smallest memory leaks. A
fairly busy Nagios server (hardware quoted below) with about 3000
services on about 330 hosts will degrade from 330M used (that's
*not* Nagios alone) to 368M used in about 16 hours. Or about 2.4
MB per hour. The very same machine behaves neutral if Nagios is
not running, so it's definitely Nagios itself.
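
As a rough illustration of the arithmetic above, here is a minimal sketch
that estimates a leak rate from (hours, MB used) samples with a
least-squares slope. The function name and the sample list are
hypothetical; only the two data points (330M at the start, 368M after 16
hours) come from the figures quoted in this mail.

```python
def leak_rate_mb_per_hour(samples):
    """Least-squares slope of used memory (MB) over elapsed time (hours).

    `samples` is a list of (hours, mb_used) pairs. This is an illustrative
    helper, not part of Nagios or any monitoring tool.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


# The two data points quoted above: 330M used at start, 368M after 16 hours.
samples = [(0.0, 330.0), (16.0, 368.0)]
rate = leak_rate_mb_per_hour(samples)
print(f"approximate leak rate: {rate:.2f} MB/hour")
```

With more than two samples (say, one reading per hour), the same function
averages out short-term noise, which makes it easier to tell whether the
loss really is linear.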

Activating the embedded Perl interpreter and -cache will increase
the amount of lost memory to about 5-6M per hour. In this case,
however, sometimes the memory usage snaps back, i.e. some of the
lost memory is collected. I've not yet found out what triggers
the reclaim. Still, over the course of hours, more and more
memory is lost. Still, it's roughly linear memory loss.

And finally, there's the advanced permission patch. With that
patch, memory leaking skyrockets to about 15M/hour.

Now all of this could be alleviated by simply restarting Nagios
every night. It's not actually a bugfix but merely treating the
symptoms, but still, it's pragmatic.

Unfortunately, performance degradation is not just on the memory
used front. With increased memory usage, check latency increases.
In the worst case, this can mean that latency increases by 120s in
about six hours. This has the net effect that for our case, we
have to restart Nagios every two hours. 

For the case of 2.5 and 2.6 without the permissions patch, it's
a lot less bad, but still bad enough to require restarting Nagios
at least every eight hours. 

Without all the fancy stuff, we get to restarting Nagios every 24
hours, as described above.

Further observations: the permission patch causes latency
degradation to be directly correlated with the number of
notifications. The more notifications, the quicker things get nasty.

For vanilla Nagios, at least it's clear that in whatever way
memory is wasted, it also slows Nagios down - a possibility would
be a linked list that is walked and gets appended over and over.
But I guess those with knowledge of the inner workings of Nagios
have more clue about this than I do.
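
To make the linked-list guess concrete, here is a minimal sketch of how a
list that is walked from the head on every append gets slower as it grows.
This is purely an illustration of the hypothesis; it is not Nagios code,
and the class and function names are made up for the example.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class WalkedList:
    """Singly linked list without a tail pointer (hypothetical example).

    Every append walks from the head to the last node, so one append
    costs time proportional to the current length of the list.
    """

    def __init__(self):
        self.head = None
        self.steps = 0  # total nodes visited across all appends

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        self.steps += 1  # visiting the head node
        while cur.next is not None:
            cur = cur.next
            self.steps += 1  # visiting each following node
        cur.next = node


def steps_for(n):
    """Count the node visits needed to append n items one by one."""
    lst = WalkedList()
    for i in range(n):
        lst.append(i)
    return lst.steps


for n in (1000, 2000, 4000):
    print(n, steps_for(n))
```

Doubling the number of entries roughly quadruples the total work, so a
list that never shrinks would both consume memory and make each pass
slower, which would match the combination of symptoms described above.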

The question that remains is whether this can (and will) be tackled
before 3.0 is released. A related question is whether Nagios 3 will
be prone to the same problem.

Any thoughts, ideas etc. are appreciated.

Regards,
Tobias

PS: On a whim, I tried running Nagios through/in Valgrind but
honestly got knocked over by the amount of info Valgrind spewed
at me.

PPS: Our setup uses only active service checks and notifications by
mail (some of it to SMS gateways etc.). All host checks are active
but are only executed when needed (the usual way Nagios works). All
host checks use ping. All plugins have a hard timeout of 10s.

PPPS: Hardware specs of the machine I tested with:
Dual dualcore Opteron 2.2GHz (Model 2214)
2GBytes of RAM
(if there's anything else relevant, drop me a line)

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Memory leaks

2007-01-24 Thread Andreas Ericsson
Tobias Klausmann wrote:
> Hi! 
> 
> (First off: if this should also go to nagios-devel, just yell at
>  me.)
> 
> Nagios 2.6 and 2.5 have memory leaks. They are not that big that
> within hours your machine will be swapping, but they degrade
> performance in other ways.
> 
> First off, their approximate extent.
> 
> 2.5 and 2.6 without perl cache have the smallest memory leaks. A
> fairly busy Nagios server (hardware quoted below) with about 3000
> services on about 330 hosts will degrade from 330M used (that's
> *not* Nagios alone) to 368M used in about 16 hours. Or about 2.4
> MB per hour. The very same machine behaves neutral if Nagios is
> not running, so it's definitely Nagios itself.
> 
> Activating the embedded Perl interpreter and -cache will increase
> the amount of lost memory to about 5-6M per hour. In this case,
> however, sometimes the memory usage snaps back, i.e. some of the
> lost memory is collected. I've not yet found out what triggers
> the reclaim. Still, over the course of hours, more and more
> memory is lost. Still, it's roughly linear memory loss.
> 

Yes. Embedded perl is known to be leaky. It's also mentioned in various
documents around the web.


> And finally, there's the advanced permission patch. With that
> patch, memory leaking skyrockets to about 15M/hour.
> 

Yes. I pointed out where a few of those leaks were in a previous
email. I'd recommend you don't use that patch, actually. At least
not until whoever wrote it comes up with a fixed version of it.


> Unfortunately, performance degradation is not just on the memory
> used front. With increased memory usage, check latency increases.
> In the worst case, this can mean that latency increases by 120s in
> about six hours. This has the net effect that for our case, we
> have to restart Nagios every two hours. 
> 

The latency increase should only happen when the machine starts swapping.
For large networks with the access-patch thingie that could happen fairly
quickly though, I imagine.

> For the case of 2.5 and 2.6 without the permissions patch, it's
> a lot less bad, but still bad enough to require restarting Nagios
> at least every eight hours. 
> 
> Without all the fancy stuff, we get to restarting Nagios every 24
> hours, as described above.
> 

That seems a bit obsessive. Are you doing anything unusual with the system?
We have several (well over a hundred) installations where Nagios has been up
and running for several months without requiring a restart.

> 
> For vanilla Nagios, at least it's clear that in whatever way
> memory is wasted, it also slows Nagios down - a possibility would
> be a linked list that is walked and gets appended over and over.
> But I guess those with knowledge of the inner workings of Nagios
> have more clue about this than I do.
> 

Anyone wanting to look into it should probably take a look at the
event scheduling queue.

-- 
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231



Re: [Nagios-users] Memory leaks

2007-01-24 Thread Tobias Klausmann
Hi! 

On Wed, 24 Jan 2007, Andreas Ericsson wrote:
> > Activating the embedded Perl interpreter and -cache will increase
> > the amount of lost memory to about 5-6M per hour. In this case,
> > however, sometimes the memory usage snaps back, i.e. some of the
> > lost memory is collected. I've not yet found out what triggers
> > the reclaim. Still, over the course of hours, more and more
> > memory is lost. Still, it's roughly linear memory loss.
> 
> Yes. Embedded perl is known to be leaky. It's also mentioned in various
> documents around the web.

Well, I think I can live without the embedded interpreter, the
machine is beefy enough.

> > Unfortunately, performance degradation is not just on the memory
> > used front. With increased memory usage, check latency increases.
> > In the worst case, this can mean that latency increases by 120s in
> > about six hours. This has the net effect that for our case, we
> > have to restart Nagios every two hours. 
> 
> The latency increase should only happen when the machine starts swapping.
> For large networks with the access-patch thingie that could happen fairly
> quickly though, I imagine.

No, it's definitely not swapping (as the graphs show). My
conclusions about the reasons for the degradation were drawn with
exactly that in mind.


> > For the case of 2.5 and 2.6 without the permissions patch, it's
> > a lot less bad, but still bad enough to require restarting Nagios
> > at least every eight hours. 
> > 
> > Without all the fancy stuff, we get to restarting Nagios every 24
> > hours, as described above.
> 
> That seems a bit obsessive. Are you doing anything unusual with the system?
> We have several (well over a hundred) installations where Nagios has been up
> and running for several months without requiring a restart.

Well, the system is a standalone Nagios server which is only
that, no other services. I'll take a very close look at all the
cronjobs etc. that might cause additional friction, but I doubt
they're causing any trouble.

> > For vanilla Nagios, at least it's clear that in whatever way
> > memory is wasted, it also slows Nagios down - a possibility would
> > be a linked list that is walked and gets appended over and over.
> > But I guess those with knowledge of the inner workings of Nagios
> > have more clue about this than I do.
> 
> Anyone wanting to look into it should probably take a look at the
> event scheduling queue.

Thanks, I'll ask our resident C guru to take a close look at it.

Regards,
Tobias

-- 
Never touch a burning system.
