Hi! (First off: if this should also go to nagios-devel, just yell at me.)
Nagios 2.5 and 2.6 have memory leaks. They are not so big that your machine will be swapping within hours, but they degrade performance in other ways.

First, their approximate extent. 2.5 and 2.6 without the embedded Perl cache have the smallest leaks. A fairly busy Nagios server (hardware quoted below) with about 3000 services on about 330 hosts grows from 330M used (that's *not* Nagios alone) to 368M used in about 16 hours, or about 2.4 MB per hour. The very same machine stays flat if Nagios is not running, so it's definitely Nagios itself.

Activating the embedded Perl interpreter and its cache increases the amount of lost memory to about 5-6M per hour. In this case, however, the memory usage sometimes snaps back, i.e. some of the lost memory is reclaimed. I have not yet found out what triggers the reclaim. Still, over the course of hours, more and more memory is lost, and the loss is roughly linear.

And finally, there's the advanced permission patch. With that patch, the leak skyrockets to about 15M/hour.

Now all of this could be alleviated by simply restarting Nagios every night. That's not a bugfix, merely treating the symptoms, but it's pragmatic.

Unfortunately, the performance degradation is not limited to memory usage. With increased memory usage, check latency increases. In the worst case, latency increases by 120s in about six hours. The net effect is that in our case we have to restart Nagios every two hours. For 2.5 and 2.6 without the permissions patch, it's a lot less bad, but still bad enough to require restarting Nagios at least every eight hours. Without all the fancy stuff, we get to restarting Nagios every 24 hours, as described above.

Further observation: with the permission patch, latency degradation is directly correlated with the number of notifications. The more notifications, the quicker things get nasty.
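As a rough sanity check on the figures above, here is a minimal Python sketch. It assumes the growth is strictly linear, which is only approximately what was observed, and the function names (linear_rate, projected_growth) are made up for illustration. Only the input numbers come from the measurements quoted above; for embedded Perl the upper end of the 5-6M range is used.

```python
def linear_rate(start, end, hours):
    """Average growth per hour between two readings (linear assumption)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end - start) / hours


def projected_growth(rate_per_hour, hours):
    """Total growth after `hours` at a constant hourly rate."""
    return rate_per_hour * hours


# Measured: 330M -> 368M used in about 16 hours (vanilla 2.5/2.6).
vanilla_mb_per_hour = linear_rate(330.0, 368.0, 16.0)

# Measured worst case: latency grows by 120s in about six hours.
latency_s_per_hour = linear_rate(0.0, 120.0, 6.0)

# Hourly leak rates in MB; the last two are the rough figures quoted above.
leak_rates = {
    "vanilla": vanilla_mb_per_hour,
    "embedded Perl (upper estimate)": 6.0,
    "permission patch": 15.0,
}

if __name__ == "__main__":
    for label, rate in leak_rates.items():
        per_day = projected_growth(rate, 24)
        print(f"{label}: {rate:.2f} MB/h, about {per_day:.0f} MB per 24h")

    # Extra latency accumulated between restarts in the worst case.
    for interval in (2, 8, 24):
        extra = projected_growth(latency_s_per_hour, interval)
        print(f"restart every {interval}h: up to {extra:.0f}s added latency")
```

Under these assumptions, a two-hour restart interval caps the worst-case added latency at roughly 40s, while the vanilla leak alone amounts to a few tens of MB per day.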
For vanilla Nagios, at least it's clear that whatever wastes the memory also slows Nagios down. One possibility would be a linked list that is walked and appended to over and over. But I guess those with knowledge of the inner workings of Nagios have more of a clue about this than I do.

The remaining question is whether this can (and will) be tackled before 3.0 is released. A related question is whether Nagios 3 will be prone to the same problem.

Any thoughts, ideas etc. are appreciated.

Regards,
Tobias

PS: On a whim, I tried running Nagios under Valgrind but honestly got knocked over by the amount of info Valgrind spewed at me.

PPS: Our setup uses only active service checks and notifications by mail (some of it to SMS gateways etc.). All host checks are active, yet are only executed if needed (the usual way Nagios works). All host checks use ping. All plugins have a hard timeout of 10s.

PPPS: Hardware specs of the machine I tested with:
Dual dual-core Opteron 2.2GHz (Model 2214)
2GBytes of RAM
(if there's anything else relevant, drop me a line)