Hi,

I had some out-of-memory and forking problems a while ago. After some debugging I tuned a couple of parameters, namely service_reaper_frequency and max_concurrent_checks.
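Marco doesn't say which values he settled on, so the sketch below only reads the current ones back. It is a hypothetical helper (not part of Nagios) that prints service_reaper_frequency and max_concurrent_checks from nagios.cfg, plus command_check_interval, which Raffy and Marc discuss further down. It assumes the main config file's plain key=value lines with # comments. The default path /usr/local/nagios/etc/nagios.cfg is an assumption (the usual location for a source install); pass another path as the first argument. The per-setting comments paraphrase the Nagios 2.x documentation and should be checked against your version.

```python
"""Print the nagios.cfg settings discussed in this thread.

Hypothetical helper, not part of Nagios. Assumes nagios.cfg uses plain
key=value lines with '#' comments; the default path is an assumption.
"""
import sys

DEFAULT_CFG = "/usr/local/nagios/etc/nagios.cfg"  # assumed install path

# Settings named in the thread.
SETTINGS = (
    "service_reaper_frequency",  # seconds between check-result reaping runs
    "max_concurrent_checks",     # cap on parallel active service checks (0 = no cap)
    "command_check_interval",    # -1 = check the external command file as often as possible
)


def parse_nagios_cfg(text):
    """Return {key: value} for every key=value line; the last occurrence wins."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that isn't key=value
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def report(values):
    """Return one 'name = value' line per setting of interest."""
    return [
        f"{name} = {values.get(name, '(not set; compiled-in default applies)')}"
        for name in SETTINGS
    ]


def main(argv):
    path = argv[1] if len(argv) > 1 else DEFAULT_CFG
    try:
        with open(path) as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    for line in report(parse_nagios_cfg(text)):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```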
Maybe this URL will help you: http://www.nagios.org/faqs/viewfaq.php?faq_id=115

HTH,
Marco Ramos

On Thu, 2006-03-23 at 13:51 -0800, Armistead, Raffy wrote:
> I am not sure exactly which process is causing it to run out of memory.
> Since it is a dedicated Nagios system, I would imagine it is Nagios
> that is causing the problem. This also happened when we had about 4000
> devices, but very seldom, and it wasn't much of an issue then. Now that
> we are monitoring almost 7000 devices it is happening more frequently.
> That is why I assumed it was Nagios, but I didn't know how to go about
> fixing the problem.
>
> I do not know that much about Linux, so I am not sure how to go about
> setting that up. How do I set up ulimits for memory utilization? What
> steps should I take to monitor memory utilization on the Nagios server?
>
> I had checked the nagios.cfg file and I do have that setting at -1:
>
> command_check_interval=-1
>
> I appreciate any help. Thanks.
>
> Raffy
>
> -----Original Message-----
> From: Marc Powell [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 23, 2006 11:12 AM
> To: [email protected]
> Subject: RE: [Nagios-users] Nagios 'Out Of Memory' Problems
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:nagios-users-
> > [EMAIL PROTECTED] On Behalf Of Armistead, Raffy
> > Sent: Thursday, March 23, 2006 12:23 PM
> > To: [email protected]
> > Subject: [Nagios-users] Nagios 'Out Of Memory' Problems
> >
> > I have a problem with my Nagios server constantly crashing. It keeps
> > printing Out of Memory errors on the screen, which causes loss of access
> > to the server. I can ping the box but I cannot SSH or web into it to view
> > any information. This has been happening more and more lately; now it
> > happens about every 2-3 days. We have been adding more and more devices
> > to the servers, and the problem has been getting worse as we do.
> > This is how I have it set up.
> >
> > I have a main Nagios server that is running the latest 2.0 (stable)
> > Nagios release. It is monitoring about 6800 devices, but it is not
> > actively checking them. Its main role is to provide a web interface and
> > receive passive check results from three other servers, which do the
> > polling. The main server also sends email notifications when a device
> > goes down, about 30-40 emails a day. I am using NSCA 2.5 between the
> > main server and the client Nagios servers. I am only monitoring one
> > service for each device, either TCP or ping depending on the device.
> > Almost all devices (roughly 6000) are monitored with TCP; the rest are
> > monitored with ping. The devices are spread fairly evenly across the
> > polling servers, about 2000-2500 each.
> >
> > Can someone please help me resolve this problem? Thanks
>
> Have you determined which process is using the memory? One of the first
> steps you should take is to set appropriate ulimits on memory use for
> that user so that it doesn't bring down the server. I would configure
> Nagios to monitor memory on that server, then use top or ps to identify
> the process(es) holding the memory when utilization is high. That will
> give you better direction for troubleshooting than simply knowing that
> the machine is crashing due to memory exhaustion. The Nagios daemon
> itself isn't going to use a lot of RAM (10M on my box with 3400 passive
> services).
>
> My somewhat unfounded guess is that Nagios isn't reaping the results
> from NSCA frequently enough, so you're getting a backlog of NSCA
> processes. Each process uses just a little memory, but if you have
> thousands of them it adds up. I've personally seen this on a machine
> that was having disk problems.
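Marc's advice (watch memory, then find which processes hold it) can be sketched as a small Linux-only Python script that reads /proc directly instead of parsing top or ps output. It prints the share of RAM in use as a Nagios-style state (the plugin convention is exit code 0 OK, 1 WARNING, 2 CRITICAL) and then totals resident memory per command name, so a pile of small processes with the same name, such as the NSCA backlog Marc suspects, shows up as one large line. The 80%/90% thresholds, counting buffers and page cache as free, and the use of VmRSS are all assumptions; RSS counts shared pages in every process, so the per-name totals overstate real use.

```python
"""Summarise memory use from /proc (Linux-only sketch).

Thresholds, treating buffers/cache as free, and using VmRSS are
assumptions; RSS counts shared pages once per process, so per-name
totals overstate real use.
"""
import os
from collections import defaultdict

PLUGIN_STATES = ("OK", "WARNING", "CRITICAL")  # Nagios plugin exit codes 0, 1, 2


def meminfo_used_percent(text):
    """Percent of RAM in use, counting MemFree + Buffers + Cached as free."""
    kb = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            kb[name.strip()] = int(fields[0])  # /proc/meminfo values are in kB
    free = kb["MemFree"] + kb.get("Buffers", 0) + kb.get("Cached", 0)
    return 100.0 * (kb["MemTotal"] - free) / kb["MemTotal"]


def plugin_state(used_percent, warn=80.0, crit=90.0):
    """Map a usage figure to a Nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if used_percent >= crit:
        return 2
    if used_percent >= warn:
        return 1
    return 0


def rss_by_name(samples):
    """Total (name, rss_kb) pairs per name -> [(name, total_kb, count)], largest first."""
    totals = defaultdict(lambda: [0, 0])
    for name, rss_kb in samples:
        totals[name][0] += rss_kb
        totals[name][1] += 1
    rows = [(name, total, count) for name, (total, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def read_proc_samples(proc="/proc"):
    """Yield (name, rss_kb) for each process whose status file is readable."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        name, rss_kb = None, 0  # kernel threads have no VmRSS line
        try:
            with open(os.path.join(proc, pid, "status"), errors="replace") as fh:
                for line in fh:
                    if line.startswith("Name:"):
                        name = line.split(":", 1)[1].strip()
                    elif line.startswith("VmRSS:"):
                        rss_kb = int(line.split()[1])
        except OSError:
            continue  # the process exited while we were reading
        if name is not None:
            yield name, rss_kb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        used = meminfo_used_percent(fh.read())
    print(f"MEMORY {PLUGIN_STATES[plugin_state(used)]}: {used:.1f}% used")
    for name, total_kb, count in rss_by_name(read_proc_samples())[:10]:
        print(f"{name:<16} {count:6d} procs {total_kb // 1024:8d} MB RSS")
```

With, say, 3000 hypothetical nsca processes at 1 MB of RSS each, the per-name listing would show a single nsca line of 3000 MB even though no individual process looks large. To use it as a real plugin, exit with the value of plugin_state() instead of only printing it.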
> If this is the case, beyond a hardware problem or capacity issue, I'd
> verify that your command_check_interval is set to -1 to make sure that
> Nagios is checking the external command file as quickly as it can.
>
> --
> Marc
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Nagios-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
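On Raffy's question about memory ulimits: on Linux the usual tools are the shell's `ulimit -v` in whatever script starts the daemon, or an `as` entry in /etc/security/limits.conf, which pam_limits applies to PAM sessions. Both set RLIMIT_AS, a per-process cap on address space that child processes inherit. The Python sketch below shows the same mechanism by starting a command with a lowered soft RLIMIT_AS; the helper name limited_run and the 1 GiB figure are illustrative only, not recommendations. Because the cap applies to each process separately, it would not by itself stop thousands of small NSCA processes from adding up; the per-user process count limit (RLIMIT_NPROC, set with `ulimit -u` or `nproc` in limits.conf) is closer to what would cap that case.

```python
"""Run a command under a soft RLIMIT_AS, the limit behind `ulimit -v` (sketch).

The limit is per process and is inherited by children. The 1 GiB figure
below is a placeholder, not a recommendation.
"""
import resource
import subprocess
import sys


def limited_run(cmd, max_bytes):
    """Run cmd with its soft address-space limit lowered to max_bytes.

    The hard limit is left as it is, and the soft limit is capped at the
    hard limit, since a soft limit may never exceed the hard one.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec(); preexec_fn is
        # not safe in multi-threaded programs, so keep this script simple.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit, capture_output=True, text=True)


if __name__ == "__main__":
    # Probe: a child Python reports the soft limit it was started with.
    probe = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"]
    result = limited_run(probe, 1024 ** 3)
    print("child soft RLIMIT_AS:", result.stdout.strip())
```

When no hard limit was already in place, the probe prints 1073741824, showing that the child started with the lowered limit while the parent's own limit is unchanged.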
