On Jun 9, 2014, at 4:43 AM, Andreas Ericsson <[email protected]> wrote:
> Heya Jason. Long time no see. How's things?
>
> On 2014-06-05 16:19, Jason Cook wrote:
>>
>> On May 22, 2014, at 2:42 AM, Sven Nierlein <[email protected]> wrote:
>>
>>> On 13/05/14 15:59, Jason Cook wrote:
>>>>>> On 07/05/14 16:37, Jason Cook wrote:
>>>>>>> Yep, the Naemon core process definitely is the one that grows and is
>>>>>>> 100% reproducible for me. It grows to the max available memory on the
>>>>>>> box, then gets OOM killed. Doesn't happen when mod_gearman isn't
>>>>>>> enabled. I've seen it may also be happening with Nagios 4 as well, but
>>>>>>> haven't tested it myself.
>>>>>>>
>>>>>>> Test environment is RHEL 6u4.
>>>>>>>
>>>>>>> The valgrind log wasn't mine, but we seem to have very similar setups.
>>>>>>>
>>>>>> Could you try the latest version of mod-gearman? I fixed some potential
>>>>>> memory leaks which may occur in case of connection errors.
>>>>>>
>>>> Looks like it’s still swelling.. after ~19 hours..
>>>
>>> I found another memory leak. Seems like the way check results were freed has
>>> changed, so mod-gearman has to do that by itself now.
>>> Could you try the latest git HEAD of mod-gearman? In my tests, memory usage
>>> was constant over the last 12 hours.
>>>
>>> Sven
>>
>> Just to follow up on this, it’s a lot better, though still happening (albeit
>> much, much slower)…
>>
>> nagios 22629 3.6 18.9 2208276 1523904 ? Ssl May30 317:59
>> /usr/bin/naemon -d /etc/naemon/naemon.cfg
>>
>> After running for nearly a week, it’s at ~1.5GB memory usage… Here it is in
>> a 60 second snapshot..
>>
>> nagios 22629 3.6 18.9 2208276 1524700 ? Ssl May30 318:10
>> /usr/bin/naemon -d /etc/naemon/naemon.cfg
>> nagios 22629 3.6 18.9 2208276 1524916 ? Ssl May30 318:12
>> /usr/bin/naemon -d /etc/naemon/naemon.cfg
>>
>> Growing very, very slowly, but still growing.
>>
>
> That looks like a small-ish string or a container for something is
> being leaked continuously. Are you using a lot of on-demand macros,
> or custom object variables?
>
> I'm trying to think of things we may have overlooked when running
> valgrind tests here. Normally, naemon doesn't leak at all, but it
> seems we haven't tested every possible feature in a long-running
> system.
>
> Worst case scenario, memory is lost due to fragmentation, but it eats
> RAM a little bit too fast for it to be that.
>
> /Andreas

No on-demand macros or custom object variables - our configs are really, really straightforward. This example is a small-ish config: about 1,300 hosts and 11,000 service objects.
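To put a number on "growing very, very slowly", the RSS column of the `ps aux` lines quoted above can be turned into a growth rate. This is an illustrative sketch, not something from the thread. It assumes the two snapshot lines were taken 60 seconds apart (as the message says) and that RSS is reported in kilobytes, as procps `ps` does. The helper names are hypothetical, and `read_vmrss_kib` only works on Linux, where `/proc/<pid>/status` exposes a `VmRSS:` line.

```python
def rss_kib(ps_line: str) -> int:
    """Return the RSS column (in kilobytes) from one `ps aux` output line."""
    fields = ps_line.split()
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
    return int(fields[5])


def growth_rate_kib_per_s(first: str, second: str, interval_s: float) -> float:
    """RSS growth between two `ps aux` samples taken interval_s seconds apart."""
    return (rss_kib(second) - rss_kib(first)) / interval_s


def read_vmrss_kib(pid: int) -> int:
    """Linux-only: read the current resident set size (kB) of a live process."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line looks like "VmRSS:   1524700 kB"
                return int(line.split()[1])
    raise ValueError(f"no VmRSS line for pid {pid} (kernel thread?)")


if __name__ == "__main__":
    a = "nagios 22629 3.6 18.9 2208276 1524700 ? Ssl May30 318:10 /usr/bin/naemon -d /etc/naemon/naemon.cfg"
    b = "nagios 22629 3.6 18.9 2208276 1524916 ? Ssl May30 318:12 /usr/bin/naemon -d /etc/naemon/naemon.cfg"
    rate = growth_rate_kib_per_s(a, b, 60.0)
    # prints "3.6 KiB/s, ~304 MiB/day"
    print(f"{rate:.1f} KiB/s, ~{rate * 86400 / 1024:.0f} MiB/day")
```

Two readings a minute apart are noisy; calling `read_vmrss_kib` on the naemon PID at regular intervals over several hours would give a steadier figure to compare against the ~1.5GB seen after nearly a week.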
