Again, thank you for all the quick answers. This list/community is awesome!!!
I'm already using tmpfs, have increased the named pipe buffer size, and have done everything else one is supposed to do to improve performance. I think I'll go with removing the sleep calls in the code. I'm on version 3.2.1 and would love to have a look at Max's patch!

Notification is not my bottleneck, and this is not for my own Nagios install; it's for someone else, so I cannot post nagios.cfg here. Sorry.

But again, thanks for all the answers!!!

On Tue, May 18, 2010 at 5:49 PM, Mike Lindsey <mike-nag...@5dninja.net> wrote:

> Marcel wrote:
> > When I have more than, say, 10k checks, I start seeing check latency rise
> > and there just isn't anything that can be done; even distributed
> > monitoring has the nagios.cmd write-lock bottleneck.
>
> So, I've just gone through this, and the single greatest bottleneck I
> had to deal with is notifications. But, I have a lot of people in the
> notification tree, and pull in a lot of meta-data to make ticket
> tracking and issue resolution easier and faster. Since Nagios needs to
> know the exit status of notification commands, it doesn't fork before
> notifications; it just plods along waiting for the notification command
> to exit.
>
> I switched all our non-pager notification commands to drop a spool file
> in a directory, letting another process read the spool files, generate
> email contents, query ticket databases, pull in documentation or
> extended testing information (full mysql processlist output for DBAs,
> etc.) and cache it for subsequent notifications for that event.
>
> That showed a HUGE improvement to my master server's performance.
>
> If notifications aren't your bottleneck, you can move all your temporary
> files to ramdisk.
>
> You can also increase your FIFO pipe size, but that only delays the
> issue and doesn't really solve the problem if you're always running hot.
> It also probably involves recompiling your kernel.
>
> If you're using nsca, you can cache your updates for a second or two, so
> that multiple updates happen in the same socket connection.
>
> Alternately (or additionally) you can have nsca update the checkresults
> directory directly, skipping the steps where nagios reads the command
> pipe and then just writes it back out to the checkresults directory.
>
> I can package up a patch (against 2.7.2) of those last couple of changes
> (I need to submit them, anyway). If you're manlier than I might be, you
> could also consider modifying the core nagios to allow submissions from
> distributed nagios servers directly to a socket, but doing that right
> might require serious threaded C foo, and depending on your OS and
> threading library, you might be locked to a single core.
>
> So, you have options. They're not all equal, and aren't all easy. But
> you wouldn't be working with monitoring if you didn't like challenges...
> :)
>
> --
> Mike Lindsey
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
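For anyone following the thread, the spool-file approach Mike describes for notifications might look roughly like the sketch below. It is only an illustration under stated assumptions, not Mike's actual code: the function names, the spool directory layout, the file naming, and the JSON fields are all hypothetical. The idea is that the notification command writes a small file and exits at once, and a separate process does the slow work later.

```python
import json
import os
import tempfile
import time


def spool_notification(spool_dir, fields):
    """Write one notification into spool_dir and return its final path.

    Hypothetical helper: the data is written to a hidden temp file and
    then renamed, so a reader never sees a half-written spool file.
    """
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        json.dump(fields, handle)
    # Reuse mkstemp's random suffix to keep the final name unique.
    suffix = os.path.basename(tmp_path)[len(".tmp-"):]
    final_name = "%020d-%s.json" % (time.time_ns(), suffix)
    final_path = os.path.join(spool_dir, final_name)
    os.rename(tmp_path, final_path)  # atomic within one filesystem
    return final_path


def drain_spool(spool_dir, handler):
    """Hand each spooled notification to handler, roughly oldest first.

    Returns the number of files processed. The handler is where the slow
    work (building email, querying a ticket system) would happen.
    """
    names = sorted(n for n in os.listdir(spool_dir) if n.endswith(".json"))
    for name in names:
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            handler(json.load(handle))
        os.remove(path)
    return len(names)
```

In this sketch the Nagios notification command would be a tiny script that calls spool_notification() and exits, while a separate daemon or cron job calls drain_spool() with whatever handler sends the mail. Hidden temp files are skipped by the reader because they do not end in .json.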