Marcel wrote:
> When I have more than, say, 10k checks, I start seen check latency rises
> and there just isn't anything that could be done, even distributed
> monitoring have the nagios.cmd write-lock bottleneck.

So, I've just gone through this, and the single greatest bottleneck I had to deal with was notifications. But I have a lot of people in the notification tree, and I pull in a lot of metadata to make ticket tracking and issue resolution easier and faster.

Since Nagios needs to know the exit status of notification commands, it doesn't fork before notifications; it just plods along waiting for the notification command to exit. I switched all our non-pager notification commands to drop a spool file in a directory. Another process reads the spool files, generates the email contents, queries ticket databases, pulls in documentation or extended testing information (full MySQL processlist output for DBAs, etc.) and caches it for subsequent notifications for that event. That showed a HUGE improvement in my master server's performance.

If notifications aren't your bottleneck, you can move all your temporary files to a ramdisk. You can also increase your FIFO pipe size, but that only delays the issue and doesn't really solve the problem if you're always running hot. It also probably involves recompiling your kernel.

If you're using NSCA, you can cache your updates for a second or two, so that multiple updates happen in the same socket connection. Alternatively (or additionally), you can have NSCA write to the checkresults directory directly, skipping the steps where Nagios reads the command pipe and then just writes the result back out to the checkresults directory. I can package up a patch (against 2.7.2) of those last couple of changes (I need to submit them anyway).

If you're braver than I am, you could also consider modifying the Nagios core to accept submissions from distributed Nagios servers directly on a socket. Doing that right might require serious threaded C foo, though, and depending on your OS and threading library, you might be locked to a single core.

So, you have options. They're not all equal, and they aren't all easy.
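
For anyone who wants to try the spool-file approach, here is a minimal Python sketch of the idea. It is not the actual scripts described above. The function names, the JSON file layout, the `.tmp-` prefix, and the choice of (host, service) as the cache key for "the same event" are all assumptions made for illustration.

```python
import json
import os
import tempfile
import time
import uuid


def spool_notification(spool_dir, fields):
    """Notification-command side: write one spool file and return its path.

    The file is written under a temporary name and then renamed, so the
    reader never sees a half-written file. This does no slow work, so the
    command Nagios is waiting on can exit almost immediately.
    """
    os.makedirs(spool_dir, exist_ok=True)
    # Hypothetical naming scheme: timestamp first so names sort roughly by age.
    name = "%d-%s.json" % (time.time_ns(), uuid.uuid4().hex)
    final_path = os.path.join(spool_dir, name)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        json.dump(fields, handle)
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, enrich, deliver, cache):
    """Reader side: process every complete spool file and return the count.

    enrich(fields) stands in for the slow lookups (ticket database,
    documentation, extended test output). Its result is cached per
    (host, service) so later notifications for that pair reuse it.
    deliver(fields, extra) stands in for building and sending the email.
    """
    handled = 0
    for name in sorted(os.listdir(spool_dir)):
        if name.startswith(".tmp-"):
            continue  # still being written by a notification command
        path = os.path.join(spool_dir, name)
        with open(path) as handle:
            fields = json.load(handle)
        key = (fields.get("host"), fields.get("service"))
        if key not in cache:
            cache[key] = enrich(fields)
        deliver(fields, cache[key])
        os.remove(path)
        handled += 1
    return handled
```

In this sketch the notification command would call `spool_notification()` with values passed in from Nagios macros such as `$HOSTNAME$`, `$SERVICEDESC$` and `$SERVICESTATE$`, and a separate long-running process would call `drain_spool()` in a loop. A real reader would also need to expire cache entries once the problem clears, which is left out here.
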
But you wouldn't be working with monitoring if you didn't like challenges... :)

--
Mike Lindsey