On Wed, Aug 18, 2010 at 11:07 AM, Kyle O'Donnell <[email protected]> wrote:
> we have ~30000 services and ~3000 hosts
>
> we have 6 pollers (each has a backup) processing checks and forwarding
> back to a central nagios host.
>
> our busiest poller has ~1000 hosts and ~9000 services... avg service check
> interval is 5 minutes, but there are a bunch at 1 and 2 minute intervals.
>
> avg service check latency is less than 1 second
>
> This is ~3yr old hardware too, I suspect we could increase capacity by 50%
> if we move to the new Intel Nehalems
Nice, I appreciate you sharing your numbers. Everyone who writes distributed code around Nagios adds overhead, so it is nice to see real numbers rather than 'as many as can be done', since we all know how wildly that varies :) I have spent many, many hours with my colleagues tuning the 'as many as can be done' numbers.

We have built a distributed variant of Nagios as well. Our non-distributed pollers (Compaq 380s with 8 GB RAM and RAID 10) poll 2k host checks (every 10 minutes) and 11k service checks (average interval 5 minutes), and all checks also send performance data through a NEB module to our performance data processing tier. With our distributed code in place, that falls to around 1.5k host checks and 8-9k service checks per poller. Average host and service check latency is around 1.2 seconds non-distributed and around 2.4 seconds distributed.

Our new hardware consists of Dell R710s with dual 8-core processors (wow, do those rock). With our distributed code we are getting around 2x those numbers per poller, even with the overhead of the distribution mechanism in place.

We will be releasing our distributed variant as open source software in the next month or so. I suspect our methodology is org-specific enough that it will not work for many places, but for higher-volume polling it might be worth adopting, and we hope some of its concepts and methodologies will spark ideas in others for better ways to do distributed Nagios.
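To put the poller numbers in this thread on a common footing, here is a rough conversion to scheduled checks per second. It is a back-of-the-envelope sketch that assumes every check runs exactly once per its (average) interval and ignores retries, freshness checks, and on-demand host checks; the 8.5k figure is an assumed midpoint of the "8-9k" range, and the function name is just for illustration.

```python
# Back-of-the-envelope check rates implied by the numbers in this thread.
# Assumption: each check runs exactly once per its (average) interval;
# retries, freshness checks and on-demand host checks are ignored.

def checks_per_second(count: int, interval_minutes: float) -> float:
    """Steady-state scheduling rate for `count` checks every `interval_minutes`."""
    return count / (interval_minutes * 60)

# Non-distributed poller: 2k host checks every 10 min, 11k service checks every 5 min (avg).
non_distributed = checks_per_second(2_000, 10) + checks_per_second(11_000, 5)

# Distributed poller: ~1.5k hosts, ~8.5k services (assumed midpoint of 8-9k).
distributed = checks_per_second(1_500, 10) + checks_per_second(8_500, 5)

# Kyle's busiest poller: ~9k services at a 5 min average (host checks not counted).
kyle_busiest = checks_per_second(9_000, 5)

print(f"non-distributed: {non_distributed:.1f} checks/s")   # 40.0
print(f"distributed:     {distributed:.1f} checks/s")       # 30.8
print(f"kyle busiest:    {kyle_busiest:.1f} service checks/s")  # 30.0
```

Because the 1 and 2 minute checks Kyle mentions are folded into a 5 minute average here, real peak rates will be somewhat burstier than these steady-state figures.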
We also take the approach of pushing configs out to remote pollers. We have a redundant UI tier where we stage a configuration. After the configuration is staged, we have code (which will allow manual operator adjustment in a dot release) that distributes checks equally among the pollers designated as available for use. That code then builds a common retention.dat file for all pollers, along with an objects.precache file for each poller; those files are pushed out to each poller and the pollers are restarted (yes, we have thought through and worked out all the synchronization issues involved). Our UI then lets users take the same actions the Nagios UI does, and it knows where to send the commands so they affect the real poller instances. It is working well so far, and, as with all the alternate Nagios UIs, we are able to make a much more intuitive and flexible UI. Code should be available in early October.

- Max

_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
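The post does not show how checks are "equally distributed" among the available pollers, so the following is only a sketch of one plausible approach, not the code described above. It assumes checks are placed per host (a host and all of its services go to the same poller, so no service references a host defined elsewhere) and uses a greedy least-loaded heuristic; the function name, host names and poller names are hypothetical. Building retention.dat and objects.precache files is out of scope here.

```python
import heapq
from typing import Dict, List


def distribute_hosts(host_services: Dict[str, int], pollers: List[str]) -> Dict[str, List[str]]:
    """Greedy least-loaded assignment of hosts to pollers (hypothetical helper).

    host_services maps a host name to the number of service checks on it.
    Each host, with all of its services, is placed on exactly one poller.
    """
    if not pollers:
        raise ValueError("no pollers are marked available")
    # Min-heap of (checks assigned so far, poller name); ties break on name.
    heap = [(0, p) for p in pollers]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {p: [] for p in pollers}
    # Place the largest hosts first, which keeps final loads closer together.
    for host, n_services in sorted(host_services.items(), key=lambda kv: (-kv[1], kv[0])):
        load, poller = heapq.heappop(heap)
        assignment[poller].append(host)
        # Count the host check itself plus its service checks.
        heapq.heappush(heap, (load + 1 + n_services, poller))
    return assignment


# Usage with made-up hosts and service counts.
hosts = {"web01": 12, "web02": 12, "db01": 20, "mail01": 6, "dns01": 3}
plan = distribute_hosts(hosts, ["poller-a", "poller-b"])
print(plan)
# {'poller-a': ['db01', 'mail01'], 'poller-b': ['web01', 'web02', 'dns01']}
```

In this example the loads end up at 28 and 30 checks. A greedy heuristic does not guarantee a perfectly even split, but it is simple and deterministic, which matters when the same staged configuration has to be rebuilt and pushed to every poller.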
