Hi all,

We've been very successful with a custom-built distributed Nagios architecture at my job that consists of:

* Central config
* DB with pollers defined for the config
* Automation that:
  * Imports the config into a DB
  * Divides it into smaller Nagios configs by hosts/pollers - each poller getting as equal a distribution of hosts as we can manage
  * Writes out an objects.cache for each poller, including all related config dependencies
  * Pushes the smaller configs out to each poller over scp/ssh
  * Restarts all pollers
* Pollers stream data to an instance-specific DB (so X pollers -> 1 DB)
* Centralized UI for viewing all results and doing command and control on them without caring where the poller physically is
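For anyone curious what the "divide hosts among pollers" step looks like, here's a minimal sketch in Python. The names (`split_hosts`, the host/poller labels) are made up for illustration - our real automation also handles the config dependencies and the scp push, which this skips.

```python
# Hypothetical sketch of the divide step: round-robin assignment so each
# poller gets as equal a share of hosts as possible. Real tooling would also
# carry along hostgroups, templates, and other config dependencies.

from collections import defaultdict


def split_hosts(hosts, pollers):
    """Assign each host to a poller in round-robin order."""
    assignment = defaultdict(list)
    for i, host in enumerate(sorted(hosts)):
        assignment[pollers[i % len(pollers)]].append(host)
    return dict(assignment)


# Example: 7 hosts over 3 pollers -> shares of 3, 2, 2.
shares = split_hosts([f"web{n:02d}" for n in range(7)],
                     ["poller-a", "poller-b", "poller-c"])
for poller, hosts in shares.items():
    print(poller, hosts)
```

Each per-poller host list would then be rendered into its own small Nagios config before being pushed out.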
We've got 7 clusters working this way (each cluster has its own DB by team or project and its own UI) and the process is totally self-service. Teams maintain their own configs, check them into SVN, then run a deploy command to push out to their pollers. They maintain the pollers; we maintain the backend architecture for notifications, metrics streaming, and the UIs. This system is monitoring over 100k nodes and 400k active service checks every 5 minutes.

Works great for static, "pet" architectures where there are lots of VMs or hardware hosts that are maintained and cared for and that don't change often. Not so good for dynamic envs! I was tweeting a little with Michael (thanks Michael!) just an hour or so ago about how to use our knowledge and experience with Nagios/Icinga to make this work for a more dynamic env - cloud VMs or Docker images, where hostnames and IP addresses are dynamically generated and an instance is used once then trashed.

My first thought on how this could work:

* We're moving to HAProxy (that will be our "pet" host in our new architectures - our current monitoring arch will work fine for monitoring those).
* Each HAProxy will have to be bounced through automation (ZooKeeper most likely) when a node is brought up or down and have its config rewritten to add/delete nodes.
* At that point we could also add/delete nodes from a mini Icinga instance on each proxy that would serve as the health checker for the nodes and also stream results (being intentionally generic here) to our back end for notification and alarming.
* The configs would be tiny, and the host portions of the configs would be maintained locally only - we'd just have to push new service checks / host groups etc. as needed.

What kinds of approaches do you all take with these more dynamic environments? If you don't use Icinga for that layer of monitoring, what do you use?
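To make the mini-Icinga-per-proxy idea concrete, here's a rough sketch of the "generate a local host object when a node registers" piece. The node name/address and hostgroup are invented for the example; the real automation would be driven by the ZooKeeper event and would also reload the local Icinga instance afterwards.

```python
# Hypothetical sketch: render a minimal Nagios/Icinga 1.x style host object
# for a dynamically provisioned backend node. The template inherits from a
# generic-host template assumed to exist in the pushed service/template config.

HOST_TEMPLATE = """define host {{
    use                 generic-host
    host_name           {name}
    address             {address}
    hostgroups          {hostgroup}
}}
"""


def render_host(name, address, hostgroup="dynamic-backends"):
    """Return a host definition snippet for the local per-proxy config dir."""
    return HOST_TEMPLATE.format(name=name, address=address, hostgroup=hostgroup)


# Example node coming up behind the proxy:
print(render_host("app-3f9c", "10.0.4.17"))
```

Deleting a node would just be removing its snippet and reloading - the service checks and hostgroups stay centrally pushed, as described above.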
I think what will surely be deemed our "classic" approach will continue to work fine for embedded devices / appliances, as there's no other choice there :p.

- Max
_______________________________________________ icinga-users mailing list [email protected] https://lists.icinga.org/mailman/listinfo/icinga-users
