Hi,

it took me a while, mainly because we have been redesigning the cluster feature over the past weeks to solve some long-standing issues - we weren't satisfied with the previous implementation either.

On 27.04.2014 16:40, Max Schubert wrote:
Hi all,

We've been very successful with a custom built distributed Nagios architecture at my job that consists of:
* Central config
* DB with pollers defined for the config
* Automation that:
  * Imports the config into a DB
  * Divides it into smaller Nagios configs by ( hosts / pollers ) - each poller getting as equal a distribution of hosts as we can do
  * Writes out an objects.cache for each poller including all related config deps
  * pushes the smaller configs out to each poller over scp / ssh
  * restarts all pollers
* Pollers stream data to an instance-specific DB ( so X pollers -> 1 DB )
* Centralized UI for viewing all results and doing command and control on them without caring where the poller physically is

We've got 7 clusters working this way ( each cluster has its own DB by team or project and its own UI ) and the process is totally self-service. Teams maintain their own configs, check them into SVN, then run a deploy command to push out to their pollers. They maintain the pollers, we maintain the backend arch for notifications, metrics streaming, and the UIs.

This system is monitoring over 100k nodes and 400k active service checks every 5 minutes. Works great for static, "pet" architectures where there are lots of VMs or hardware hosts that are maintained and cared for and that don't change often.

Not so good for dynamic envs!

Was tweeting a little with Michael ( thanks Michael! ) just an hour or so ago about how to use our knowledge and experience with Nagios / Icinga to make this work for a more dynamic env - cloud VMs or Docker images, where hostnames and IP addresses are dynamically generated and the instance is used once and then trashed.

My first thoughts on how this could work:
* We're moving to HA proxy ( that will be our "pet" host in our new architectures - our current monitoring arch will work fine for monitoring those ).
* Each HA proxy will have to be bounced through automation ( zookeeper most likely ) when a node is brought up or down and have its config re-written to add / delete nodes
* At that point we could also add / delete nodes from a mini Icinga instance on each proxy that would serve as the health checker for the nodes and also then stream results ( being intentionally generic here ) to our back end for notification and alarming.
* The configs would be tiny and the host portions of the configs would be maintained locally only - we'd just have to push new service checks / host groups etc as needed

What kinds of approaches do you all take with these more dynamic environments? If you don't use Icinga for that layer of monitoring, what do you use?

I think what will surely be deemed our "classic" approach will continue to work fine for embedded devices / appliances as there's no other choice there :p.

Basically you should look into the cluster model which is to be released with 2.0 and its zone concept. The previously discussed idea on Twitter works in a similar fashion, but the zone model adds the capability of doing load distribution and high availability directly in one zone - which could be one check satellite, multiple masters, or multiple checker instances, for example.

http://docs.icinga.org/icinga2/snapshot/chapter-4.html#distributed-monitoring-and-high-availability

Basically your setup could be divided into such zones, and could look like the following on your config master (assuming you want HA, making it 2 nodes in the zone electing their zone master - the one which exclusively runs the DB IDO feature, until a failover condition is met and the secondary node takes over DB IDO).

Note: The default port is 5665 if not given. Required for endpoints and the ApiListener.

# icinga2-enable-feature api
# vim /etc/icinga2/features-enabled/api.conf
(add 'accept_config = true' on all nodes)
# service icinga2 restart
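After enabling the feature, /etc/icinga2/features-enabled/api.conf contains the ApiListener object. A minimal sketch of what it could look like with accept_config set - the certificate paths follow the defaults of the setup scripts and may differ in your environment:

```
object ApiListener "api" {
  // certificate paths as generated by the default setup - adjust to your environment
  cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
  key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
  ca_path = SysconfDir + "/icinga2/pki/ca.crt"

  // accept zone configuration synced from the parent zone
  accept_config = true
}
```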

Regarding the master zone - both nodes have DB IDO enabled and configured.

object Endpoint "master1" { host = "192.168.2.30" }
object Endpoint "master2" { host = "192.168.2.40" }

object Zone "config-ha-master" {
  endpoints = [ "master1", "master2" ]
}
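Since both master nodes run DB IDO, each of them needs the IDO feature configured as well. A minimal sketch for MySQL - user, password, host and database are placeholders for your actual credentials:

```
library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  user = "icinga"       // placeholder
  password = "icinga"   // placeholder
  host = "localhost"    // placeholder
  database = "icinga"   // placeholder
}
```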

The pollers could work in 2 separate scenarios:

1) a generic poller zone where all involved nodes receive the same configuration and do load-balanced checks sharing the load
2) smaller poller zones where only a few or one poller act as check satellite

ad 1)

If you're planning to implement 1), all pollers must be able to see each other in order to replicate all events and to determine how many checkers are available for calculating the check load distribution among themselves (modulo n). That isn't always possible and makes most sense in your local network.

It could look like this:

object Endpoint "p1" { host = "192.168.3.10" }
object Endpoint "p2" { host = "192.168.3.20" }
object Endpoint "p3" { host = "192.168.3.30" }

object Zone "pollers" {
  endpoints = [ "p1", "p2", "p3" ]
  parent = "config-ha-master"
}

The configuration on the master would look like the following in /etc/icinga2 (note: zone names must match the directory names)

zones.d/
  config-ha-master/
    local.conf
  pollers/
    checks.conf
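checks.conf in the pollers directory would then hold the objects which the zone members load-balance among themselves. A hypothetical example - the host name, address and the use of the ITL's hostalive/ping4 check commands are assumptions about your setup:

```
object Host "app-server-01" {
  address = "192.168.3.101"     // made-up address
  check_command = "hostalive"   // requires the ITL to be included
}

object Service "ping4" {
  host_name = "app-server-01"
  check_command = "ping4"       // requires the ITL to be included
}
```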


ad 2)

If the pollers live in their own (network) location, they should get a zone of their own. Their local zone and endpoint configuration must only see the parent "config-ha-master" zone and all involved endpoints. That's important for the general connection attempts and further communication.

Imagine that p1 and p2 each form a separate zone, while p3 and p4 do some load balancing in a third zone.

object Endpoint "p1" { host = "192.168.3.10" }
object Endpoint "p2" { host = "192.168.3.20" }
object Endpoint "p3" { host = "192.168.3.30" }
object Endpoint "p4" { host = "192.168.3.40" }


object Zone "poller1" {
  endpoints = [ "p1" ]
  parent = "config-ha-master"
}

object Zone "poller2" {
  endpoints = [ "p2" ]
  parent = "config-ha-master"
}

object Zone "poller3" {
  endpoints = [ "p3", "p4" ]
  parent = "config-ha-master"
}

That could be organized like the following in /etc/icinga2 on the config master. (Note: Multiple instances elect their own active zone master which takes care of the primary message routing, and also runs the DB IDO HA feature. The configuration is synced among all nodes in the zone trusting each other in /var/lib/icinga2/api/zones...)

zones.d/
  config-ha-master/
    local.conf
  poller1/
    checks.conf
  poller2/
    checks.conf
  poller3/
    shared-checks.conf


Regarding the dynamic environment - you would still organize your configuration on the master in different zones. The zones only get the configuration from their directory and nothing else (poller1 doesn't see anything from poller2 zone for example).

If you are planning to dynamically add a new poller for checks to an existing zone, you need to do the following:

- add the instance itself: enable the ApiListener, install the SSL certificates
- add a new Endpoint config object, and deploy that to all zone pollers and the master
- while at it, add the new endpoint to that Zone config object, and deploy that to all zone pollers and the master
- reload the zone pollers and the master to pick up the new instance
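As an example, adding a hypothetical poller "p5" (name and address made up) to the load-balanced "poller3" zone from scenario 2 would boil down to deploying these updated objects to all zone members and the master:

```
object Endpoint "p5" { host = "192.168.3.50" }   // hypothetical new poller

object Zone "poller3" {
  endpoints = [ "p3", "p4", "p5" ]   // p5 added to the existing zone
  parent = "config-ha-master"
}
```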

The reload, by the way, now behaves like a real reload thanks to Gerd's work - it forks a child which verifies the configuration, and if everything went fine, the old parent process is killed and the check results/history are read from the state file in order not to lose any important information. That way the reload takes seconds (not noticeable from the shell).

The overall question is - how many pollers do you really need with Icinga 2? It would be interesting to throw all the 100k hosts and 400k services onto one single box with plenty of hardware and try to scale it for high performance.

But sometimes pollers truly solve the problem of a real distributed architecture not available with Nagios 3/4 or Icinga 1.x. By 'real' I mean at least check load distribution and integrated replication of events. HA features and advanced zone capabilities like configuration synchronisation are just a bonus, because we can do it with Icinga 2.

To get an idea of what I am talking about, you can try the simple 2-node cluster scenario I've been building for demo cases (inherited from the original Netways CeBIT demo setup), available as Vagrant boxes.

Details at https://git.icinga.org/?p=icinga-vagrant.git;a=blob;f=icinga2x-cluster/README;hb=HEAD



kind regards,
Michael


--
DI (FH) Michael Friedrich

[email protected]  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
[email protected]       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
