We use Zabbix here pretty heavily, monitoring roughly 10,000 hosts, 13,000 interfaces, and a myriad of services.
-Brent

> On Jun 7, 2016, at 2:42 AM, Mikael Falkvidd <mikael.falkv...@op5.com> wrote:
>
>> On Monday, June 6, 2016, Manuel Marín <m...@transtelco.net> wrote:
>>
>>> Dear Nanog community
>>>
>>> We are currently planning to upgrade our monitoring system (Opsview) due
>>> to scalability issues and I was wondering what you recommend for
>>> monitoring 5000 hosts and 35000 services. We would like to use a
>>> monitoring system that is compatible with the nagios plugin format,
>>> however we are not sure if systems like Icinga/Shinken/Op5 are the way
>>> to go.
>>>
>>> Is someone using systems like Op5 or Icinga2 for monitoring > 5000 hosts?
>>> Would you recommend commercial systems like Sevone, Zabbix, etc instead
>>> of open source ones?
>>
>
> We (op5) have customers running > 50,000 hosts and > 300,000 services. So
> 5,000 hosts is generally not a problem.
>
> As mentioned by Jeff, the forking model *can* become a problem. Small
> binaries that don't load a lot of libraries fork pretty fast. A test we
> made some time ago showed a 15-minute load peak of 3.89 (on 24
> cores/hyperthreads) when checking 100,000 services every 5 minutes. Check
> latencies were 0.8 seconds max and 0.002 seconds avg. Average CPU load was
> 15%.
>
> Specs for the machine used:
> Dell PowerEdge R620
> 2x Intel Xeon E5-2620
> 24 GB RAM
> Dell PERC H710 hardware RAID card
> RAID10 on 4x300GB 15kRPM SAS drives
>
> So a single (now almost vintage) server can handle 300 plugin executions
> per second without breaking a sweat. Scaling up is definitely a
> possibility, but scaling out (using mod gearman, mk or merlin, all open
> source) is available as well.
>
> Complex plugins, for example check_vmware_api, which loads the large
> VMware perl SDK, can get you in trouble though. I suggest you run a test
> with the plugin mix you are planning to use.
>
> If scaling out is not an option, and you want to stay in the nagios/naemon
> world, a custom worker can be developed to get rid of the loading
> overhead. Documentation is available at
> http://www.naemon.org/documentation/developer/workers.html
>
> Full disclosure: I work as development team lead at op5
>
> best regards
> Mikael Falkvidd
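For sizing, the throughput numbers in the thread reduce to simple arithmetic: average checks per second = services / check interval. The op5 benchmark (100,000 services every 5 minutes) works out to about 333 executions/s, which is where the "300 plugin executions per second" figure comes from. A rough sketch in Python, assuming every service runs on the same 5-minute interval (an assumption; Manuel didn't state his interval, and real configs mix intervals). The function names are hypothetical, just for illustration:

```python
def checks_per_second(services: int, interval_s: float) -> float:
    """Average plugin executions per second if each service is checked once per interval."""
    return services / interval_s


def utilisation(services: int, interval_s: float, capacity_per_s: float) -> float:
    """Fraction of a measured capacity that a config would use (> 1.0 means it can't keep up)."""
    return checks_per_second(services, interval_s) / capacity_per_s


FIVE_MIN = 5 * 60

# op5's benchmark box: 100,000 services every 5 minutes
bench = checks_per_second(100_000, FIVE_MIN)

# The original question: 35,000 services, interval ASSUMED to be 5 minutes
asked = checks_per_second(35_000, FIVE_MIN)

print(f"benchmark:    {bench:.1f} checks/s")
print(f"35k services: {asked:.1f} checks/s")
print(f"utilisation of benchmark box: {utilisation(35_000, FIVE_MIN, bench):.0%}")
```

This only models the average rate. It says nothing about per-plugin cost, so a mix heavy in things like check_vmware_api can blow past it, which is why running a test with your actual plugin mix is still the real answer.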