--On Monday, March 13, 2006 12:12 PM -0600 Aaron Segura <[EMAIL PROTECTED]> wrote:

I'm just curious as to how many systems/services everyone here is
monitoring with mon...

I'm currently monitoring 1600 services across 300 systems and have a
constant load average of about 7-9 (that's load average, not CPU %
used).  We have plans to push all 800+ systems onto our little
'mon'itor, but are having problems...

My host count is higher, currently 2637 hosts, in 122 groups, with 714 services defined on those groups. I run over 500K tests per day. My largest mon config file is 19K lines.



I'm running mon on a Dual HT 3.0GHz Xeon system w/4GB RAM, and there are
also some custom PHP daemons running in the background to handle queued
database insertions and incoming snmptrap packets from Windows hosts
(containing event logs).


I'm running mon on several systems: one master system that has all my alert configuration, and three slave systems that do most of the work and just send mon traps to the master.

I'm afraid I might be running into resource issues, as mon is crashing
randomly (a couple of times a day) ever since I added the last batch of
systems.

What version of Mon are you running? I suspect you're not using a current development release. Earlier versions of Mon rely heavily on Text::ParseWords for parsing alert output, and that module uses some very complex regexps that, when run on large strings, cause Perl to run out of stack space and segfault. Current mon, in CVS or in the dev releases, fixes this problem by changing the protocol encoding format very slightly and just using split, which is *much* less expensive in both memory and CPU. (This also improves overall mon performance, particularly the speed of mon.cgi when the output of 'list opstatus' is large.)
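To illustrate the general idea, here is a minimal Python sketch (mon itself is Perl, and this is not mon's actual wire format). It assumes a hypothetical encoding in which every field is percent-encoded, so a field can never contain the space delimiter; decoding is then a plain split plus unescaping, one linear pass over the string. For contrast, `shlex.split` stands in for a quote-aware tokenizer that, like Text::ParseWords, has to track quotes and escapes as it goes.

```python
import shlex
from urllib.parse import quote, unquote

def encode_fields(fields):
    # Hypothetical encoding: percent-encode each value (spaces, quotes and '%'
    # included) so no value contains a literal space, then join with spaces.
    # Assumes a non-empty list of fields.
    return " ".join(quote(f, safe="") for f in fields)

def decode_fields_split(line):
    # Cheap path: split on the delimiter, then undo the escaping per field.
    return [unquote(tok) for tok in line.split(" ")]

def decode_fields_quoted(line):
    # Quote-aware, shell-style tokenizing: the kind of parsing the split
    # approach avoids having to do.
    return shlex.split(line)

fields = ["host1.example.com", "ping failed: 100% packet loss", 'say "hi"']
wire = encode_fields(fields)
print(wire)
print(decode_fields_split(wire) == fields)  # True
```

The trade-off is that the sender must escape every field, but in exchange the receiver never has to interpret quoting, however large the alert output gets.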

Is there any sort of internal
limit to the number of services or hosts that can be monitored?

No, though there are some practical limits to what you can do on a single host. Mon completely crashing sounds more like the problem I described above.


Am I pushing this piece of software to its limits on this hardware?
Should I start using multiple mon-hosts, or should I be looking
elsewhere for the root of the problem?  I know this is dependent on the
specific monitoring scripts I'm using, but anyone have any general ideas
about how many services I should be able to monitor from a single
mon-host of the type I described?

You may be close to the limit of what you can do with a single host. When the system is running OK, I would ask myself "Are tests being run on schedule, or is the scheduler overloaded?" and "Is the interactive performance of the system still acceptable?"
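One way to answer the first question is to compare each service's last start time against its interval and see how far behind schedule it is. The Python sketch below assumes hypothetical status records (name, interval, last start time) and a hypothetical 25% slack threshold; it does not reproduce mon's actual opstatus fields, which you would have to map onto this yourself.

```python
from dataclasses import dataclass

@dataclass
class ServiceStatus:
    # Hypothetical fields, not mon's actual opstatus schema.
    name: str
    interval: float    # seconds between scheduled runs
    last_start: float  # epoch seconds when the test last started

def overdue_services(statuses, now, slack=0.25):
    """Return (name, seconds_late) for services whose next run is late by
    more than slack * interval, most late first."""
    late = []
    for s in statuses:
        lateness = now - (s.last_start + s.interval)
        if lateness > slack * s.interval:
            late.append((s.name, lateness))
    return sorted(late, key=lambda item: item[1], reverse=True)

statuses = [
    ServiceStatus("web/http", 60, 1000),   # next run due at 1060
    ServiceStatus("db/ping", 300, 900),    # next run due at 1200
    ServiceStatus("dns/query", 120, 900),  # next run due at 1020
]
print(overdue_services(statuses, now=1100))  # [('dns/query', 80), ('web/http', 40)]
```

A few late services may just be slow tests; a steadily growing list across many unrelated services is a stronger sign that the scheduler itself can't keep up.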


Any advice/suggestions/flames/taunts?

Use the latest code!  Send bug reports!  Pay me to fix them!
(Just kidding on that last one, though that door is always open... :)

-David

David Nolan                    <*>                    [EMAIL PROTECTED]
curses: May you be forced to grep the termcap of an unclean yacc while
     a herd of rogue emacs fsck your troff and vgrind your pathalias!

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
