On the server side, however, the check works as a heartbeat,
checking if the localservice is still alive. But this is
only performed once every hour.

My suggestion would be to use the 'redistribute' feature, which was
added a while back, on the agent. It makes the agent pass every status
update to the master, so you can see that the check was run recently
and the result was OK.

Then you can also set the traptimeout setting to ensure that you are
receiving traps at regular intervals, and alert if the agent stops
sending traps.
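
To illustrate the idea behind a trap timeout, here is a minimal Python
sketch of the bookkeeping involved. It is not mon's implementation; the
class and method names (TrapWatchdog, record_trap, silent_agents) are
hypothetical and only show the logic of "alert if no trap arrived within
the timeout".

```python
import time


class TrapWatchdog:
    """Hypothetical helper (not part of mon): tracks the last trap per agent."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # agent name -> timestamp of the last trap received

    def record_trap(self, agent, now=None):
        # Call this whenever a trap (status update) arrives from an agent.
        self.last_seen[agent] = time.time() if now is None else now

    def silent_agents(self, now=None):
        # Return the agents whose last trap is older than the timeout.
        now = time.time() if now is None else now
        return sorted(
            agent
            for agent, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

With a 5 minute timeout, an agent that last reported 400 seconds ago
would be listed by silent_agents(), while one that reported 200 seconds
ago would not.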

I did exactly this with Mon in a master/slave Mon setup.  (It's why I
implemented the redistribute feature.)


Thank you. I had missed that option, which is clearly nice to know about in master/slave setups.

However, my agents are running mon version 0-99-2.6, which doesn't have the redistribute option yet. And upgrading all my 1000+ agents is a bit out of scope atm. I'm also unsure how my servers would react to all the agents sending the info each time they poll each service (1m interval on services on the agents).
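
A rough way to size that load is a back-of-the-envelope calculation. The sketch below is only arithmetic; the number of services per agent is an assumption (it is not stated here), and the function name is made up for illustration.

```python
def traps_per_second(agents, services_per_agent, interval_seconds):
    """Estimate the incoming status updates per second on the master.

    Assumes every agent forwards one update per service per polling interval.
    """
    return agents * services_per_agent / interval_seconds


# 1000 agents polling at a 1m interval; 5 services per agent is an assumed figure.
estimate = traps_per_second(agents=1000, services_per_agent=5, interval_seconds=60)
print(f"{estimate:.1f} updates/second")
```

Under those assumptions the master would see on the order of 80 updates per second; with a single service per agent it would be closer to 17.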

Anyway, thank you for the tips, everyone. But for the time being, I think I have to modify the server script to be a drop-in replacement.



Anders Synstad
Basefarm AS

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
