On the server side, however, the check works as a heartbeat,
checking whether the local service is still alive. But this is
only performed once every hour.
My suggestion would be to use the 'redistribute' feature that was
added a while back on the agent, causing it to pass every status
update to the master, so you can see that the check was run recently
and the result was OK.
Then you can also set the traptimeout setting to ensure that you are
receiving traps at regular intervals, and alert if the agent stops
sending traps.
I did exactly this in a master/slave Mon setup. (It's why I
implemented the redistribute feature.)
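Roughly, the two halves could look like the sketch below: redistribute
on the agent, traptimeout on the master. This is only an outline, not
my actual config. The host and group names, the service name,
localservice.monitor, the 5m timeout and the mail address are all
placeholders, and trap.alert with a hostname argument is an assumption
standing in for whatever alert script you use to generate Mon traps.

```
# --- agent mon.cf (sketch, placeholder names) ---
hostgroup local localhost

watch local
    service localservice
        interval 1m
        monitor localservice.monitor
        # run on every status update, including repeated successes
        # (alert script and its argument are assumptions)
        redistribute trap.alert master.example.com
        period wd {Sun-Sat}
            alert mail.alert admin@example.com

# --- master mon.cf (sketch, placeholder names) ---
hostgroup agent1 agent1.example.com

watch agent1
    service localservice
        # register a failure if no trap arrives within this interval
        traptimeout 5m
        period wd {Sun-Sat}
            alert mail.alert admin@example.com
```

Pick a traptimeout comfortably longer than the agent's interval, so
that one delayed or lost trap does not trigger an alert by itself.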
Thank you. I had missed that option, which is clearly nice to know about
in master/slave setups.
However, my agents are running mon version 0-99-2.6, which doesn't have
the redistribute option yet. And upgrading all my 1000+ agents is a bit
out of scope at the moment. I'm also unsure how my servers would react
to all the agents sending the info each time they poll each service (1m
interval on services on the agents).
Anyway, thank you for the tips everyone. But for the time being, I think
I have to modify the server script to be a snap-in replacement.
Anders Synstad
Basefarm AS