Hi,

We are using mon to alert us to problems on a couple of our server pools.

We have run into two issues with it, and were wondering whether anyone else has come across them and has workarounds.


*1*
When upgrading a server pool of ~30 machines, we get multiple failure alerts.
We believe this is because an individual machine is down for one check (and then back up again), then a separate machine is down
for the next check, and so on. This confuses mon into thinking that the
service is failing.


My plan for resolving this was to store failure state at the host level inside the service, and only send out alerts when an individual host has been down for the failure time. Any ideas on how mon could do this?
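
To make the idea concrete, here is a minimal sketch in Python of per-host failure tracking. It is not mon code or mon configuration; the class name, the method name and the 120-second failure time are all made up for illustration. It only shows the logic: remember when each host first failed, forget it when the host recovers, and alert only once the same host has been down for the whole failure time.

```python
class HostFailureTracker:
    """Hypothetical helper: tracks when each host's current failure began."""

    def __init__(self, failure_time):
        self.failure_time = failure_time  # seconds a single host must stay down
        self.down_since = {}              # host -> time of its first failed check

    def record(self, host, ok, now):
        """Record one check result; return True if this host should alert."""
        if ok:
            # Host recovered: forget its failure start time.
            self.down_since.pop(host, None)
            return False
        # Keep the time of the FIRST failure in the current run of failures.
        first_failure = self.down_since.setdefault(host, now)
        return now - first_failure >= self.failure_time


tracker = HostFailureTracker(failure_time=120)

# Rolling upgrade: a different host is down at each check, none for long.
upgrade_alerts = [
    tracker.record("web1", False, 0),
    tracker.record("web1", True, 60),
    tracker.record("web2", False, 60),
    tracker.record("web2", True, 120),
]

# A host that really is dead: down at three consecutive checks.
dead_host_alerts = [
    tracker.record("web3", False, 0),
    tracker.record("web3", False, 60),
    tracker.record("web3", False, 120),
]
```

With this logic the rolling upgrade produces no alerts at all, while web3 alerts on its third failed check, once it has been down for the full 120 seconds.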

*2*
Machine flapping.
Sometimes we have machines going in and out of service regularly. We only want to be notified once an hour, but mon seems to clear the last-alert-sent time when the service becomes good again.



My plan for this was to somehow check whether an email had already been sent for that machine within the alerting period and, if so, not send another.
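
Roughly what I have in mind, as a Python sketch rather than anything mon provides: keep a per-machine record of when the last email went out, and do not clear it when the machine recovers. The names here (AlertThrottle, should_send) and the one-hour period are assumptions for the example only.

```python
class AlertThrottle:
    """Hypothetical helper: at most one alert per host per period."""

    def __init__(self, period=3600):
        self.period = period   # alerting period in seconds (one hour here)
        self.last_sent = {}    # host -> time the last alert was sent

    def should_send(self, host, now):
        """Return True and remember the time if an alert may be sent now."""
        last = self.last_sent.get(host)
        if last is not None and now - last < self.period:
            return False       # already alerted for this host in this period
        # Note: nothing here resets last_sent when the host comes back up,
        # so a flapping host cannot trigger a fresh alert on every failure.
        self.last_sent[host] = now
        return True


throttle = AlertThrottle(period=3600)

# web1 flaps: it fails at t=0, t=600 and t=3599, then again at t=3600.
flapping = [throttle.should_send("web1", t) for t in (0, 600, 3599, 3600)]

# A different host is throttled independently of web1.
other_host = throttle.should_send("web2", 600)
```

Here web1 gets an email at t=0 and the next one only at t=3600, while web2 is still alerted on its first failure.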



Also, what do people think of a new type of group/alerting mechanism: a 'load-balanced' group where you get alerts when X% of the machines in the group are not responding?
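
As a rough Python sketch of the check I mean (again not an existing mon feature; the function name, the host names and the 20% threshold are invented for the example):

```python
def pool_should_alert(host_status, threshold_percent):
    """Return True when at least threshold_percent of the hosts are down.

    host_status maps host name -> True if the host responded, else False.
    """
    total = len(host_status)
    if total == 0:
        return False  # an empty group never alerts
    down = sum(1 for ok in host_status.values() if not ok)
    # Integer comparison of down/total >= threshold_percent/100.
    return down * 100 >= threshold_percent * total


# A pool of 30 hosts with one machine down (about 3%): no alert at 20%.
one_down = {"web%d" % i: i != 1 for i in range(1, 31)}
alert_one_down = pool_should_alert(one_down, 20)

# The same pool with six machines down (exactly 20%): alert.
six_down = {"web%d" % i: i > 6 for i in range(1, 31)}
alert_six_down = pool_should_alert(six_down, 20)
```

A single machine being upgraded would stay well under the threshold, so this kind of group would also help with the first issue.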

(Does anyone know of a GPL/BSD package out there that works like mon and does the above already?)

Regards
Ian.

