We had a network outage on one of our servers this week. As a result, when monit ran its usual HTTP check against the server's IP, the check failed. Monit then restarted Apache, but the check still failed because the problem was the network, not Apache. Monit retried a few times and then hit the timeout wall. We have an "if X restarts in X checks, timeout" rule.
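
For anyone not familiar with that kind of rule, here is a minimal sketch of such a setup in the monit control file. The IP address, pidfile, init script paths and the 3-in-5 counts are placeholders for illustration, not our real values.

```
# Hypothetical example: names, paths, address and counts are assumptions.
check process apache with pidfile /var/run/apache2.pid
  start program = "/etc/init.d/apache2 start"
  stop program  = "/etc/init.d/apache2 stop"
  # HTTP check against the server's IP; restart Apache if it fails
  if failed host 192.0.2.10 port 80 protocol http then restart
  # After too many restarts in a short window, stop monitoring the service
  if 3 restarts within 5 cycles then timeout
```

Once the last rule fires, the service stays unmonitored until someone runs "monit monitor apache" by hand.
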
I'm realising it's probably a bad thing to be constantly restarting Apache, but it's probably worse to time out on a production machine and leave the service unmonitored. I'm wondering if there's a way to time out for, say, 15 checks only. So instead of unmonitoring the service altogether, monit would unmonitor it for a while, then start monitoring it again, and repeat that cycle for as long as the problem lasts.

I guess I could hack something up to this effect using cron together with "monit status" and "monit monitor", but I wondered whether monit already has such a feature, or whether there are plans to implement something like it.

Cheers,
Callum
