> > We believe that this is due to an individual machine being down for a
> > check (and then back up again) and then a separate machine being down
> > the next check (and so on). This is confusing mon, and it thinks that
> > it is failing.
> >
> > My plan for resolving this was to store failure modes at the host level
> > inside the service, and only send out alerts when an individual host has
> > been down for the failure time. Any idea how mon could do this?
>
> This is confusing you, not mon. If a host fails, the group fails.
> If you don't want that, consider one group per host.

One problem with creating one group per host/service is the sheer number of
groups you end up with. You also specify alertafter, alertevery, etc. at the
host/service level. If you specify 'alertafter 2 30m', service b should not
alert after one failure just because service a failed once 15 minutes ago.
For these reasons, I agree with the original poster that failures should be
tracked at the service/host level, not the group level.

Nicholas Cook
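
P.S. To make the 'alertafter 2 30m' point concrete, here is a rough sketch of per-host failure tracking. It is Python purely for illustration (mon itself is Perl), and every name in it (PerHostFailureTracker, record, count, window_s) is hypothetical and not part of mon. It also assumes that "2 30m" means two failures of the same host within 30 minutes, which is my reading rather than a confirmed description of mon's behavior.

```python
from collections import defaultdict, deque


class PerHostFailureTracker:
    """Hypothetical: alert only when ONE host fails `count` times
    within `window_s` seconds, instead of counting per group."""

    def __init__(self, count, window_s):
        self.count = count
        self.window_s = window_s
        # host name -> timestamps (seconds) of that host's recent failures
        self.failures = defaultdict(deque)

    def record(self, host, ok, now):
        """Record one check result; return True if this host should alert."""
        if ok:
            return False
        times = self.failures[host]
        times.append(now)
        # Forget failures of this host that fell outside the window.
        while times and now - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.count


# 'alertafter 2 30m' applied per host rather than per group
tracker = PerHostFailureTracker(count=2, window_s=30 * 60)
first = tracker.record("a", ok=False, now=0)          # host a fails once
second = tracker.record("b", ok=False, now=15 * 60)   # host b fails 15 min later
third = tracker.record("b", ok=False, now=20 * 60)    # host b fails again
print(first, second, third)  # False False True
```

With group-level counting, the second call would already look like two failures inside 30 minutes. Tracked per host, only the repeated failure of b triggers the alert.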