Re: Changing the way randstart works (patch)

David Nolan Wed, 04 Dec 2002 08:30:35 -0800

--On Wednesday, December 04, 2002 9:55 AM -0500 Luke Hankins <[EMAIL PROTECTED]> wrote:

Since dependencies that haven't been checked yet are considered
to be not failed for depend checking purposes[1], one needs to be
absolutely sure that all the dependencies are checked before the thing
that depends on them.

I like your approach to handling the issue. I'll just add that the opstatus save/load code that I wrote helps with that situation here. We restart our mon server reasonably often with new, automatically generated configuration files, and not losing the old state every time is essential to us.

However, I do think that the skew should be a percentage. If some tests run every 30 seconds, and some run every minute, I'd like to say 'skew the initial starts by up to 50%' to skew the rapid ones from 15-45 seconds, and the infrequent ones to 5-15 minutes.
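
To make the percentage idea concrete, here is a minimal sketch in Python. It is not mon's randstart code; the function name and parameters are hypothetical, and it assumes "skew by up to 50%" means the first run lands uniformly between 50% and 150% of the service's interval, which matches the 15-45 second range for a 30-second test.

```python
import random


def initial_delay(interval, skew_fraction, rng=random):
    """Pick a delay (in seconds) before a service's first check.

    Hypothetical helper, not part of mon: the delay is drawn uniformly
    from interval * (1 - skew_fraction) to interval * (1 + skew_fraction).
    """
    if not 0.0 <= skew_fraction <= 1.0:
        raise ValueError("skew_fraction must be between 0 and 1")
    low = interval * (1.0 - skew_fraction)
    high = interval * (1.0 + skew_fraction)
    return rng.uniform(low, high)


# A 30-second test with 50% skew starts somewhere in 15-45 seconds.
fast = initial_delay(30, 0.5)

# A 600-second (10-minute) test with the same setting starts in 5-15 minutes.
slow = initial_delay(600, 0.5)
```

The point of the percentage is that one setting scales with each service's own interval, so frequent and infrequent tests both get spread out in proportion.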

I think 'unchecked dependencies should count as failures' is also a good idea, but it doesn't really affect us much. When you're using mon to check things (instead of just scheduling tasks), the dependent service may get checked after the real failure happened but before the real failure was tested, so (in the way we're using mon here) the dependent service always has alertafter settings. For example, the ping test runs every 30 seconds and the http test runs once every minute. If the http test fails twice in a row, I *know* the ping test was run in between. So an 'alertafter 2' in the appropriate periods will do the right thing.
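
The timing argument can be checked with a small Python sketch. It is only an illustration under simplifying assumptions: both tests run on exact fixed schedules (real mon timing also depends on monitor run time and start skew), and the function and parameter names are made up for this example.

```python
import math


def ping_runs_between(http_first_fail, http_interval=60,
                      ping_interval=30, ping_offset=0):
    """List ping check times strictly between two consecutive http checks.

    Hypothetical helper: assumes pings run at ping_offset + k * ping_interval
    and that the second http check runs http_interval after the first.
    """
    end = http_first_fail + http_interval
    # Index of the first ping that runs strictly after the first http failure.
    k = math.floor((http_first_fail - ping_offset) / ping_interval) + 1
    t = ping_offset + k * ping_interval
    times = []
    while t < end:
        times.append(t)
        t += ping_interval
    return times


# Whatever the offset between the two schedules, at least one ping run
# falls between two consecutive http runs.
between = ping_runs_between(http_first_fail=10, ping_offset=7)
```

Because the ping interval is shorter than the http interval, the list is never empty, which is why waiting for the second http failure gives the ping dependency time to be tested.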

-David Nolan
Network Software Developer
Computing Services
Carnegie Mellon University

_______________________________________________
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon
