On Sep 24, 2013, at 8:02 AM, Derrick Brashear <[email protected]> wrote:

> I haven't profiled rxevent queue handling in about 5 years. With your code
> does the queue appear "sick" on a normally functioning host if you assume
> that threshold is 0 (if now is later than the top scheduled event, assume it
> should have already fired)?
No, I've currently got a #define for a 5-second "grace" period.

In all the cores I've examined, the difference between "now" and the top (earliest scheduled) epoch is always about zero for active fileservers, and usually a positive number (1-60 secs) for volservers and DB servers. The only negative values I've ever observed were approximately -4200 secs (!!!) for the broken timer case, and -17 secs for the LWP priority inversion case.

In my simulated testing, the 5-second grace period never flagged a working server as sick, yet it flagged simulated "sick" servers fairly quickly.

So the threshold I've been testing with is effectively -5 seconds, but it's implemented with some hysteresis: once a server is marked sick, it isn't marked "well" again until the difference returns to the normal 0 seconds.

--
Mark Vitale
[email protected]
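A minimal sketch of the grace-period check with hysteresis described above, in C. The struct, function, and macro names are hypothetical and are not taken from the OpenAFS rxevent code. The sign convention (top event's scheduled time minus now, so negative means the event is overdue) is an assumption based on the numbers quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical grace period, standing in for the 5-second #define. */
#define QUEUE_GRACE_SECS 5

/* Hypothetical per-server health state. */
struct queue_health {
    bool sick;
};

/*
 * Update the health state from the earliest scheduled event time
 * (top_epoch) and the current time. diff = top_epoch - now:
 * positive means the next event is still in the future,
 * negative means it should already have fired.
 * Returns the new sick state.
 */
static bool
queue_health_update(struct queue_health *h, time_t top_epoch, time_t now)
{
    assert(h != NULL);
    double diff = difftime(top_epoch, now); /* seconds, top_epoch - now */

    if (!h->sick) {
        /* Flag sick only once the top event is more than GRACE seconds overdue. */
        if (diff < -QUEUE_GRACE_SECS)
            h->sick = true;
    } else {
        /* Hysteresis: stay sick until the difference is back to the normal >= 0. */
        if (diff >= 0)
            h->sick = false;
    }
    return h->sick;
}
```

With this shape, a server whose top event is 3 seconds overdue stays "well" if it was well, but stays "sick" if it was already sick, which keeps a borderline server from flapping between states.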
