Re: [OpenAFS-devel] dealing with rxevent queue stalls

Mark Vitale Wed, 25 Sep 2013 15:07:18 -0700

On Sep 24, 2013, at 8:02 AM, Derrick Brashear <[email protected]> wrote:
> I haven't profiled rxevent queue handling in about 5 years. With your code 
> does the queue appear "sick" on a normally functioning host if you assume 
> that threshold is 0 (if now is later than the top scheduled event, assume it 
> should have already fired)?


No, I've currently got a #define for a 5-second "grace" period.  In all the
cores I've examined, the difference between "now" and the top epoch is always
about zero for active fileservers, and usually a positive number (1-60 secs)
for volservers and DB servers.  The only negative values I've ever observed
were approximately -4200 secs (!!!) for the broken timer case, and -17 secs
for the LWP priority inversion case.  In my simulated testing, the 5-second
grace period was sufficient to never flag a working server as sick, while
still flagging simulated "sick" servers fairly quickly.  So the threshold
I've been testing with is essentially -5 seconds, but it's implemented with
some hysteresis, so the server isn't marked "well" again until the difference
is back to the normal 0 seconds.
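
To make the threshold-plus-hysteresis idea concrete, here is a minimal sketch
of how such a check might look.  It is not the actual patch: the names
RXEVENT_GRACE_SECS and rxevent_queue_is_sick are hypothetical, and it assumes
the difference is computed as (top epoch - now), so that an overdue top event
gives a negative value, and that "the normal 0 seconds" means a difference of
zero or more.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical name; the post only says a #define holds the 5-second grace. */
#define RXEVENT_GRACE_SECS 5

/*
 * Illustrative helper, not OpenAFS code.
 * was_sick:  the previous health verdict for this event queue
 * now:       current time in seconds
 * top_epoch: scheduled time of the event at the head of the queue
 * Returns the new verdict.
 */
static bool
rxevent_queue_is_sick(bool was_sick, time_t now, time_t top_epoch)
{
    /* Assumed sign convention: negative means the top event is overdue. */
    long diff = (long)(top_epoch - now);

    if (diff < -RXEVENT_GRACE_SECS)
        return true;        /* overdue by more than the grace period */
    if (diff >= 0)
        return false;       /* back to normal: nothing is overdue */
    return was_sick;        /* inside the grace band: keep prior state */
}
```

With this shape, a healthy server that is briefly a few seconds behind is
never flagged, while a server already flagged as sick stays flagged until the
head of the queue is no longer overdue.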

--
Mark Vitale
[email protected]
