[OpenAFS-devel] Re: dealing with rxevent queue stalls

Andrew Deason Tue, 24 Sep 2013 08:29:48 -0700

On Mon, 23 Sep 2013 19:52:37 +0000
Mark Vitale <[email protected]> wrote:


> 1) While accessing a particular fileserver, AFS clients experience
> performance delays; some also see multiple "server down/back up"
> problems.
>   - root cause was a hardware bug on the fileserver that prevented
>   timers from firing reliably; this unpredictably delayed any task in
>   the rxevent queue, while leaving the rest of the fileserver function
>   relatively unaffected.  (btw, this was a pthreaded fileserver).

In my opinion it's not worth it to work around this, unless there's some
way to address it that's easy and everyone agrees it's obviously
correct.

Logging it is definitely helpful and OK, though logging from rx is not
great. Are you currently using existing mechanisms by just printing to
stderr, or some new mechanism for logging from rx?
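
For illustration, here is a minimal sketch of the kind of check that
logging a stall implies: compare when an event was scheduled to fire with
when it actually fired, and write a message to stderr if the gap exceeds
a threshold. This is Python, not rx code; the function names and the
0.5-second threshold are hypothetical and do not correspond to anything
in the OpenAFS tree.

```python
import sys

# Hypothetical threshold, in seconds; a real value would need tuning.
STALL_THRESHOLD = 0.5


def event_lateness(scheduled_at, fired_at):
    """Return how many seconds late an event fired (0.0 if on time or early)."""
    return max(0.0, fired_at - scheduled_at)


def report_if_stalled(scheduled_at, fired_at,
                      threshold=STALL_THRESHOLD, out=sys.stderr):
    """Write one line to `out` if the event fired more than `threshold` late.

    Returns True if a stall was reported, False otherwise.
    """
    late = event_lateness(scheduled_at, fired_at)
    if late > threshold:
        print(f"event queue stall: event fired {late:.3f}s late", file=out)
        return True
    return False
```

In use, both timestamps would come from the same monotonic clock (for
example time.monotonic()), so that wall-clock adjustments are not
mistaken for stalls.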

> 2) Volume releases suffer from poor performance and occasionally fail
> with timeouts.
>   - root cause was heavier-than-normal vlserver load (perhaps caused
>   by disk performance slowdowns); this starved LWP IOMGR, which in
>   turn prevented LWP rx_Listener from being dispatched (priority
>   inversion), leading to a grossly delayed rxevent queue.

I'm not sure if I'm mistaken as to what this is about, or if I just find
this phrasing really confusing. I thought the issue here was just that
ubik processes (such as vlserver) use plain read() and write() calls to
read from and write to disk; so if those calls take a while, all LWPs
freeze, because we cannot preempt the LWP waiting on i/o. Is that correct?
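
To make that concrete, here is a toy model of cooperative scheduling,
with Python generators standing in for LWPs. It is only a sketch of the
general effect, not how LWP or ubik is implemented; all names are
hypothetical, and time.sleep() stands in for a slow read() or write().
Because a task only gives up control when it yields, one task blocked in
a slow call delays every other task by at least that long.

```python
import time


def blocking_task(delay):
    """Stand-in for an LWP doing a plain blocking read()/write()."""
    while True:
        time.sleep(delay)  # simulated slow disk i/o; nothing can preempt it
        yield              # control is only given up here


def ticker(gaps):
    """Stand-in for an LWP that wants to run often (e.g. a listener)."""
    last = time.monotonic()
    while True:
        yield
        now = time.monotonic()
        gaps.append(now - last)  # time elapsed since this task last ran
        last = now


def run_round_robin(tasks, rounds):
    """Run each task in turn; a task runs until its next yield."""
    for _ in range(rounds):
        for task in tasks:
            next(task)


def max_gap_with_blocking(delay, rounds=3):
    """Return the longest interval the ticker went without running."""
    gaps = []
    run_round_robin([ticker(gaps), blocking_task(delay)], rounds)
    return max(gaps)


if __name__ == "__main__":
    print(f"longest ticker gap: {max_gap_with_blocking(0.05):.3f}s")
```

The ticker never blocks itself, yet its longest gap tracks the other
task's i/o delay, which is the same shape of problem as a delayed
rxevent queue.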

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
