On Mon, 23 Sep 2013 19:52:37 +0000 Mark Vitale <[email protected]> wrote:
> 1) While accessing a particular fileserver, AFS clients experience
> performance delays; some also see multiple "server down/back up"
> problems.
> - root cause was a hardware bug on the fileserver that prevented
> timers from firing reliably; this unpredictably delayed any task in
> the rxevent queue, while leaving the rest of the fileserver function
> relatively unaffected. (btw, this was a pthreaded fileserver).

In my opinion it's not worth working around this, unless there's some
way to address it that's easy and that everyone agrees is obviously
correct. Logging it is definitely helpful and OK, though logging from rx
is not great. Are you using an existing mechanism by just printing to
stderr, or some new mechanism for logging from rx?

> 2) Volume releases suffer from poor performance and occasionally fail
> with timeouts.
> - root cause was heavier-than-normal vlserver load (perhaps caused
> by disk performance slowdowns); this starved LWP IOMGR, which in
> turn prevented LWP rx_Listener from being dispatched (priority
> inversion), leading to a grossly delayed rxevent queue.

I'm not sure if I'm mistaken as to what this is about, or if I just find
this phrasing really confusing. I thought the issue here was just that
ubik processes (such as vlserver) use plain read() and write() calls to
read from and write to disk; so if those calls take a while, all LWPs
freeze, because we cannot preempt the LWP that is waiting on i/o. Is
that correct?

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
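
To illustrate the concern raised in the reply above (a blocking read() or
write() in one cooperative thread stalls every other thread), here is a
minimal sketch. It is not OpenAFS or LWP code: the scheduler, the task
names, and the simulated millisecond clock are all hypothetical, and the
"disk" call simply advances the clock to stand in for time spent inside a
synchronous system call.

```python
from collections import deque


class Clock:
    """Simulated wall clock, in milliseconds (assumption: no real I/O)."""

    def __init__(self):
        self.now = 0


def blocking_io(clock, cost_ms):
    """Stand-in for a plain read()/write(): time passes, nothing else runs."""
    clock.now += cost_ms


def disk_task(clock, cost_ms, rounds):
    """Hypothetical thread doing synchronous disk i/o; yields only between calls."""
    for _ in range(rounds):
        blocking_io(clock, cost_ms)
        yield


def listener_task(clock, wakeups, rounds):
    """Hypothetical listener thread: records when it actually gets dispatched."""
    for _ in range(rounds):
        wakeups.append(clock.now)
        clock.now += 1  # one millisecond of cheap bookkeeping work
        yield


def run(tasks):
    """Round-robin cooperative scheduler: a task runs until it yields."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)


def max_gap(wakeups):
    """Longest interval between two consecutive listener dispatches."""
    return max(b - a for a, b in zip(wakeups, wakeups[1:]))


def simulate(disk_cost_ms, rounds=5):
    """Return the worst listener dispatch gap for a given disk call cost."""
    clock, wakeups = Clock(), []
    run([listener_task(clock, wakeups, rounds),
         disk_task(clock, disk_cost_ms, rounds)])
    return max_gap(wakeups)


if __name__ == "__main__":
    print("fast disk, worst listener gap (ms):", simulate(2))
    print("slow disk, worst listener gap (ms):", simulate(500))
```

In this model the listener's dispatch delay grows in step with the cost of
the disk call, since the scheduler cannot take the CPU back until the
blocked task yields. Whether that matches the vlserver behaviour described
in the quoted report is the question the reply is asking.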
