Re: [OpenAFS] Re: Request for testing: NATs and 1.6.6pre*

2013-12-24 Thread chas williams - CONTRACTOR
Firefox does like to cache things in your profile directory (images, web
pages and such) and over a WAN this might not have the desired effect.
In some cases, it might be faster to simply retrieve these items again
from the Internet instead of going out over the WAN back to your
profile in AFS.

On Sat, 21 Dec 2013 00:17:05 +0200 (EET)
"Jukka Tuominen" wrote:

> 
> These hangs can last 10+ seconds over WAN, but not quite a minute at least
> today. However, when I straced firefox, there are indications that a
> missing /etc/ld.so.nohwcap file and installed "preload" package may be
> causing at least part of the problem. Maybe the system is trying to speed up
> things by loading files to memory, but the loaded files are not local. I
> need to study this a bit further.
> 
> br, jukka
> 
> 
> >> On Thu, 19 Dec 2013 18:28:58 -0600
> >> Andrew Deason wrote:
> >>
> >> > But how do you know if this is a problem for you at all? Usually the
> >> > most user-visible symptom is that access to AFS hangs while a client
> >> > is trying to write to AFS, but a lot of different things can cause
> >> > that.
> >
> > I should have said "hangs for about a minute". Which for some users may
> > be indistinguishable from "forever" :)
> >
> >
> > On Fri, 20 Dec 2013 07:29:38 +0200 (EET)
> > "Jukka Tuominen" wrote:
> >
> >> Even though things work nicely usability-wise (just boot and log-in
> >> graphically), I still think it should have a bit smoother two-way data
> >> transfer behind the scenes. Applications like Firefox like to constantly
> >> write something to a homedir which happens to be on a server. This
> >> sometimes freezes the application momentarily, even though the amount of
> >> data transferred is still modest.
> >>
> >> If you think this is the kind of configuration you're interested in, and
> >> you can provide a patch file that works on top of this, I could try to
> >> test it during the weekend.
> >
> > That issue is probably not relevant; the hangs/freezes I'm talking about
> > are usually longer, for more like a minute. Though, if you find anything
> > in FileLog that looks like what I mentioned, then yes, the general
> > environment may be of interest. Let me know?
> >
> > I assume by a momentary "freeze" you mean just for a second or so. That
> > may have more to do with firefox not expecting some operations like
> > close() to take a long time. If you wanted to look into that
> > (outside of this thread), I would try maybe running firefox under
> > 'strace -o /some/file.dmp -tt -T', to see which system calls are taking
> > a long time. If you can provide that, as well as which specific times
> > you see a "freeze", that could be used to provide an explanation of
> > what's going on. (If you manage to record that, send it to openafs-bugs
> > or directly to me or something; probably not to the list.)
> >
> > --
> > Andrew Deason
> > adea...@sinenomine.net
> >
> > ___
> > OpenAFS-info mailing list
> > OpenAFS-info@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-info
> >
> 
> 
> 
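
For anyone following up on the strace suggestion quoted above, here is a
minimal sketch of how the resulting file could be scanned for slow system
calls. It assumes the usual 'strace -tt -T' line layout (a wall-clock
timestamp first, the elapsed time in angle brackets last) and a trace taken
without -f, so there is no PID column. The function name and the one-second
threshold are made up for illustration.

```python
import re
import sys

# Matches lines such as:
#   12:00:02.500000 close(3) = 0 <10.250000>
# -tt adds the leading timestamp, -T the trailing <seconds>.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def slow_calls(lines, threshold=1.0):
    """Return (timestamp, syscall, seconds) for calls taking at least threshold seconds."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # signals, exit notices, or lines without a duration
        seconds = float(match.group(3))
        if seconds >= threshold:
            hits.append((match.group(1), match.group(2), seconds))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_calls.py /some/file.dmp
    with open(sys.argv[1]) as trace:
        for stamp, name, seconds in slow_calls(trace):
            print(f"{stamp} {name} took {seconds:.3f}s")
```

The timestamps it prints can then be lined up against the times at which a
"freeze" was actually observed.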



[OpenAFS] Connection timed out and device doesn't exist finally solved

2013-12-24 Thread Timothy Balcer
Very, very odd behavior. In short: an entire fileserver's RW
volumes became unavailable to our colo sites, but not the local site. Every
effort to determine the cause was met with frustration (all sorts of
Cache Manager operations yielded nothing).

That is, until I did an fs whereis on the affected volume, on the
fileserver machine itself...

It told me the RW volume was available on host 192.168.122.1, formerly a
virtual host bridge interface that is no longer in use.

The VLDB did not show this, and running syncserv and syncvldb had not fixed
the problem. Restarting the fileserver process did not release it, even
though the IP was no longer active.

So I moved one volume. That worked. But I didn't want to do that for the
entire fileserver.

So I added -rxbind to the fileserver's command line and restarted it.

Voila. Problem solved.
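
In case it saves someone else the hunt, here is a rough sketch of the check
that would have caught this sooner: compare the hosts reported by fs whereis
against the addresses you expect. It assumes the output looks like
"File <path> is on host <name>" (or "is on hosts <a> <b>" for several sites);
the function names and example addresses are hypothetical.

```python
import re

# Assumed output shape: "File /afs/... is on host A" or "... is on hosts A B".
WHEREIS_RE = re.compile(r" is on hosts? (.+)$")


def hosts_from_whereis(output):
    """Collect every host name or address listed in fs whereis output."""
    hosts = []
    for line in output.splitlines():
        match = WHEREIS_RE.search(line)
        if match:
            hosts.extend(match.group(1).split())
    return hosts


def unexpected_hosts(output, expected):
    """Return reported hosts that are not in the expected set."""
    return [host for host in hosts_from_whereis(output) if host not in expected]
```

Feeding it the text captured from fs whereis, together with the set of
fileserver addresses you actually serve from, would flag a stray bridge
address such as 192.168.122.1.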

-- 
Timothy Balcer / IT Services
Telmate / San Francisco, CA
Direct / (415) 300-4313
Customer Service / (800) 205-5510