Hello,

In our labs we have a small AFS+Kerberos cell of twenty-odd workstations and one server, all of which run Debian Linux (kernel 2.4.27) and its packaged OpenAFS (version 1.2.11 AFAIR). Basically, the system works: users can log in from everywhere they are allowed to and still gain access to their AFS home directories. However, all the AFS clients, seemingly including the one running on the server itself, tend to either temporarily lock up or lose connection to the server once in a while.
The problem seems to be related to write operations on AFS, as the apparent trigger of this behaviour is increased write activity such as starting up a complex window manager (e.g. the one from KDE), starting a large application (e.g. OpenOffice), downloading a large file or, in the case of the server (rarely, but nevertheless possibly), a large number of users being active at the same time.

When that happens, whatever app caused the problem just sits there waiting for a while, then (usually, depending on the app) times out the file operation. On workstations the situation is usually accompanied by two messages, one after another, from afsd on the console stating that both IPs of the server have gone down; on the server there is no such message. After a couple of minutes the connection is restored (if that indeed is the problem, but that's what afsd says on workstations) and everything works as before (until next time, that is), but by then the I/O time-outs have usually kicked in and whatever app triggered the problem has already died.

Other potentially useful bits of information (if you need to know anything more, by all means ask):

 - neither low-intensity read/write activity (for sure) nor high-intensity read activity (AFAIR) triggers the problem: it is for instance possible to log in in text mode as many times as one wants. Of course, on workstations that doesn't apply to the period when the connection has already been declared down, as during that time even a shell logon sets one's $HOME to / due to the real home dir being inaccessible.
 - decreasing the cache size on clients has made the problem less frequent, but it didn't go away.

I have worked as a user in much larger AFS cells and have never experienced such behaviour, so obviously something is wrong here. Like I said, ask if you need any more information. Help will be appreciated!
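For anyone who would like to try reproducing the write-heavy trigger in isolation, without KDE or OpenOffice in the picture, a rough sketch in Python could look like the one below. It is only an illustration: the function name `timed_write`, the chunk size and the stall threshold are all hypothetical choices, not anything taken from OpenAFS, and the assumption that an fsync after each chunk is enough to push data towards the file server may not hold for every client configuration.

```python
import os
import time


def timed_write(path, total_bytes, chunk_size=1 << 20, stall_seconds=5.0):
    """Write total_bytes to path in chunks and time every chunk.

    Returns (durations, stalls): durations holds the wall-clock time of
    each write+fsync pair, stalls holds the indices of the chunks that
    took longer than stall_seconds (a hypothetical threshold).
    """
    chunk = b"\0" * chunk_size
    durations = []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            size = min(chunk_size, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:size])
            f.flush()
            # Assumption: fsync makes the client hand the data on rather
            # than keeping it only in local buffers.
            os.fsync(f.fileno())
            durations.append(time.monotonic() - start)
            written += size
    stalls = [i for i, d in enumerate(durations) if d > stall_seconds]
    return durations, stalls
```

Pointing `path` first at a file under an AFS home directory and then at a file on local disk, with the same sizes, would at least show whether the long pauses are specific to writes going through AFS.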
Best regards,
-- 
Marek Szuba
_______________________________________________
OpenAFS-info mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-info
