We're in the process of transitioning from OpenAFS 1.2.11 to 1.4.7. Both the old and new servers run Debian: the old ones Debian 3, the new ones Debian 5. Many of the new server hosts are Xen instances, though not all on the same physical server. I assume that running AFS servers under Xen should be perfectly OK.

New db and file servers running 1.4.7 are in place. I've migrated all volumes to the new file server and started two secondary db servers running 1.4.7. Still running 1.2.11 are the old main db and file server (lowest IP) and a secondary backup server (with LTO tape drive) that runs an empty file server.

Everything worked just great for a week or so. But now any AFS administrative operation is tremendously slow. A vos create takes minutes to create the volume. Nightly backups now take hours to complete when they used to finish within twenty minutes or so. Most disturbingly, when I mount a volume I successfully created, the mount reports success, but if I then try to cd into the volume or access it in any way I get the error shown below:

afs3:/afs/.lns.mit.edu/user# vos listvol afs3 vicepj
Total number of volumes on server afs3 partition /vicepj: 66
test                              536876681 RW          2 K On-line
[...]

afs3:/afs/.lns.mit.edu/public# fs mkmount test test
afs3:/afs/.lns.mit.edu/public# ls test
ls: cannot access test: No such device
afs3:/afs/.lns.mit.edu/public#

afsdbserv1:/var/lib/openafs/db# udebug afs2 7002
Host's addresses are: ***.***.***.134
Host's ***.***.***.134 time is Wed Jul 29 17:18:06 2009
Local time is Wed Jul 29 17:18:09 2009 (time differential 3 secs)
Last yes vote for ***.***.***.134 was 3 secs ago (sync site);
Last vote started 3 secs ago (at Wed Jul 29 17:18:06 2009)
Local db version is 1248882567.2
I am sync site until 56 secs from now (at Wed Jul 29 17:19:05 2009) (3 servers)
Recovery state 1f
Sync site's db version is 1248882567.2
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         19719 secs ago (at Wed Jul 29 11:49:30 2009)

Server (***.***.***.218): (db 1248882567.2)
   last vote rcvd 4 secs ago (at Wed Jul 29 17:18:05 2009),
   last beacon sent 3 secs ago (at Wed Jul 29 17:18:06 2009), last vote was yes
   dbcurrent=1, up=1 beaconSince=1

Server (***.***.***.217): (db 1248882567.2)
   last vote rcvd 4 secs ago (at Wed Jul 29 17:18:05 2009),
   last beacon sent 3 secs ago (at Wed Jul 29 17:18:06 2009), last vote was yes
   dbcurrent=1, up=1 beaconSince=1

Port 7003 says pretty much the same thing.
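
To compare database versions across servers without reading each full dump, the udebug output can be piped through a small filter. This is only a sketch: udebug_summary is a hypothetical helper name, not part of OpenAFS, and it assumes the "Local db version is" and "Server (...): (db ...)" line layout seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not an OpenAFS tool): reads udebug output on stdin,
# prints the local db version, then one line per remote server stating
# whether that server's db version matches the local one.
udebug_summary() {
    awk '
        # Line layout assumed: "Local db version is 1248882567.2"
        /^Local db version is/ {
            local_ver = $5
            print "local " local_ver
        }
        # Line layout assumed: "Server (ADDRESS): (db VERSION)"
        /^Server \(/ {
            host = $2; gsub(/[():]/, "", host)   # strip parentheses and colon
            ver = $4;  gsub(/\)/, "", ver)       # strip trailing parenthesis
            print host, ver, (ver == local_ver ? "match" : "MISMATCH")
        }
    '
}
```

It would be run once per port, for example as "udebug afs2 7002 | udebug_summary" and again with 7003. A MISMATCH line would point at a server whose copy of the database lags the one being queried; in the output above all three servers report the same version.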

Is my problem the difference between OpenAFS 1.2.11 and 1.4.7, or do I have a deeper problem going on here? There are no clock skew issues between any of the servers.

Thanks a bunch for any suggestions.

J. Maynard Gelinas
Computer Services Manager
24-030d
617-253-5222
geli...@mit.edu
