Greetings Craig,

We run the latest version of RH7 and ran into many of the same problems you mention with the earlier OpenAFS 1.8.4 and 1.8.5 clients. We worked with our OpenAFS support vendor, SineNomine, and the fixes for those issues were incorporated into the recent 1.8.6 client release. We've been running 1.8.6 in production for the past 3 weeks on our busy frontend machines (> 100 simultaneous users) without problems.

We build from SRPM. We've been using 1.8.6 on Fedora 32, RH7 and RH8 at the latest patch levels, but have tested mostly on RH7. I'm fairly certain 1.8.6 will fix your issue.
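If the corruption does recur, the "Corrupt directory" lines from syslog can be tallied per volume to see whether it is confined to one volume. Below is a minimal Python sketch, assuming the four dotted numbers in the message are cell index, volume ID, vnode and uniquifier (my reading of the quoted log lines, not something confirmed in this thread). The names `corrupt_dirs` and `count_by_volume` are hypothetical helpers, not part of OpenAFS.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted syslog lines:
#   afs: Corrupt directory (cell.volume.vnode.unique [cellname] @address, pos N)
CORRUPT_RE = re.compile(
    r"afs: Corrupt directory\s+"
    r"\((\d+)\.(\d+)\.(\d+)\.(\d+) \[([^\]]+)\] @([0-9a-f]+), pos (\d+)\)"
)

# The four messages quoted in the original report, one per line.
SAMPLE = """\
Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory (5.536870965.13859.4201870 [inf.ed.ac.uk] @ffffb303425613c8, pos 0)
Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory (5.536870965.13995.4201950 [inf.ed.ac.uk] @ffffb303423b7ec8, pos 0)
Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory (5.536870965.13997.4201995 [inf.ed.ac.uk] @ffffb303423b75c8, pos 0)
Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory (5.536870965.13737.4201771 [inf.ed.ac.uk] @ffffb303423b69c8, pos 0)
"""


def corrupt_dirs(log_text):
    """Return one dict per 'Corrupt directory' message found in log_text."""
    hits = []
    for match in CORRUPT_RE.finditer(log_text):
        cell_idx, volume, vnode, unique, cell, _addr, pos = match.groups()
        hits.append({
            "cell_index": int(cell_idx),
            "volume": int(volume),
            "vnode": int(vnode),
            "unique": int(unique),
            "cell": cell,
            "pos": int(pos),
        })
    return hits


def count_by_volume(hits):
    """Count corrupt-directory messages per (cell name, volume ID)."""
    return Counter((h["cell"], h["volume"]) for h in hits)


if __name__ == "__main__":
    for key, n in count_by_volume(corrupt_dirs(SAMPLE)).items():
        print(key, n)
```

Feed it the contents of the syslog file in place of `SAMPLE`. If every hit shares one volume ID, that would at least suggest the problem is tied to a single volume rather than to the client cache as a whole.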
Rich

On Sat, Aug 8, 2020 at 6:25 AM STRACHAN Craig <craig.strac...@ed.ac.uk> wrote:

> Hi, sent this before but from the wrong email address. Apologies if it
> pops up twice.
>
> I was wondering if anyone has seen something like this, or has suggestions
> about how I could debug the issue should it happen again.
>
> We are moving our desktop environment from SL7 to Ubuntu 20.04 LTS. After
> a couple of weeks of trouble free performance, on Monday two different
> users on different machines (KVM guests if that makes any difference)
> suffered problems with cache corruption in their home directories within a
> couple of hours of each other. The messages in syslog looked like:
>
> Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory
> (5.536870965.13859.4201870 [inf.ed.ac.uk] @ffffb303425613c8, pos 0)
> Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory
> (5.536870965.13995.4201950 [inf.ed.ac.uk] @ffffb303423b7ec8, pos 0)
> Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory
> (5.536870965.13997.4201995 [inf.ed.ac.uk] @ffffb303423b75c8, pos 0)
> Aug 3 15:38:52 gazebo kernel: afs: Corrupt directory
> (5.536870965.13737.4201771 [inf.ed.ac.uk] @ffffb303423b69c8, pos 0)
>
> One user also saw input/output errors when trying to access some files.
>
> There were a number of byte-range locking warnings in both syslogs but
> none which referred to anything in the corrupted directories. The effect of
> the corruption was the appearance of one or more entries of the form
>
> -????????? ? ? ? ? ? registrymodifications.xcu
>
> when doing an ls of the affected directory. Fs flush cleared up all but
> one of the issues. This required halting afsd and manually deleting the
> cache files to get things working again.
>
> Both users were very near the upper limits of their quotas when this
> happened but there was plenty of space in the file server partition and in
> both cache partitions. Both home volumes are on the same server and
> partition but there’s no evidence of anything going wrong in the server
> logs and none of our SL7 users have reported similar issues. The Ubuntu
> machines are running openafs 1.8.4~pre1-1ubuntu2-debian, the server is
> running SL7.6, kernel 3.10.0-1062.4.3.el7.x86_64 and
> openafs-server-1.8.4-1.el7.x86_64. Fs getcacheparms returns
>
> AFS using 51% of cache blocks (1068658 of 2097152 1k blocks)
> 95% of the cache files (62256 of 65536 files)
> afs_cacheFiles: 65536
> IFFree: 3280
> IFEverUsed: 9654
> IFDataMod: 1
> IFDirtyPages: 0
> IFAnyPages: 0
> IFDiscarded: 1
> DCentries: 9998
> 0k- 4K: 9087
> 4k- 16k: 460
> 16k- 64k: 70
> 64k- 256k: 21
> 256k- 1M: 6
> >=1M: 354
> [cache file usage over 90%, consider increasing '-files' argument to afsd]
>
> on one machine and
>
> AFS using 29% of cache blocks (1783025 of 6098259 1k blocks)
> 3% of the cache files (5900 of 190570 files)
> afs_cacheFiles: 190570
> IFFree: 184670
> IFEverUsed: 2270
> IFDataMod: 50
> IFDirtyPages: 0
> IFAnyPages: 0
> IFDiscarded: 0
> DCentries: 9998
> 0k- 4K: 5639
> 4k- 16k: 1638
> 16k- 64k: 606
> 64k- 256k: 308
> 256k- 1M: 262
> >=1M: 1545
>
> on the other.
>
> Does anyone have any idea what might be going on or any further steps I
> can take to investigate the problem if it happens again? All suggestions
> welcome!
>
> Thanks in advance,
> Craig.
> ---
> Craig Strachan, Computing Officer,
> School of Informatics, University of Edinburgh
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>

--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601
(574) 631-7258 (office)
(574) 807-1046 (cell)