A few things.

1 - The user claims they were merely storing the enormous .pst files, not accessing them from Outlook.
2 - The user claimed that any file bigger than about 4GB would cause the lockup. We haven't been able to replicate it, but he crammed a few 10GB files through this morning and locked up one of our gateways as a demonstration. He has not made my day any brighter.

Additional info: we were unable to reproduce this ourselves, but he mentioned that his test was conducted by copying from one AFS directory to another.

Additional additional: if I didn't mention it before, this is all going over Samba-on-OpenAFS. Yes, I know, users should be using the OpenAFS client rather than going through Samba on a gateway. We have found it extremely difficult to get users to adopt that method, however, so we have to try to make this work.

3 - I had enabled a 2GB cache bypass, and it seemed to have no effect whatsoever.

4 - I gathered what data I could. Looks like I can't use "crash" without a kernel recompile:

This GDB was configured as "x86_64-unknown-linux-gnu"...(no debugging symbols found)...
crash: /boot/vmlinuz-2.6.18-194.26.1.el5: no debugging data available

cmdebug said this:

[root@rgwb1 ~]# cmdebug localhost
Lock afs_discon_lock status: (none_waiting, 21876 read_locks(pid:29278))
[root@rgwb1 ~]# !ps
ps -ef | grep 29278
root     29278  4477  0 09:27 ?        00:00:00 smbd
root     30101 29337  0 09:37 pts/3    00:00:00 grep 29278

When I ran "top" I saw that the afs_cachetrim process was #1, but presumably wedged.

I goosed /proc/sysrq-trigger and, as promised, it dumped a lot of call trace info to the syslog. I'm looking through it, but am not sure what to look for. Nothing stands out, anyway.

Chris

On 3/7/14 3:51 PM, "Andrew Deason" <adea...@sinenomine.net> wrote:

>Message: 4
>To: openafs-info@openafs.org
>From: Andrew Deason <adea...@sinenomine.net>
>Date: Fri, 7 Mar 2014 15:51:23 -0600
>Organization: Sine Nomine Associates
>Subject: [OpenAFS] Re: OpenAFS client cache overrun?
>
>On Fri, 07 Mar 2014 13:51:06 -0500
>Eric Chris Garrison <ecgar...@iu.edu> wrote:
>
>>I'll have to look for that message from Andrew to gather data if the
>>problem crops up again.
>
>It's this message:
>
><http://thread.gmane.org/gmane.comp.file-systems.openafs.general/34517/focus=34532>
>
>The easiest / most basic information to get is just the stack trace from
>the daemon that is supposed to be trimming the cache back when it gets
>full. That message contains the commands where you can get that
>information via the 'crash' tool.
>
>Or, another way to get that information is by running this:
>
># echo t > /proc/sysrq-trigger
>
>That will generate a ton of information to the kernel log, which you'd
>need to sift through or give to someone else. But it's at least a lot
>easier to set up and run.
>
>>Thanks also for the mention of AFS cache bypass, I think that may be a
>>BIG help with this problem.
>
>'Cache bypass' I don't believe is considered the most stable of
>features. It could indeed maybe help here, but I'd be looking out for
>kernel panics.
>
>--
>Andrew Deason
>adea...@sinenomine.net
>
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
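
Since the sysrq "t" dump is a lot to sift through by hand, here is a rough sketch of one way to pull out only the stack trace of a single task (for example the afs_cachetrim process seen in "top"). It is only a sketch and rests on assumptions that should be checked against the real log: that each task's section starts with a line of the form "name STATE ..." (STATE being a single capital letter), and that syslog prefixes each line with something ending in "kernel: ". The function names, and the log path and task name in the usage comment, are made up for illustration.

```python
import re

# Assumed layout of a task header line in the sysrq-t dump: the command
# name, whitespace, then a one-letter state (R, S, D, ...). Adjust this
# pattern if the kernel in question prints something different.
TASK_HEADER = re.compile(r"^(\S+)\s+([A-Z])\s")


def strip_syslog_prefix(line):
    """Drop the 'date host kernel: ' prefix that syslog adds, if present."""
    marker = "kernel: "
    idx = line.find(marker)
    return line[idx + len(marker):] if idx != -1 else line


def task_traces(lines, task_name):
    """Return one block (a list of lines) per task header named task_name."""
    blocks = []
    current = None
    for raw in lines:
        text = strip_syslog_prefix(raw.rstrip("\n"))
        header = TASK_HEADER.match(text.lstrip())
        if header:
            # A new task section starts; collect it only if it is the one we want.
            if header.group(1) == task_name:
                current = [text]
                blocks.append(current)
            else:
                current = None
        elif current is not None:
            # Anything up to the next task header is kept with the current task.
            current.append(text)
    return blocks


def traces_from_file(path, task_name):
    """Convenience wrapper: read a log file and extract the matching blocks."""
    with open(path, errors="replace") as log:
        return task_traces(log, task_name)


# Example use (path and task name are placeholders):
#   for block in traces_from_file("/var/log/messages", "afs_cachetrim"):
#       print("\n".join(block))
```

Unrelated syslog lines that happen to fall between two task sections end up in the preceding block, so the output still needs a quick look by eye before it is sent on to the list.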