Paul Smith <[EMAIL PROTECTED]> writes:

> Hi all. I'm running Ubuntu 8.04 (kernel 2.6.24-17-generic, autofs 4.1.4
> +debian-2.1ubuntu2) on an Intel Pentium D 3GHz system with
> hyperthreading (SMP kernel) and 1G RAM.
>
> I'm using DHCP for networking and obtaining my automount maps via NIS.
>
> For the last week or so, almost every morning when I come into work my
> system is hung up in a strange way. I can move my mouse but I never get
> asked for my password to unlock my screen. I can C-A-F1 etc. to get
> back to a console but after I type my username at the login prompt, I
> never get asked for a password and then that console is locked up. If I
> have a console session already logged in from the day before, then I can
> use it for a while but eventually some command will lock hard; can't ^C,
> can't ^Z, can't kill -9, nothing.
>
> If I try to C-A-D to reboot the system starts to come down but then
> hangs, hard, trying to bring down automount. Reset just tries to reboot
> again and hangs in the same place. I have to power off/on the system
> completely. Bummer.
>
> I did some debugging on this problem. I logged in as root on every
> console (F1-F6). The next morning when the system was hung, I found a
> command that hung (just "ls") and then I ran it in another console under
> strace.
>
> It turns out what's happening is it's opening /proc/mounts, which
> succeeds, then trying to read(2) from it. The read system call never
> returns and there's no way to kill that process, at all, once it's in
> that state. Also I note the load on the system is very high: typically
> over 7. However top shows no processes chewing CPU. I also note that
> there are some "duplicate" automount processes running (that is, more
> than one for the same map). After I reboot, of course, everything is
> fine.
>
> Last night I started all the consoles and in one of them I wrote a
> little shell script that ran `date`, then did cat /proc/mounts, then
> slept for 15 seconds, then did it again. I sent the output to a file.
>
> I found that the hang happened last night at ~22:51 EDT. There was
> nothing interesting in the messages log, but in syslog I find a lot of
> messages right around that time trying to get to non-existent automount
> files (this is caused by some bogosity in the Tracker utility in Gnome,
> but it shouldn't cause the system to hang!):
>
> Jun 2 22:51:29 psmithub automount[29241]: >> mount.nfs: access denied by
> server while mounting snap-dev01:/user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29241]: mount(nfs): nfs: mount failure
> snap-dev01:/user/.Trash-10490 on /user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29241]: failed to mount /user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29342]: failed to mount /nfs/.Trash
> Jun 2 22:51:29 psmithub automount[29343]: failed to mount /nfs/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29344]: failed to mount /mnt/.Trash
> Jun 2 22:51:29 psmithub automount[29345]: failed to mount /mnt/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29346]: >> /sbin/showmount: can't get
> address for .Trash
> Jun 2 22:51:29 psmithub automount[29346]: lookup(program): lookup for .Trash
> failed
> Jun 2 22:51:29 psmithub automount[29346]: failed to mount /net/.Trash
> Jun 2 22:51:29 psmithub automount[29353]: >> /sbin/showmount: can't get
> address for .Trash-10490
> Jun 2 22:51:29 psmithub automount[29353]: lookup(program): lookup for
> .Trash-10490 failed
> Jun 2 22:51:29 psmithub automount[29353]: failed to mount /net/.Trash-10490
> Jun 2 22:51:34 psmithub automount[29212]: mount(nfs): nfs: mount failure
> snap-dev01:/tools on /opt/net/tools
> Jun 2 22:51:34 psmithub automount[29212]: failed to mount /opt/net/tools
>
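For anyone who wants to reproduce the logging loop described above, here is a minimal sketch in Python rather than shell. The function name `log_mounts`, its parameters, and the output file name in the usage comment are made up for illustration; the original script was not posted. The one deliberate choice is to flush the timestamp before reading, so the log still shows when the last attempt started if the read of /proc/mounts never returns.

```python
import time
from datetime import datetime


def log_mounts(out, path="/proc/mounts", interval=15.0, iterations=None):
    """Write a timestamp and the contents of `path` to `out`, repeatedly.

    `iterations=None` loops forever, like the shell loop described above.
    Returns the number of completed passes (only reachable when bounded).
    """
    count = 0
    while iterations is None or count < iterations:
        out.write(datetime.now().strftime("%a %b %d %H:%M:%S %Y") + "\n")
        # Push the timestamp out of Python's buffer before the read,
        # so it is not lost if the read below blocks forever.
        out.flush()
        with open(path) as mounts:
            out.write(mounts.read())
        out.flush()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return count


# Hypothetical usage (runs until interrupted or hung):
#   with open("mounts.log", "a") as log:
#       log_mounts(log)
```

If the loop hangs the same way `cat` did, the last line in the log is a bare timestamp with no mount table after it, which pins down the time of the hang to within one interval.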
This .Trash madness has to end. It keeps autofs from expiring mounts in
other situations. *grumble*

My advice for further debugging is to enable autofs debug logging (see
http://people.redhat.com/jmoyer), and when hung, get the output from
sysrq-t. So, when you come in in the morning, issue the sysrq-t and make
sure you can capture the output somehow (serial console or netconsole
would be best).

More below...

> That's the last message of interest in the syslog. Here's the end of
> the shell script loop log:
>
> Mon Jun 2 22:51:30 EDT 2008
> rootfs / rootfs rw 0 0
> none /sys sysfs rw,nosuid,nodev,noexec 0 0
> none /proc proc rw,nosuid,nodev,noexec 0 0
> udev /dev tmpfs rw,relatime 0 0
> fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
> /dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 / ext3
> rw,relatime,errors=remount-ro,data=ordered 0 0
> /dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 /dev/.static/dev ext3
> rw,relatime,errors=remount-ro,data=ordered 0 0
> tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /lib/modules/2.6.24-17-generic/volatile tmpfs rw,relatime 0 0
> tmpfs /dev/shm tmpfs rw,relatime 0 0
> devpts /dev/pts devpts rw,relatime 0 0
> tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
> /dev/sda5 /home ext3 rw,relatime,data=ordered 0 0
> securityfs /sys/kernel/security securityfs rw,relatime 0 0
> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
> automount(pid5466) /net autofs
> rw,relatime,fd=4,pgrp=5466,timeout=300,minproto=2,maxproto=4,indirect 0 0
> automount(pid5367) /mnt autofs
> rw,relatime,fd=4,pgrp=5367,timeout=60,minproto=2,maxproto=4,indirect 0 0
> automount(pid5404) /nfs autofs
> rw,relatime,fd=4,pgrp=5404,timeout=3600,minproto=2,maxproto=4,indirect 0 0
> automount(pid5532) /user autofs
> rw,relatime,fd=4,pgrp=5532,timeout=300,minproto=2,maxproto=4,indirect 0 0
> automount(pid5612) /export/autofs autofs
> rw,relatime,fd=4,pgrp=5612,timeout=60,minproto=2,maxproto=4,indirect 0 0
> automount(pid5684) /opt/net autofs
> rw,relatime,fd=4,pgrp=5684,timeout=36000,minproto=2,maxproto=4,indirect 0 0
> nfsd /proc/fs/nfsd nfsd rw,relatime 0 0

Well, I don't see the duplicate entry you mentioned above. It is
possible that there are multiple automount daemons for the same
mountpoint during a mount or expire event. That's just normal operation.

How about your ps listing and maybe a gdb backtrace of the daemon (if
your system will allow you to get that)?

Cheers,

Jeff

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs
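
The two checks discussed in this thread, looking for a duplicated autofs mountpoint in a saved /proc/mounts snapshot and requesting the sysrq-t task dump, can be sketched in Python as follows. The function names are invented for illustration. Writing `t` to /proc/sysrq-trigger is the standard kernel interface for the dump and needs root; the output goes to the kernel log, so it still has to be captured by serial console or netconsole as suggested above. Whether that write succeeds on a machine already hung in this way is not something the sketch can promise.

```python
from collections import defaultdict


def autofs_mounts(mounts_text):
    """Map each autofs mountpoint to the list of devices mounted on it."""
    by_mountpoint = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "autofs":
            by_mountpoint[fields[1]].append(fields[0])
    return dict(by_mountpoint)


def duplicate_autofs_mounts(mounts_text):
    """Return only the mountpoints that carry more than one autofs entry."""
    return {
        mountpoint: devices
        for mountpoint, devices in autofs_mounts(mounts_text).items()
        if len(devices) > 1
    }


def trigger_sysrq_t(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to dump the state of every task to the kernel log.

    Needs root. `trigger_path` is a parameter only so the function can be
    exercised against an ordinary file.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
```

Run against the snapshot quoted above, `duplicate_autofs_mounts` should come back empty, since each autofs mountpoint appears once. It reads a saved copy on purpose: reading the live /proc/mounts is exactly the call that hangs.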