More by chance than from a deep understanding of the issue, I found a way of restoring sanity when this happens. As superuser:

1. pkill -9 sendmail tee /bin/sh
2. on each server providing nfs service: nfsd -r

Step 1 just speeds everything up - Step 2 might resolve the issue on
its own, but could take quite some time if there is a backlog of
stalled processes.   I went from around 660 processes per affected
server to around 66.

I wish I were clearer about the relationship between nfsd, mount_nfs
and rpcbind, because of the implications of a server auto-rebooting
after, say, a power cut, when there is significant nfs service between
sites.

-- 
Steve Blinkhorn <st...@prd.co.uk>

You wrote:
>
> On Fri, 27 May 2022 at 17:18, Steve Blinkhorn <st...@prd.co.uk> wrote:
> >
> > 1. How to limit /etc/daily,weekly,monthly so they do not cross nfs mount
> > points?   One of my development systems crashes occasionally when left
> > running a long job after hours.   It reboots itself, but nfs
> > connections to it are not restored.   What I don't notice is that
> > /etc/daily now hangs on a public-facing machine.   Gradually the number
> > of processes increases day by day until I have numerous find, tee,
> > sendmail and sh processes all stuck.
> >
> > I can kill some of the /etc/daily related processes, but
> > not the instances of find.   In the past I have been able to resolve
> > the problem by remounting the remote filesystems using mount_nfs, or
> > restarting a crashed rpcbind, but not this time.   BTW, these
> > processes all have a PPID of 1.
>
> Well one option would be to disable all the finds by setting the
> various find_*=NO in /etc/{daily,weekly,monthly,security}.conf :-p
> Some options have a little more granularity such as find_core_ignore_paths
>
> It's a pity that the stat() from "find -x" would trigger the nfs mount hang...
>
> > 2. Attempts to do anything involving mountd, mount or df result in a
> > hung process that kill -9 will not remove.   I need to find a way of
> > restoring normality that is sure-fire, and based on an understanding
> > of nfs client-side behaviour.   I can, of course, reboot, but this is a
> > customer-facing server in a remote data centre, which otherwise is
> > functioning properly.
> >
> > This is 9.2 on amd64, but I don't believe for a moment that this is
> > version-related.
>
> Does switching between tcp and udp mounts make any difference?
> Would using mount_psshfs possibly be an option?
>
> David
>
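
For watching the backlog of stuck processes described above (the ones with a PPID of 1), a minimal sketch follows. It assumes a ps that accepts the POSIX -A and -o options; count_orphans is a hypothetical helper name, not something from the base system.

```shell
# Hypothetical helper: count processes whose parent is init (PPID 1),
# grouped by command name, largest group first.
# Reads "PPID COMMAND" pairs, one per line, on standard input.
count_orphans() {
    awk '$1 == 1 { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# Assumed usage (not run here):
#   ps -A -o ppid= -o comm= | count_orphans
```

Running this before and after the two steps should show whether the counts for find, tee, sendmail and sh are actually falling. It only reads the process table, so it should not itself touch the hung nfs mounts.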