Re: how to limit /etc/daily to local only, and clearing bad nfs mounts

Steve Blinkhorn Wed, 15 Jun 2022 04:21:56 -0700

More by chance than from a deep understanding of the issue, I found a
way of restoring sanity when this happens. As superuser:


1. pkill -9 sendmail tee /bin/sh
2. on each server providing nfs service: nfsd -r

Step 1 just speeds everything up.  Step 2 might resolve the issue on
its own, but could take quite some time if there is a backlog of
stalled processes.  I went from around 660 processes per affected
server to around 66.  I wish I were clearer about the relationship
between nfsd, mount_nfs and rpcbind, because of the implications of a
server auto-rebooting after, say, a power cut, when there is
significant nfs service between sites.
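
For anyone wanting to watch the backlog shrink while doing this, here is
a minimal sketch of how the stalled processes could be tallied by name.
The function name count_by_name is made up for illustration, and the
suggested ps invocation assumes that ps accepts "-o comm=" to print bare
command names; it is not what I actually ran.

```shell
#!/bin/sh
# Tally command names read from stdin, most frequent first.
# Input: one command name per line.
# count_by_name is a hypothetical helper, not a system utility.
count_by_name() {
    sort | uniq -c | sort -rn
}

# Suggested use (assumes ps supports "-o comm="):
#   ps -ax -o comm= | count_by_name | head
```

Running it before and after the two steps should show whether the
counts for find, tee, sendmail and sh have actually dropped.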

--
Steve Blinkhorn <[email protected]>

You wrote:
> 
> On Fri, 27 May 2022 at 17:18, Steve Blinkhorn <[email protected]> wrote:
> >
> > 1. How to limit /etc/daily,weekly,monthly so they do not cross nfs mount
> > points?  One of my development systems crashes occasionally when left
> > running a long job after hours.  It reboots itself, but nfs
> > connections to it are not restored.  What I don't notice is that
> > /etc/daily now hangs on a public-facing machine.  Gradually the number
> > of processes increases day by day until I have numerous find, tee,
> > sendmail and sh processes all stuck.
> >
> > I can kill some of the /etc/daily related processes, but
> > not the instances of find.  In the past I have been able to resolve
> > the problem by remounting the remote filesystems using mount_nfs, or
> > restarting a crashed rpcbind, but not this time.  BTW, these
> > processes all have a PPID of 1.
> 
> Well one option would be to disable all the finds by setting the
> various find_*=NO in /etc/{daily,weekly,monthly,security}.conf :-p
> Some options have a little more granularity such as find_core_ignore_paths
> 
> It's a pity that the stat() from "find -x" would trigger the nfs mount hang...
> 
> > 2. Attempts to do anything involving mountd, mount or df results in a
> > hung process that kill -9 will not remove.  I need to find a way of
> > restoring normality that is sure-fire, and based on an understanding
> > of nfs client-side behaviour.  I can, of course, reboot, but this is a
> > customer-facing server in a remote data centre, which otherwise is
> > functioning properly.
> >
> > This is 9.2 on amd64, but I don't believe for a moment that this is
> > version-related.
> 
> Does switching between tcp and udp mounts make any difference?
> Would using mount_psshfs possibly be an option?
> 
> David
>
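
As a sketch of the find_*=NO suggestion quoted above, a fragment like
the following in /etc/daily.conf would switch off the core-file scan.
This assumes the stock find_core knob from /etc/defaults/daily.conf;
the other find_* settings in the weekly, monthly and security files
would need checking against their own defaults before being changed.

```shell
# /etc/daily.conf (local overrides, read by /etc/daily)
# Assumption: find_core is the variable controlling the daily
# core-file scan, as in the stock /etc/defaults/daily.conf.
find_core=NO
```

This avoids the find run altogether rather than limiting it to local
filesystems, so it is a blunt workaround rather than a real answer to
question 1.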
