Hi,

On Wed, Apr 14, 2010 at 02:26:35AM +0300, luben karavelov wrote:
> Hi,
> 
> On 13.04.2010 12:48, Dejan Muhamedagic wrote:
> >
> >> 2. It seems that /var/lib/nfs/rmtab is not used in NFSv4, so I just
> >> deleted the backup/restore procedures. Maybe it could be a
> >> configuration option, or a runtime check for version 4 that disables
> >> these procedures.
> >>      
> > Are you sure it's not used? Isn't that implementation dependent?
> >
> >    
> I cannot be sure. My tests are on Debian/lenny with kernel 2.6.31.
> As I understand it, NFSv4 does not need rpc.mountd, because
> mounts are just TCP sessions. On my setup, even with mountd started,
> the rmtab is not populated when there are only v4 clients.
> 
> >> By itself the exportfs RA worked as expected, though there was some
> >> unexpected interaction between the NFSv4 server and the underlying FS.
> >> If some NFSv4 clients keep a file open in an exported directory, then
> >> even if I completely stop nfsd, there is some timeout before I can
> >> umount the underlying FS on the server (XFS here). Meanwhile I keep
> >> getting a "device busy" error.
> >>      
> > Open files are likely to happen, perhaps there is some nfs
> > parameter to reduce the timeout.
> >    
> >> What I have done: in the Filesystem RA I have increased the sleep
> >> interval in the "stop" op. I have also configured a 4 minute timeout
> >> for the stop op. Maybe this sleep value could be configurable: I could
> >> imagine other scenarios where tweaking it would be useful.
> >>      
> > Four minutes is quite long. Why would you want to increase the
> > sleep interval? To reduce the number of logged messages?
> >
> >    
> The Filesystem "stop" action tries to umount the filesystem. If it does not
> succeed, it kills all processes that use the filesystem (none in my case,
> because it is not processes but the kernel that keeps the filehandles
> open). Then it sleeps for some interval (1s in the distribution) and tries
> again. After the 6th iteration (TERM TERM TERM KILL KILL KILL) the RA
> gives up and fails the op.
> 
> I have increased the sleep interval in order to make the RA successfully
> stop the resource (umount the FS). The default of 6 seconds in total (6
> iterations with 1s sleep) would be quite short on any nfs4 setup.
> 
> On my test setup the filesystem is released after 60-80 sec. I added the
> default Filesystem RA stop timeout of 60 sec, and I doubled the interval to
> be on the safe side.
> 
> Another possible scenario where a tunable "sleep" could be useful: imagine
> you have some unmanaged service using the filesystem that needs some
> time for a proper shutdown (flushing files etc.). In the default setup it
> will be killed with SIGKILL (9) on the 3rd second. If the sleep is an RA
> parameter, you could give it some more time for a proper shutdown.
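
For reference, here is a minimal Python model of the retry schedule as
described above. It is not the agent's actual shell code; the function
name and its parameters are made up for illustration. It only shows when
each signal would be sent for a given sleep interval:

```python
def signal_schedule(sleep_s=1, term_rounds=3, kill_rounds=3):
    """Model the stop loop: after each failed umount, send one signal, then sleep.

    Returns (schedule, total_s), where schedule is a list of
    (seconds_elapsed, signal_name) pairs and total_s is the time spent
    before the agent gives up.
    """
    schedule = []
    elapsed = 0
    for sig in ["TERM"] * term_rounds + ["KILL"] * kill_rounds:
        schedule.append((elapsed, sig))  # signal sent right after a failed umount
        elapsed += sleep_s               # wait before the next umount attempt
    return schedule, elapsed


if __name__ == "__main__":
    schedule, total_s = signal_schedule()
    for when, sig in schedule:
        print(f"t={when}s: send {sig}")
    print(f"give up after {total_s}s")
```

With the defaults this gives TERM at 0, 1 and 2 seconds, KILL from the
3rd second on, and 6 seconds in total before the op fails.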

Looks like we need better timeout handling here. Three seconds
before killing the process sounds very short. We could make that
tunable, or derive it from the timeout set in the CIB. I think that we
already have some agents which work like that. Sending TERM
signals repeatedly also makes little sense.
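
A rough sketch of deriving the waits from the operation timeout, in
Python only for illustration (the agent itself is a shell script).
Assumptions: the cluster manager exports the op timeout in milliseconds
as OCF_RESKEY_CRM_meta_timeout; the 60 s fallback, the 5 s safety margin,
the 50/50 split and the name plan_stop are all hypothetical choices, not
existing agent behaviour:

```python
import os


def plan_stop(env=None, margin_s=5.0, term_share=0.5, poll_s=1.0):
    """Split the op timeout into a TERM phase and a KILL phase.

    Returns (term_polls, kill_polls): how many umount retries, poll_s
    seconds apart, to make after a single TERM and after a single KILL.
    """
    env = os.environ if env is None else env
    # Assumption: timeout is passed in milliseconds; 60000 is a made-up fallback.
    timeout_ms = int(env.get("OCF_RESKEY_CRM_meta_timeout", "60000"))
    # Keep a margin so the agent can report failure before the op times out.
    budget_s = max(timeout_ms / 1000.0 - margin_s, poll_s)
    term_polls = max(int(budget_s * term_share / poll_s), 1)
    kill_polls = max(int(budget_s / poll_s) - term_polls, 1)
    return term_polls, kill_polls


if __name__ == "__main__":
    # A 4 minute stop timeout, as configured in the quoted setup.
    print(plan_stop({"OCF_RESKEY_CRM_meta_timeout": "240000"}))
```

With a 4 minute timeout this yields 117 retries after one TERM and 118
after one KILL, so each signal is sent once and the rest of the time is
spent waiting for the umount to succeed.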

Can you open a bugzilla for this issue so that it doesn't get
lost?

Thanks,

Dejan

> 
> Best regards
> Luben
> 
> 
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/