Hi,

On Wed, Apr 14, 2010 at 02:26:35AM +0300, luben karavelov wrote:
> Hi,
>
> On 13.04.2010 12:48, Dejan Muhamedagic wrote:
> >
> >> 2. It seems that /var/lib/nfs/rmtab is not used in NFSv4, so I just
> >> deleted the backup/restore procedures. Maybe it could be a
> >> configuration option or a runtime check for version 4 that
> >> disables these procedures.
> >>
> > Are you sure it's not used? Isn't that implementation dependent?
> >
>
> I cannot be sure. My tests are on Debian/lenny with kernel 2.6.31.
> As I understand it, nfs4 does not need rpc.mountd, because
> mounts are just TCP sessions. On my setup, even with mountd started,
> the rmtab is not populated when there are only v4 clients.
>
> >> By itself the exportfs RA worked as expected, though there was some
> >> unexpected interaction between the NFSv4 server and the underlying
> >> FS. If some NFSv4 clients keep a file open in an exported
> >> directory, even if I completely stop the nfsd, there is some
> >> timeout before I can umount the underlying FS on the server (XFS
> >> here). In the meantime I keep getting a "device busy" error.
> >>
> > Open files are likely to happen, perhaps there is some nfs
> > parameter to reduce the timeout.
> >
> >> What I have done: in the Filesystem RA I have increased the sleep
> >> interval in the "stop" op. I have also configured a 4 minute
> >> timeout for the stop op. Maybe this sleep value could be
> >> configurable: I could imagine other scenarios where tweaking it
> >> could be useful.
> >>
> > Four minutes is quite long. Why would you want to increase the
> > sleep interval? To reduce the number of logged messages?
> >
>
> The Filesystem "stop" action tries to umount the filesystem. If it does
> not succeed, it kills all processes that use the filesystem (nothing in
> my case, because it is not processes but the kernel that keeps open
> filehandles). Then it sleeps for some interval (1s in the distribution)
> and tries again. After the 6th iteration (TERM TERM TERM KILL KILL KILL)
> the RA gives up and fails the op.
>
> I have increased the sleep interval in order to make the RA successfully
> stop the resource (umount the FS). The default of 6 seconds in total (6
> iterations with 1s sleep) would be quite short on every nfs4 setup.
>
> On my test setup the filesystem is released after 60-80 sec. I added the
> default Filesystem RA stop timeout of 60 sec. And I doubled the interval
> to be on the safe side.
>
> Another possible scenario where a tunable "sleep" could be useful:
> imagine you have some unmanaged service using the filesystem that needs
> some time for a proper shutdown (flushing files etc.). In the default
> setup it will be killed with SIGKILL (9) on the 3rd second. If the sleep
> is an RA parameter, you could give it some more time for a proper
> shutdown.
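
For reference, the stop sequence described above can be sketched roughly as follows. This is only an illustration of the loop as described in this thread, not the actual Filesystem RA code: STOP_SLEEP, SIGNALS, stop_fs, try_umount and signal_users are hypothetical names, and the fuser invocation assumes the psmisc version of fuser.

```shell
#!/bin/sh
# Rough sketch of the Filesystem "stop" retry loop as described in this
# thread. All names here are hypothetical, not taken from the real agent.

# Seconds to sleep between attempts (1s in the distributed RA, per the thread).
STOP_SLEEP="${STOP_SLEEP:-1}"
# One signal per iteration; six iterations in total.
SIGNALS="${SIGNALS:-TERM TERM TERM KILL KILL KILL}"

# Placeholder: try to unmount; returns 0 on success.
try_umount() {
    umount "$1" 2>/dev/null
}

# Placeholder: signal every process using the mount point
# (assumes psmisc fuser, which accepts -k -SIGNAL -m).
signal_users() {
    fuser -k "-$1" -m "$2" >/dev/null 2>&1
}

stop_fs() {
    mountpoint="$1"
    for sig in $SIGNALS; do
        # Success as soon as the filesystem can be unmounted.
        if try_umount "$mountpoint"; then
            return 0
        fi
        # Otherwise signal the users, wait, and try again.
        signal_users "$sig" "$mountpoint"
        sleep "$STOP_SLEEP"
    done
    # All iterations used up: give up and fail the op.
    return 1
}
```

With the defaults the loop gives up after about 6 seconds. Something like STOP_SLEEP=20 would stretch that to about two minutes, which would cover the 60-80 sec release time mentioned above, provided the stop op timeout is raised to match.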

Looks like we need better timeout handling here. Three seconds
before killing the process sounds very short. We could make that
tunable. Or depend on the timeout set in the CIB. I think that we
already have some agents which work like that. Sending TERM
signals repeatedly also makes little sense.

Can you open a bugzilla for this issue so that it doesn't get
lost?

Thanks,

Dejan

>
> Best regards
> Luben
>
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
