On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>> practice, I assume that we'll always do *both*?
>>
>
> Failover normally is done via virtual IP address - so per-ip base method
> should be the core routine. However, for non-cluster filesystem such as
> ext3/4, changing server also implies umount. If there are clients not
> following rule and obtaining locks via different ip interfaces, umount
> would fail that ends up aborting the failover process. That's the place
> we need the per-filesystem method.
>
> ServerA:
> 1. Tear down the IP address
> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required,
>    write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
>
> Sometime ago we were looking at "export name" as the core method (so
> per-filesystem method is a subset of that). Unfortunately, the prototype
> efforts showed the code would be too intrusive (if filesystem sub-tree
> is exported).
>> We're migrating clients by moving a server ip address from one node to
>> another.  And I assume we're permitting at most one node to export each
>> filesystem at a time.  So it *should* be the case that the set of locks
>> held on the filesystem(s) that are moving are the same as the set of
>> locks held by the virtual ip that is moving.
>>
>
> This is true for non-cluster filesystem. But a cluster filesystem can be
> exported from multiple servers.
>> But presumably in some scenarios clients can get confused, and we need
>> to ensure that stale locks are not left behind?
>>
>
> Yes.
>
>> We've discussed this before, but we should get the answer into comments
>> in the code (or on the patches).
>>
>>
> ok, working on it. or should we add something into linux/Documentation
> to describe the overall logic ?

Yeah, sounds good.  Maybe under Documentation/filesystems?

And it might also be helpful to leave a reference to it in the code,
e.g., in nfsctl.c:

	/*
	 * The following are used for failover; see
	 * Documentation/filesystems/nfsd-failover.txt for details.
	 */

--b.
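
For reference, steps 3 and 4 of the ServerA sequence quoted above could be
sketched roughly as the shell function below. This is only an illustration
under stated assumptions, not what the patches ship: the function name
release_locks, its arguments, and the NFSD_CTL override are hypothetical
names introduced here, and steps 1, 2 and 5 are cluster-specific, so they
appear only as comments.

```shell
#!/bin/sh
# Sketch of steps 3-4 of the ServerA sequence.
# NFSD_CTL and release_locks are hypothetical names for this illustration;
# NFSD_CTL can be pointed at a scratch directory for dry runs.
NFSD_CTL="${NFSD_CTL:-/proc/fs/nfsd}"

# Usage: release_locks VIRTUAL_IP [EXPORT_PATH]
release_locks() {
    vip="$1"
    path="${2:-}"

    # Steps 1-2 (tear down the IP address, unexport the path) are assumed
    # to have been done already by the cluster manager.

    # Step 3: drop the locks that were acquired through the floating IP.
    printf '%s\n' "$vip" > "$NFSD_CTL/unlock_ip" || return 1

    # Step 4: only needed when the filesystem must be unmounted
    # (non-cluster filesystems such as ext3).
    if [ -n "$path" ]; then
        printf '%s\n' "$path" > "$NFSD_CTL/unlock_filesystem" || return 1
        # The umount itself is left to the caller in this sketch.
    fi

    # Step 5 (signal the peer to begin take-over) is cluster-specific
    # and omitted here.
    return 0
}
```

Passing only the IP covers the cluster-filesystem case, where no unmount is
needed; passing the export path as well covers the case where clients may
have taken locks through a different interface and umount would otherwise
fail.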