On Fri, 29 Jun 2012 16:16:15 -0500 Andrew Deason <[email protected]> wrote:
>> A. Duplicate the entire subtree starting at the multiply-mounted
>> volume.

From a user perspective, I think this option has no downsides; it has none of the theoretical user-visible problems or limitations that the others have. However, I think this is also the most complex option to implement, and it consumes more memory. Possibly not a lot of memory; the actual AFS structures could probably be shared, and we'd just need a layer on top of AFS vcaches that maps Linux inodes to AFS vcaches, etc.

However, as jhutz notes, this means that we now have multiple Linux inodes per file. So, for example, everywhere in the code that uses AFSTOV would need to be changed to loop through a list of values. At the very least that is a lot of work; for some call sites, it may require nontrivial restructuring of code to make it work (and this is platform-independent code, so we may break other platforms trying to fix this).

So, due to the amount of work, at the very least I don't think this is a short-term solution. But maybe it is the best thing to do...

>> B. Reparent the multiply-mounted volume each time it is accessed
>> via a new path.

As I've mentioned, this has problems. The details are a bit much to go into right here, but briefly... we're not allowed to reparent something while it's in use. So if we need to do that to perform an rmdir() or a rename(), we get kinda stuck and have to return a bogus answer to Linux. In some places, sanity checks will make us panic, and in others, assumptions from the Linux code can cause us to deadlock.

>> C. Pretend like multiple mounts aren't allowed, [...] Users would
>> not like this.

Yeah.

>> D. Treat every volume as a separate filesystem, like kafs does.

One of the disadvantages of this is that users can no longer 'mv' mountpoints around; you need to have AFS knowledge to manipulate them.

Another issue is really more of an obstacle in the implementation, but the interfaces to perform mounts and create new filesystems from within the kernel are GPLONLY, so we can't use them. It is possible to make afsd perform the mount from userspace (like the afsdb handler, and OS X's userspace move helper), and in fact I have done a little work into doing that, to see how well it can function. I believe this can work, but it does require quite a bit more effort, and it seems pretty error prone.

I think there are also two sub-options here, which is whether we bind-mount /afs/foo/bar to /afs/.:mount, or if we mount /afs/foo/bar as AFS with some special option to mount a certain volume. The former is what I was working on, just for ease of implementation; I'm not sure if there's much of a practical difference between the two.

I feel like there are other disadvantages here; I don't think I covered jhutz's reservations. There may be some performance concerns here, too, once we start to access a lot of volumes at once, but I'm not sure how much of a problem that is.

>> E. Present additional mount points to the same volume as symbolic
>> links. [...]
>> F. Present _all_ mount points as symbolic links, pointing at paths
>> in /afs/.:mount.

I think presenting these as actual symlinks is a no-go, since it's quite a big user-visible change. With F, this also makes '..' no longer work "correctly" ever. With E, it makes '..' not work correctly when there are multiple mount points; I think that's more acceptable, since I don't think they've ever worked correctly 100% of the time on Linux.

However, it is possible on Linux to have a directory, but give it a follow_link function, so it behaves kinda like a symlink in that it gets dereferenced before being accessed, and lets you point at an arbitrary dentry like a regular symlink. kafs does this, and just mounts the volume on the dir as the dereferencing operation.

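To make the "directory that gets dereferenced before being accessed" idea a bit more concrete, here is a minimal userspace sketch in C. It does not use any real Linux VFS or OpenAFS interfaces; struct mtpt, struct volume_table, and mtpt_resolve() are hypothetical names made up for illustration. The policy it assumes (redirect to whichever mount point for the volume was resolved first) is only one possible choice of target.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VOLUMES 16  /* arbitrary limit for this sketch */

/* Hypothetical model of a mount point directory. It always looks like a
 * directory to callers; 'target' is the extra dereference step. */
struct mtpt {
    const char *path;     /* where the mount point lives in the tree */
    unsigned volume_id;   /* volume this mount point refers to */
    struct mtpt *target;  /* NULL: use this dir itself; else redirect */
};

/* Hypothetical table remembering the first mount point seen per volume. */
struct volume_table {
    unsigned ids[MAX_VOLUMES];
    struct mtpt *first[MAX_VOLUMES];
    size_t count;
};

/* Dereference a mount point before it is accessed. The first mount point
 * resolved for a volume stays a plain directory; any later mount point for
 * the same volume is pointed at that first one. */
static struct mtpt *mtpt_resolve(struct volume_table *t, struct mtpt *m)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == m->volume_id) {
            if (t->first[i] != m)
                m->target = t->first[i];  /* additional mount point */
            return t->first[i];
        }
    }

    /* First time this volume is seen: record this mount point. */
    assert(t->count < MAX_VOLUMES);
    t->ids[t->count] = m->volume_id;
    t->first[t->count] = m;
    t->count++;
    m->target = NULL;
    return m;
}
```

With something like this, '..' from inside the volume would only ever need to know about the one recorded mount point, which is why lookups through any other mount point end up "at" the first one.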
I think it may work pretty well to have the first mtpt access appear as a normal dir. Then any other mountpoints to the same volume appear as dirs, but they dereference to the first mountpoint. So, option E, but they appear to be dirs to user applications. '..' works for the cases where you only access through a single mountpoint, and for other accesses, it 'breaks' by pointing you to that original mountpoint.

I've been looking a bit today at implementing this approach; it seems doable, with the only non-Linux changes being a couple of small interface changes to afs_lookup.

I welcome any comments or thoughts on this general subject or any particulars up there, or attempts at any implementations. Especially from people with more Linux VFS internals experience than me :)

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
