Hi all,

In case you are not yet aware, the way we currently implement mountpoints on Linux is not great. The current approaches have been known to be iffy, but I think we've now reached the point where they produce unavoidable, easy-to-reproduce panics and/or deadlocks, so we need to change something. Gerrit 7600 and 7601 may give a good impression of what's going on here; that was my first attempt at a workaround fix, but it doesn't really work.
I've been discussing this with a few people, and I think Jeff Hutzelman provided a good overview along with some options. With his permission, it is reproduced below. I have some comments to go along with it, covering the various user-visible differences, pros and cons, etc., but I want to post this by itself first so it's easier to read.

> Unfortunately, what it boils down to is that the Linux kernel
> architecture assumes that a filesystem is a tree (that is, a
> connected, acyclic graph), and is incapable of correctly handling a
> filesystem like AFS which is in fact a directed graph with some
> restrictions(*). This is their failing, not ours, and it is not
> limited to AFS.
>
> The best we can do is come up with an internally-consistent mapping of
> the AFS filesystem onto a (possibly mutating) tree, and use that tree
> as Linux's view of the filesystem. Mostly, this presents two
> problems:
>
> 1) what do we do with cycles?
> 2) what do we do with nodes with multiple incoming edges?
>
> Of course, these are essentially the same question, and I can think of
> several possible answers:
>
> A. Duplicate the entire subtree starting at the multiply-mounted
>    volume. This means that changing one copy would have to result
>    in changing the other copies as well, and of course a change
>    from the server would have to be reflected in every copy. That
>    means that the vnode->inode mapping would go from a simple
>    pointer to a list, and vnodes would require true reference
>    counting, independent of the refcount on any associated inode.
>    We might have to go to some effort to avoid cycles, just to
>    maintain sanity. And, the fixed mapping between vnodes and
>    inode numbers probably goes right out the window.
> B. Reparent the multiply-mounted volume each time it is accessed
>    via a new path. This is what we've done since the
>    multiple-alias problem first arose. It actually works fairly
>    well for users, but at times has been a pain to make work.
>    Cycles don't work, of course, since reparenting a volume below a
>    cycle would orphan the whole subtree, and the kernel VFS layer
>    won't let you do that (mostly).
> C. Pretend like multiple mounts aren't allowed, and simply refuse
>    to follow additional mount points into a volume that already has
>    an associated dentry. Users would not like this.
> D. Treat every volume as a separate filesystem, like kafs does.
>    While this has some advantages, it also has some serious
>    disadvantages. I also have a vague recollection of coming up
>    with a reason at one point why this model is fatally flawed.
> E. Present additional mount points to the same volume as symbolic
>    links. If I recall correctly, it is even possible to present
>    them as symlinks where the results of readlink(2) are not
>    actually consistent with what happens if you traverse the link,
>    so we need not be able to construct a path to the original mount
>    point (though of course we can, if it is still in the dentry
>    tree).
> F. Present _all_ mount points as symbolic links, pointing at paths
>    in /afs/.:mount.
>
>
> (*) Most notably, only volume roots can actually have more than one
>     incoming edge.

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
