Hi all,

So, in case you are not yet aware: the way we implement mountpoints on
Linux right now is not great. The approach has long been known to be
iffy, but I think we have now reached the point where it gives us
easy-to-reproduce, unavoidable panics and/or deadlocks, so we need to
change something. Gerrit changes 7600 and 7601 may give a good
impression of what's going on here; those were my first attempt at a
workaround, but they don't really work.

I've been discussing this with a few people, and I think Jeff Hutzelman
provided a good overview along with some options. With his permission,
this is reproduced below. I have some comments to go along with it,
covering the various user-visible differences, pros and cons, and so
on, but I want to post it here mostly by itself first, so it's easier
to read. (The only thing I've added after the quote is a rough sketch
to make option A's data-structure implications a bit more concrete.)

> Unfortunately, what it boils down to is that the Linux kernel
> architecture assumes that a filesystem is a tree (that is, a
> connected, acyclic graph), and is incapable of correctly handling a
> filesystem like AFS, which is in fact a directed graph with some
> restrictions(*).  This is their failing, not ours, and it is not
> limited to AFS.
> 
> The best we can do is come up with an internally-consistent mapping of
> the AFS filesystem onto a (possibly mutating) tree, and use that tree
> as Linux's view of the filesystem.  Mostly, this presents two
> problems:
> 
> 1) what do we do with cycles?
> 2) what do we do with nodes with multiple incoming edges?
> 
> Of course, these are essentially the same question, and I can think of
> several possible answers:
> 
>      A. Duplicate the entire subtree starting at the multiply-mounted
>         volume.  This means that changing one copy would have to result
>         in changing the other copies as well, and of course a change
>         from the server would have to be reflected in every copy.  That
>         means that the vnode->inode mapping would go from a simple
>         pointer to a list, and vnodes would require true reference
>         counting, independent of the refcount on any associated inode.
>         We might have to go to some effort to avoid cycles, just to
>         maintain sanity.  And, the fixed mapping between vnodes and
>         inode numbers probably goes right out the window.
>      B. Reparent the multiply-mounted volume each time it is accessed
>         via a new path.  This is what we've done since the
>         multiple-alias problem first arose.  It actually works fairly
>         well for users, but at times has been a pain to make work.
>         Cycles don't work, of course, since reparenting a volume below a
>         cycle would orphan the whole subtree, and the kernel VFS layer
>         won't let you do that (mostly).
>      C. Pretend like multiple mounts aren't allowed, and simply refuse
>         to follow additional mount points into a volume that already has
>         an associated dentry.  Users would not like this.
>      D. Treat every volume as a separate filesystem, like kafs does.
>         While this has some advantages, it also has some serious
>         disadvantages.  I also have a vague recollection of coming up
>         with a reason at one point why this model is fatally flawed.
>      E. Present additional mount points to the same volume as symbolic
>         links.  If I recall correctly, it is even possible to present
>         them as symlinks where the results of readlink(2) are not
>         actually consistent with what happens if you traverse the link,
>         so we need not be able to construct a path to the original mount
>         point (though of course we can, if it is still in the dentry
>         tree).
>      F. Present _all_ mount points as symbolic links, pointing at paths
>         in /afs/.:mount.
> 
> 
> (*) Most notably, only volume roots can actually have more than one
> incoming edge.

-- 
Andrew Deason
[email protected]
