Re: FS_SINGLE queries
Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
> > Will it really make much difference? What would be harder to do
> > without mount IDs? And how much harder?
>
> Beware of functions with many arguments... Besides, what about "kill
> the component of union-mount on /barf NFS-mounted from
> venus:/foo/bar"? What exactly are you going to pass here? Such
> stuff is better left to userland.

Let's see. Pass the same stuff you see in /proc/namespace? Instead of
cut-and-paste of the mount ID, cut-and-paste the other entries on that
line. You're making the decision based on what's in /proc/namespace
anyway. Why add another level of indirection?

> > > And then... consider the situation when root logs in and decides
> > > to mess with luser's namespace.
> >
> > What about it?
>
> bastard@venus% su -
> Password:
> root@venus% w luser
> luser    pts/0    ...
> root@venus% ps t pts/0
>   728 pts/0    ...
> root@venus% cat /proc/728/ns
> 123749 /home/luser/foo / nfs
> root@venus% umount -I 123749
> root@venus% logout
> bastard@venus% mail luser
> Subject: you've been told to umount ~/foo
>
> ^D
>
> > Avoiding numbers is a good thing. They have no intrinsic meaning.
>
> Tell that to guys who invented file descriptors. IMO that works
> quite fine - I'll prefer to do close(17) rather than incantations of
> horror needed on OS/360 <type that stuff late on Saturday>

But FDs are different. You reference the file by name, and the system
returns an opaque handle. But the system isn't returning a mount ID as
the result of some operation: you had to scan a file for it. So really
your handle to the mount is something else; the mount ID is merely
another level of indirection.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: FS_SINGLE queries
On Sat, 10 Jun 2000, Richard Gooch wrote:

> Will it really make much difference? What would be harder to do
> without mount IDs? And how much harder?

Beware of functions with many arguments... Besides, what about "kill
the component of union-mount on /barf NFS-mounted from venus:/foo/bar"?
What exactly are you going to pass here? Such stuff is better left to
userland.

> > And then... consider the situation when root logs in and decides to
> > mess with luser's namespace.
>
> What about it?

bastard@venus% su -
Password:
root@venus% w luser
luser    pts/0    ...
root@venus% ps t pts/0
  728 pts/0    ...
root@venus% cat /proc/728/ns
123749 /home/luser/foo / nfs
root@venus% umount -I 123749
root@venus% logout
bastard@venus% mail luser
Subject: you've been told to umount ~/foo

^D

> Avoiding numbers is a good thing. They have no intrinsic meaning.

Tell that to guys who invented file descriptors. IMO that works
quite fine - I'll prefer to do close(17) rather than
Re: FS_SINGLE queries
Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
> > Yeah, sure. I did say "for example". Your format looks fine. One
> > question: is the mount ID really needed? Can't you distinguish based
> > on what FS you're mounting (and mountpoint root)?
>
> First of all, interface is simpler that way.

Will it really make much difference? What would be harder to do
without mount IDs? And how much harder?

> And then... consider the situation when root logs in and decides to
> mess with luser's namespace.

What about it? Avoiding numbers is a good thing. They have no
intrinsic meaning.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: FS_SINGLE queries
On Sat, 10 Jun 2000, Richard Gooch wrote:

> Yeah, sure. I did say "for example". Your format looks fine. One
> question: is the mount ID really needed? Can't you distinguish based
> on what FS you're mounting (and mountpoint root)?

First of all, interface is simpler that way.

And then... consider the situation when root logs in and decides to
mess with luser's namespace.
Re: FS_SINGLE queries
Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
> > What I mean by "real" mounts is a table that shows how each FS was
> > brought into the namespace (or each namespace, once you implement
> > CLONE_NEWNS). So for example:
> >
> > #device     filesystem  roots
> > /dev/hda1   ext2        /
> > /dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
> > none        proc        /proc /gaol/proc
>
> Bad format. If anything, it should contain mount IDs (if you want to
> have union-mount you need those, just to be able to take away
> components). The following might go:
>
> 1   /                         /                    ext2    /dev/hda1
> 2   /var/spool/mail           /                    ext2    /dev/hda2
> 3   /proc                     /                    procfs
> 14  /gaol/var/spool/mail      /                    ext2    /dev/hda2
> 15  /gaol/proc                /                    procfs
> 42  /gaol/lib/libc.2.1.3.so   /lib/libc.2.1.3.so   ext2    /dev/hda1
> ...
>
> IOW, ID + mountpoint + location of root in its tree + fs type +
> fs-specific parameters. That at least allows to reproduce the
> namespace. And yes, IMO "device" is fs-specific parameter.

Yeah, sure. I did say "for example". Your format looks fine. One
question: is the mount ID really needed? Can't you distinguish based
on what FS you're mounting (and mountpoint root)?

BTW: I agree that device is fs-specific. It's much nicer to see "none"
done away with.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: FS_SINGLE queries
On Sat, 10 Jun 2000, Richard Gooch wrote:

> What I mean by "real" mounts is a table that shows how each FS was
> brought into the namespace (or each namespace, once you implement
> CLONE_NEWNS). So for example:
>
> #device     filesystem  roots
> /dev/hda1   ext2        /
> /dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
> none        proc        /proc /gaol/proc

Bad format. If anything, it should contain mount IDs (if you want to
have union-mount you need those, just to be able to take away
components). The following might go:

1   /                         /                    ext2    /dev/hda1
2   /var/spool/mail           /                    ext2    /dev/hda2
3   /proc                     /                    procfs
14  /gaol/var/spool/mail      /                    ext2    /dev/hda2
15  /gaol/proc                /                    procfs
42  /gaol/lib/libc.2.1.3.so   /lib/libc.2.1.3.so   ext2    /dev/hda1
...

IOW, ID + mountpoint + location of root in its tree + fs type +
fs-specific parameters. That at least allows to reproduce the
namespace. And yes, IMO "device" is fs-specific parameter.
Re: FS_SINGLE queries
Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
> > I see your point. However, that suggests that the naming of
> > /proc/mounts is wrong. Perhaps we should have a /proc/namespace that
> > shows all these VFS bindings, and separately a list of real mounts.
>
> What's "real"? /proc/mounts would be better left as it was (funny
> replacement for /etc/mtab) and there should be something along the
> lines of /proc/namespace (hell knows, we might make it compatible
> with /proc/ns from new Plan 9). That something most definitely
> doesn't need to share the format with /proc/mounts...

What I mean by "real" mounts is a table that shows how each FS was
brought into the namespace (or each namespace, once you implement
CLONE_NEWNS). So for example:

#device     filesystem  roots
/dev/hda1   ext2        /
/dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
none        proc        /proc /gaol/proc

in /proc/namespace. And I suppose that /proc/namespace would be unique
for each namespace as well. This way, no distinction is made between
the first mount and subsequent bindings, which is what you'd like, as
I gather you'd like to make all bindings equal.

Aside: I guess the reality is that the first binding (the original
mount -t ext2) is more equal than the subsequent bindings (mount -t
bind). Evidence: the O_CREAT bug I found the other day ;-)

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: FS_SINGLE queries
On Sat, 10 Jun 2000, Richard Gooch wrote:

> I see your point. However, that suggests that the naming of
> /proc/mounts is wrong. Perhaps we should have a /proc/namespace that
> shows all these VFS bindings, and separately a list of real mounts.

What's "real"? /proc/mounts would be better left as it was (funny
replacement for /etc/mtab) and there should be something along the
lines of /proc/namespace (hell knows, we might make it compatible
with /proc/ns from new Plan 9). That something most definitely
doesn't need to share the format with /proc/mounts...
Re: FS_SINGLE queries
In article <[EMAIL PROTECTED]> you wrote:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
>> Hi, all. I've just been looking at the FS_SINGLE implementation, and
>> have a few comments:
>>
>> - although not documented, you need to do kern_mount() before trying
>
> Yup.
>
>> normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
>> should be called automatically in
>> register_filesystem()/unregister_filesystem()?
>
> I don't think so. They are different operations and I'm not too happy
> about mixing them together. Matter of taste, but...
>
>> - I note that procfs and pipefs call unregister_filesystem() before
>> calling kern_umount(). This looks counter-intuitive, even if it's
>> correct (is it?)
>
> It is. Look: first you take it out of reach so that nobody would mount
> us while we are doing kern_umount(), then you kill the tree.

This is more of an argument to combine the two operations, imho. Is
there any point in having a FS_SINGLE filesystem registered, but not
kern_mounted?

Ion

-- 
It is better to keep your mouth shut and be thought a fool, than to
open it and remove all doubt.
Re: FS_SINGLE queries
Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
>
> > Hi, all. I've just been looking at the FS_SINGLE implementation, and
> > have a few comments:
> >
> > - although not documented, you need to do kern_mount() before trying
>
> Yup.
>
> > normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
> > should be called automatically in
> > register_filesystem()/unregister_filesystem()?
>
> I don't think so. They are different operations and I'm not too happy
> about mixing them together. Matter of taste, but...

Yeah, I know. Having it documented would satisfy me. Getting a kernel
BUG after adding FS_SINGLE was a shock: "what the %@$& ?!?".

> > - I note that procfs and pipefs call unregister_filesystem() before
> > calling kern_umount(). This looks counter-intuitive, even if it's
> > correct (is it?)
>
> It is. Look: first you take it out of reach so that nobody would
> mount us while we are doing kern_umount(), then you kill the tree.

I suspected it was about race prevention. Again, if it was documented,
it would be fine. When I first saw the procfs/pipefs code, I was left
wondering if it was safe to unregister before unmounting.

> > - when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
> > type rather than "bind", which also seems wrong.
>
> Why? Bind is _not_ a filesystem type. From the kernel point of view
> old and new instances after binding are identical - there is no
> asymmetry.

I see your point. However, that suggests that the naming of
/proc/mounts is wrong. Perhaps we should have a /proc/namespace that
shows all these VFS bindings, and separately a list of real mounts. I
have the feeling we're mixing two different pieces of information into
/proc/mounts at the moment.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: FS_SINGLE queries
On Sat, 10 Jun 2000, Richard Gooch wrote:

> Hi, all. I've just been looking at the FS_SINGLE implementation, and
> have a few comments:
>
> - although not documented, you need to do kern_mount() before trying

Yup.

> normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
> should be called automatically in
> register_filesystem()/unregister_filesystem()?

I don't think so. They are different operations and I'm not too happy
about mixing them together. Matter of taste, but...

> - I note that procfs and pipefs call unregister_filesystem() before
> calling kern_umount(). This looks counter-intuitive, even if it's
> correct (is it?)

It is. Look: first you take it out of reach so that nobody would mount
us while we are doing kern_umount(), then you kill the tree.

> - when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
> type rather than "bind", which also seems wrong.

Why? Bind is _not_ a filesystem type. From the kernel point of view
old and new instances after binding are identical - there is no
asymmetry.
FS_SINGLE queries
Hi, all. I've just been looking at the FS_SINGLE implementation, and
have a few comments:

- although not documented, you need to do kern_mount() before trying
  normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
  should be called automatically in
  register_filesystem()/unregister_filesystem()?

- I note that procfs and pipefs call unregister_filesystem() before
  calling kern_umount(). This looks counter-intuitive, even if it's
  correct (is it?)

- when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
  type rather than "bind", which also seems wrong.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
Re: [RFC] union-mount stuff
Neil Brown wrote:
> I tried to come up with a model which is a generalisation of the "old"
> behaviour, and provides agreeable semantics for new behaviours.
> This is what I came up with:
>
> A "mount" is an ordered list (pile) of directories.
> One of these elements is the "mountpoint", and it is particularly
> distinguished because ".." from the "mount" goes through ".." of the
> "mountpoint". ".." of all other directories is not accessible.
>
> Each directory in the pile has two flags (well, three if you count
> IS_MOUNTPOINT).
>
> IS_WRITABLE: You can create things in here.
> IS_VISIBLE: You can see inside this.
>
> Thus, a traditional mount has two directories in the pile.
> The bottom one IS_MOUNTPOINT
> The top one IS_WRITABLE|IS_VISIBLE

Oh, aren't we in danger of reducing McVoy's semantic gap between theory
and practice here? This model is simple enough to do right the first
time, rather than propagating broken semantics which we fix later :-)

I *like* it. A simple (3 bits) generalisation, which leaves trad-mount,
union-mount (whichever flavour you'd like it to be) and the null-fs as
corner cases. This would provide a nice toolset for all sorts of
interesting filesys widgets, some of which (autofs, devfs) can be
simplified when they take advantage of this infrastructure.

What can we do with this? The interesting case is of course creation,
and the uppermost-IS_WRITEABLE case can handle all possibilities, with
a few tricks:

- for laying *additions* or *updates* over a readonly base filesystem,
  it just works, with no additional policy modules required.

- for laying additions/updates/*deletions* over a base filesystem, we
  also need a -ve entry for objects which have been removed, and a
  method of ensuring their lookups fail (persistent absence?).
- for "interesting" mappings like autofs or devfs, which make policy
  decisions on lookup+creation, but create *onto* underlying
  namespaces, union mount can keep the policy and persistence layers
  separate, with a dumb, persistent base filesys overlaid by a
  policy-only filesys which is almost stateless (has dcache state, but
  no persistent storage)

So how do we handle revoked entries in a union-mount of rw over ro?

- nail down a -ve dcache entry (yuk, no longer pruneable, no longer a
  cache, ok for autofs' handful of -ve entries, but not scaleable)

- magic entries in a mounted-with-IS_MAGIC layer, which have some
  definitive i-am-not-here property which aborts the lookup, and
  evades readdir? maybe symlinks to "__ENOENT__" return ENOENT under
  an IS_MAGIC mountflag?

- mounting "nullfs" over revoked nodes (is mounting over
  non-directories actually broken or just deprecated? is it too
  expensive?)

- union-mount *another* filesys with a new flag IS_NEGATIVE, which
  contains a zero-length file for each file-or-directory which is to
  return ENOENT. This is more of an "intersection mount" than a union
  mount, and avoids the namespace-pollution aspect of IS_MAGIC. This
  little negative-fs would be best implemented as a loopback mount.

If you want some more complex policy, just mount a policy-only filesys
on top of the pile, which advertises itself as IS_WRITEABLE, but
actually arranges creation in some dynamically-determined way, creating
entries in whichever lower filesys its policy dictates. Autofs and
devfs work this way, and this could provide a generic toolset for such
vfs magic.

I feel most of the feathers flying in the devfs wars were (are still)
about the duplication of vfs functionality, where generalisation of
available mechanisms could give it better tools, and result in a much
smaller footprint.

I'm not arguing for *complexifying* Al's work, or Neil's model, just
playing with it as a toolset to see what's missing. Conclusion:
nothing.
All more complex interactions are better done using separate
policy-only mini-fs layers on top of this (addressing removal from
namespace, selection from multiple matches, regexp entries for matching
name classes, etc), and many can be exported into userspace via autofs
maps. Attempting to clutter this union-fs model with extra bits for use
by one or more policy-modules will probably make it sub-optimal for use
as a tool by others.

^..^ (00)