Re: FS_SINGLE queries

2000-06-10 Thread Richard Gooch

Alexander Viro writes:
> On Sat, 10 Jun 2000, Richard Gooch wrote:
> > Will it really make much difference? What would be harder to do
> > without mount IDs? And how much harder?
> 
> Beware of functions with many arguments... Besides, what about "kill
> the component of union-mount on /barf NFS-mounted from
> venus:/foo/bar"?  What exactly are you going to pass here? Such
> stuff is better left to userland.

Let's see. Pass the same stuff you see in /proc/namespace? Instead of
cut-and-paste of the mount ID, cut-and-paste the other entries on that
line. You're making the decision based on what's in /proc/namespace
anyway. Why add another level of indirection?

> > > And then... consider the situation when root logs in and decides to
> > > mess with luser's namespace.
> > 
> > What about it?
> 
> bastard@venus% su -
> Password:
> root@venus% w luser
> 
> luser pts/0   ...
> root@venus% ps t pts/0
> 
> 
> 728 pts/0 ...
> root@venus% cat /proc/728/ns
> 
> 
> 123749  /home/luser/foo /   nfs 
> root@venus% umount -I 123749
> root@venus% logout
> bastard@venus% mail luser
> Subject: you've been told to umount ~/foo
> 
> ^D
> 
> > Avoiding numbers is a good thing. They have no intrinsic meaning.
> 
> Tell that to guys who invented file descriptors. IMO that works
> quite fine - I'll prefer to do close(17) rather than [...]
> incantations of horror needed on OS/360 [...] type that stuff late
> on Saturday

But FDs are different. You reference the file by name, and the system
returns an opaque handle. But the system isn't returning a mount ID as
the result of some operation: you had to scan a file for it. So really
your handle to the mount is something else, the mount ID is merely
another level of indirection.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: FS_SINGLE queries

2000-06-10 Thread Alexander Viro



On Sat, 10 Jun 2000, Richard Gooch wrote:

> Will it really make much difference? What would be harder to do
> without mount IDs? And how much harder?

Beware of functions with many arguments... Besides, what about "kill
the component of union-mount on /barf NFS-mounted from venus:/foo/bar"?
What exactly are you going to pass here? Such stuff is better left to
userland.
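
A toy comparison of the two ways of naming a union-mount component that
are being debated here. This is only a sketch: the table contents, the
IDs and both helper functions are made up for illustration and are not
any kernel or mount(8) interface.

```python
from typing import Dict, List, Optional

Entry = Dict[str, object]

# Hypothetical table: two components stacked on the same mountpoint.
# The IDs and the second component are invented for this example.
TABLE: List[Entry] = [
    {"id": 17, "mountpoint": "/barf", "fstype": "nfs", "params": "venus:/foo/bar"},
    {"id": 18, "mountpoint": "/barf", "fstype": "ext2", "params": "/dev/hda3"},
]


def find_by_id(table: List[Entry], mount_id: int) -> Optional[Entry]:
    """A single integer names exactly one component."""
    for entry in table:
        if entry["id"] == mount_id:
            return entry
    return None


def find_by_description(table: List[Entry], mountpoint: str,
                        fstype: str, params: str) -> Optional[Entry]:
    """Every distinguishing field has to be passed in and compared.

    Returns None when nothing matches or when the match is ambiguous.
    """
    matches = [e for e in table
               if e["mountpoint"] == mountpoint
               and e["fstype"] == fstype
               and e["params"] == params]
    return matches[0] if len(matches) == 1 else None
```

Both lookups can pick out the NFS component; the second one simply needs
more arguments, and the mountpoint alone would not be enough.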

> > And then... consider the situation when root logs in and decides to
> > mess with luser's namespace.
> 
> What about it?

bastard@venus% su -
Password:
root@venus% w luser

luser   pts/0   ...
root@venus% ps t pts/0


728 pts/0 ...
root@venus% cat /proc/728/ns


123749  /home/luser/foo /   nfs 
root@venus% umount -I 123749
root@venus% logout
bastard@venus% mail luser
Subject: you've been told to umount ~/foo

^D

> Avoiding numbers is a good thing. They have no intrinsic meaning.

Tell that to guys who invented file descriptors. IMO that works quite
fine - I'll prefer to do close(17) rather than 




Re: FS_SINGLE queries

2000-06-10 Thread Richard Gooch

Alexander Viro writes:
> 
> 
> On Sat, 10 Jun 2000, Richard Gooch wrote:
> 
> > Yeah, sure. I did say "for example". Your format looks fine. One
> > question: is the mount ID really needed? Can't you distinguish based
> > on what FS you're mounting (and mountpoint root)?
> 
> First of all, the interface is simpler that way.

Will it really make much difference? What would be harder to do
without mount IDs? And how much harder?

> And then... consider the situation when root logs in and decides to
> mess with luser's namespace.

What about it?

Avoiding numbers is a good thing. They have no intrinsic meaning.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: FS_SINGLE queries

2000-06-10 Thread Alexander Viro



On Sat, 10 Jun 2000, Richard Gooch wrote:

> Yeah, sure. I did say "for example". Your format looks fine. One
> question: is the mount ID really needed? Can't you distinguish based
> on what FS you're mounting (and mountpoint root)?

First of all, the interface is simpler that way. And then... consider the
situation when root logs in and decides to mess with luser's namespace.




Re: FS_SINGLE queries

2000-06-10 Thread Richard Gooch

Alexander Viro writes:
> 
> 
> On Sat, 10 Jun 2000, Richard Gooch wrote:
> 
> > What I mean by "real" mounts is a table that shows how each FS was
> > brought into the namespace (or each namespace, once you implement
> > CLONE_NEWNS). So for example:
> > #device     filesystem  roots
> > /dev/hda1   ext2        /
> > /dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
> > none        proc        /proc /gaol/proc
> 
> Bad format. If anything, it should contain mount IDs (if you want to have
> union-mount you need those, just to be able to take away components).
> The following might go:
> 
> 1   /                         /                   ext2    /dev/hda1
> 2   /var/spool/mail           /                   ext2    /dev/hda2
> 3   /proc                     /                   procfs
> 14  /gaol/var/spool/mail      /                   ext2    /dev/hda2
> 15  /gaol/proc                /                   procfs
> 42  /gaol/lib/libc.2.1.3.so   /lib/libc.2.1.3.so  ext2    /dev/hda1
> ...
> 
> IOW, ID + mountpoint + location of root in its tree + fs type +
> fs-specific parameters. That at least allows one to reproduce the
> namespace. And yes, IMO "device" is an fs-specific parameter.

Yeah, sure. I did say "for example". Your format looks fine. One
question: is the mount ID really needed? Can't you distinguish based
on what FS you're mounting (and mountpoint root)?

BTW: I agree that device is fs-specific. It's much nicer to see "none"
done away with.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: FS_SINGLE queries

2000-06-10 Thread Alexander Viro



On Sat, 10 Jun 2000, Richard Gooch wrote:

> What I mean by "real" mounts is a table that shows how each FS was
> brought into the namespace (or each namespace, once you implement
> CLONE_NEWNS). So for example:
> #device     filesystem  roots
> /dev/hda1   ext2        /
> /dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
> none        proc        /proc /gaol/proc

Bad format. If anything, it should contain mount IDs (if you want to have
union-mount you need those, just to be able to take away components).
The following might go:

1   /                         /                   ext2    /dev/hda1
2   /var/spool/mail           /                   ext2    /dev/hda2
3   /proc                     /                   procfs
14  /gaol/var/spool/mail      /                   ext2    /dev/hda2
15  /gaol/proc                /                   procfs
42  /gaol/lib/libc.2.1.3.so   /lib/libc.2.1.3.so  ext2    /dev/hda1
...

IOW, ID + mountpoint + location of root in its tree + fs type + fs-specific
parameters. That at least allows one to reproduce the namespace. And yes,
IMO "device" is an fs-specific parameter.

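A minimal sketch of how a userland tool might read lines in the format
proposed above (ID, mountpoint, root within its tree, fs type, optional
fs-specific parameters). The field names, the parse_namespace() helper
and the sample text are assumptions for illustration only; no such file
format is claimed to exist in any kernel.

```python
from typing import List, NamedTuple, Optional


class MountEntry(NamedTuple):
    mount_id: int
    mountpoint: str
    root: str              # location of the root in its own tree
    fstype: str
    params: Optional[str]  # fs-specific, e.g. a device; may be absent


def parse_namespace(text: str) -> List[MountEntry]:
    """Parse whitespace-separated lines; skip anything malformed."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Need at least ID, mountpoint, root and fs type.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        params = " ".join(fields[4:]) or None
        entries.append(MountEntry(int(fields[0]), fields[1],
                                  fields[2], fields[3], params))
    return entries


# Sample input copied from the proposal; the layout is hypothetical.
SAMPLE = """\
1   /                         /                   ext2    /dev/hda1
2   /var/spool/mail           /                   ext2    /dev/hda2
3   /proc                     /                   procfs
14  /gaol/var/spool/mail      /                   ext2    /dev/hda2
15  /gaol/proc                /                   procfs
42  /gaol/lib/libc.2.1.3.so   /lib/libc.2.1.3.so  ext2    /dev/hda1
...
"""

if __name__ == "__main__":
    for entry in parse_namespace(SAMPLE):
        print(entry)
```

Because the device is treated as just another fs-specific parameter,
procfs needs no "none" placeholder: its params field is simply empty.
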



Re: FS_SINGLE queries

2000-06-10 Thread Richard Gooch

Alexander Viro writes:
> 
> 
> On Sat, 10 Jun 2000, Richard Gooch wrote:
> 
> > I see your point. However, that suggests that the naming of
> > /proc/mounts is wrong. Perhaps we should have a /proc/namespace that
> > shows all these VFS bindings, and separately a list of real mounts.
> 
> What's "real"? /proc/mounts would better be left as it was (funny
> replacement for /etc/mtab) and there should be something along the
> lines of /proc/namespace (hell knows, we might make it compatible
> with /proc/ns from new Plan 9). That something most definitely
> doesn't need to share the format with /proc/mounts...

What I mean by "real" mounts is a table that shows how each FS was
brought into the namespace (or each namespace, once you implement
CLONE_NEWNS). So for example:
#device     filesystem  roots
/dev/hda1   ext2        /
/dev/hda2   ext2        /var/spool/mail /gaol/var/spool/mail
none        proc        /proc /gaol/proc

in /proc/namespace. And I suppose that /proc/namespace would be unique
for each namespace as well. This way, no distinction is made between
the first mount and subsequent bindings, which is what you'd like, as
I gather you'd like to make all bindings equal.

Aside: I guess the reality is that the first binding (the original
mount -t ext2) is more equal than the subsequent bindings (mount -t
bind). Evidence: the O_CREAT bug I found the other day ;-)

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: FS_SINGLE queries

2000-06-10 Thread Alexander Viro



On Sat, 10 Jun 2000, Richard Gooch wrote:

> I see your point. However, that suggests that the naming of
> /proc/mounts is wrong. Perhaps we should have a /proc/namespace that
> shows all these VFS bindings, and separately a list of real mounts.

What's "real"? /proc/mounts would better be left as it was (funny
replacement for /etc/mtab) and there should be something along the lines
of /proc/namespace (hell knows, we might make it compatible with /proc/ns
from new Plan 9). That something most definitely doesn't need to share the
format with /proc/mounts...




Re: FS_SINGLE queries

2000-06-10 Thread Ion Badulescu

In article [EMAIL PROTECTED] you wrote:

> On Sat, 10 Jun 2000, Richard Gooch wrote:
> 
>>   Hi, all. I've just been looking at the FS_SINGLE implementation, and
>> have a few comments:
>> 
>> - although not documented, you need to do kern_mount() before trying
>   Yup.
>>   normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
>>   should be called automatically in
>>   register_filesystem()/unregister_filesystem()?
> 
> I don't think so. They are different operations and I'm not too happy
> about mixing them together. Matter of taste, but...
> 
>> - I note that procfs and pipefs call unregister_filesystem() before
>>   calling kern_umount(). This looks counter-intuitive, even if it's
>>   correct (is it?)
> 
> It is. Look: first you take it out of reach so that nobody would mount us
> while we are doing kern_umount(), then you kill the tree.

This is more of an argument to combine the two operations, imho. Is there
any point in having a FS_SINGLE filesystem registered, but not kern_mounted?


Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.



Re: FS_SINGLE queries

2000-06-10 Thread Richard Gooch

Alexander Viro writes:
> 
> 
> On Sat, 10 Jun 2000, Richard Gooch wrote:
> 
> >   Hi, all. I've just been looking at the FS_SINGLE implementation, and
> > have a few comments:
> > 
> > - although not documented, you need to do kern_mount() before trying
>   Yup.
> >   normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
> >   should be called automatically in
> >   register_filesystem()/unregister_filesystem()?
> 
> I don't think so. They are different operations and I'm not too happy
> about mixing them together. Matter of taste, but...

Yeah, I know. Having it documented would satisfy me. Getting a kernel
BUG after adding FS_SINGLE was a shock: "what the %@$& ?!?".

> > - I note that procfs and pipefs call unregister_filesystem() before
> >   calling kern_umount(). This looks counter-intuitive, even if it's
> >   correct (is it?)
> 
> It is. Look: first you take it out of reach so that nobody would
> mount us while we are doing kern_umount(), then you kill the tree.

I suspected it was about race prevention. Again, if it was documented,
it would be fine. When I first saw the procfs/pipefs code, I was left
wondering if it was safe to unregister before unmounting.

> > - when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
> >   type rather than "bind", which also seems wrong.
> 
> Why? Bind is _not_ a filesystem type. From the kernel point of view
> old and new instances after binding are identical - there is no
> asymmetry.

I see your point. However, that suggests that the naming of
/proc/mounts is wrong. Perhaps we should have a /proc/namespace that
shows all these VFS bindings, and separately a list of real mounts.

I have the feeling we're mixing two different pieces of information
into /proc/mounts at the moment.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: FS_SINGLE queries

2000-06-10 Thread Alexander Viro



On Sat, 10 Jun 2000, Richard Gooch wrote:

>   Hi, all. I've just been looking at the FS_SINGLE implementation, and
> have a few comments:
> 
> - although not documented, you need to do kern_mount() before trying
Yup.
>   normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
>   should be called automatically in
>   register_filesystem()/unregister_filesystem()?

I don't think so. They are different operations and I'm not too happy
about mixing them together. Matter of taste, but...

> - I note that procfs and pipefs call unregister_filesystem() before
>   calling kern_umount(). This looks counter-intuitive, even if it's
>   correct (is it?)

It is. Look: first you take it out of reach so that nobody would mount us
while we are doing kern_umount(), then you kill the tree.

> - when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
>   type rather than "bind", which also seems wrong.

Why? Bind is _not_ a filesystem type. From the kernel point of view old
and new instances after binding are identical - there is no asymmetry.




FS_SINGLE queries

2000-06-10 Thread Richard Gooch

  Hi, all. I've just been looking at the FS_SINGLE implementation, and
have a few comments:

- although not documented, you need to do kern_mount() before trying
  normal mounts of a FS_SINGLE; perhaps kern_mount()/kern_umount()
  should be called automatically in
  register_filesystem()/unregister_filesystem()?

- I note that procfs and pipefs call unregister_filesystem() before
  calling kern_umount(). This looks counter-intuitive, even if it's
  correct (is it?)

- when mounting a FS which is FS_SINGLE, /proc/mounts reports the FS
  type rather than "bind", which also seems wrong.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]



Re: [RFC] union-mount stuff

2000-06-10 Thread peter swain

Neil Brown wrote:
> I tried to come up with a model which is a generalisation of the "old"
> behaviour, and provides agreeable semantics for new behaviours.
> This is what I came up with:
> 
> A "mount" is an ordered list (pile) of directories.
> One of these elements is the "mountpoint", and it is particularly
> distinguished because ".." from the "mount" goes through ".." of the
> "mountpoint". ".." of all other directories is not accessible.
> 
> Each directory in the pile has two flags (well, three if you count
> IS_MOUNTPOINT).
> 
> IS_WRITABLE: You can create things in here.
> IS_VISIBLE: You can see inside this.
> 
> Thus, a traditional mount has two directories in the pile.
> The bottom one IS_MOUNTPOINT
> The top one IS_WRITABLE|IS_VISIBLE

oh, aren't we in danger of reducing McVoy's semantic gap between
theory and practice here?  This model is simple enuff to do right the
first time, rather than propagating broken semantics which we fix
later :-)

i *like* it.
a simple (3 bits) generalisation, which leaves trad-mount, union-mount
(whichever flavour you'd like it to be) and the null-fs as corner cases.

this would provide a nice toolset for all sorts of interesting filesys
widgets, some of which (autofs, devfs) can be simplified when they take
advantage of this infrastructure.

What can we do with this?

The interesting case is of course creation, and the uppermost-IS_WRITABLE
case can handle all possibilities, with a few tricks:
- for laying *additions* or *updates* over a readonly base filesystem,
  it just works, with no additional policy modules required.
- for laying additions/updates/*deletions* over a base filesystem, 
  we also need a -ve entry for objects which have been removed,
  and a method of ensuring their lookups fail (persistent absence?).
- for "interesting" mappings like autofs or devfs, which make policy
  decisions on lookup+creation, but create *onto* underlying namespaces,
  union mount can keep the policy and persistence layers separate,
  with a dumb, persistent base filesys overlaid by a policy-only filesys
  which is almost stateless (has dcache state, but no persistent storage)

so how do we handle revoked entries in a union-mount of rw over ro?
- nail down a -ve dcache entry (yuk, no longer pruneable, no longer a
  cache, ok for autofs' handful of -ve entries, but not scaleable)
- magic entries in a mounted-with-IS_MAGIC layer, which have some
  definitive i-am-not-here property which aborts the lookup, and evades
  readdir?
  maybe symlinks to "__ENOENT__" return ENOENT under IS_MAGIC mountflag?
- mounting "nullfs" over revoked nodes (is mounting over non-directories
  actually broken or just deprecated? is it too expensive?)
- union-mount *another* filesys with a new flag IS_NEGATIVE,
  which contains a zero-length file for each file-or-directory which
  is to return ENOENT.  This is more of an "intersection mount" than 
  a union mount, and avoids the namespace-pollution aspect of IS_MAGIC.
  This little negative-fs would be best implemented as a loopback mount.
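
A rough sketch of the pile model with the last option bolted on, to see
how lookup and creation would behave. Everything here is a toy: the Layer
class, the flag values and the rule that a negative layer hides names in
the layers below it are assumptions for illustration, not anything in
Al's or Neil's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

IS_MOUNTPOINT = 1
IS_WRITABLE = 2
IS_VISIBLE = 4
IS_NEGATIVE = 8  # hypothetical extra flag for the "negative-fs" layer


@dataclass
class Layer:
    name: str
    flags: int
    entries: Set[str] = field(default_factory=set)


def lookup(pile: List[Layer], name: str) -> Optional[str]:
    """Walk the pile top first; None stands in for ENOENT."""
    for layer in pile:
        if layer.flags & IS_NEGATIVE:
            if name in layer.entries:
                return None  # revoked: stop before reaching lower layers
            continue
        if layer.flags & IS_VISIBLE and name in layer.entries:
            return layer.name
    return None


def create(pile: List[Layer], name: str) -> str:
    """Creation lands in the uppermost IS_WRITABLE layer."""
    for layer in pile:
        if layer.flags & IS_WRITABLE and not layer.flags & IS_NEGATIVE:
            layer.entries.add(name)
            return layer.name
    raise PermissionError("no writable layer in the pile")


def example_pile() -> List[Layer]:
    """rw layer over a ro base, with one name revoked."""
    return [
        Layer("negative", IS_NEGATIVE, {"b"}),
        Layer("rw-upper", IS_WRITABLE | IS_VISIBLE),
        Layer("ro-base", IS_VISIBLE, {"a", "b"}),
    ]
```

In example_pile() the name "a" still resolves in the base, "b" is gone,
and anything newly created ends up in the rw layer without the base
being touched.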

if you want some more complex policy, just mount a policy-only filesys
on top of the pile, which advertises itself as IS_WRITABLE, but actually
arranges creation in some dynamically-determined way, creating entries
in whichever lower filesys its policy dictates.

Autofs and devfs work this way, and this could provide a generic toolset
for such vfs magic.  I feel most of the feathers flying in the devfs wars
were (are still) about the duplication of vfs functionality, where
generalisation of available mechanisms could give it better tools,
and result in a much smaller footprint.

I'm not arguing for *complexifying* Al's work, or Neil's model,
just playing with it as a toolset to see what's missing.
Conclusion: Nothing.

All more complex interactions are better done using separate
policy-only mini-fs layers on top of this (addressing removal
from namespace, selection from multiple matches, regexp entries
for matching name classes, etc), and many can be exported into
userspace via autofs maps.
Attempting to clutter this union-fs model with extra bits for
use by one or more policy-modules will probably make it 
sub-optimal for use as a tool by others.

^..^
(00)