Re: [some sanity for a change] possible design issues for hybrids

2004-08-27 Thread Anton Altaparmakov
On Thu, 2004-08-26 at 23:04, Linus Torvalds wrote:
 [ This is quite possibly just impossible and buggy, but here's my
   implementation notes. You asked for them. ]
 On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
  All right, let's see where that would take us.
  3) what do we do on umount(2)?  We can get a bunch of vfsmounts hanging off
  it.  MNT_DETACH will have no problems, but normal umount() is a different
  story.  Note that it's not just hybrid-related problem - implementing the
  mount traps will cause the same kind of trouble,
 
 Don't allow umount. It's not something the user can unmount - the mount is 
 implied in the file. 
 
  4) OK, we have those hybrids and want to create vfsmounts when crossing a
  mountpoint.  When do they go away, anyway?  When we don't reference them
  anymore?  Right now attached to mount tree == +1 to refcount and detaching
  happens explicitly - outside of the dropping the final reference path.
  Might become a locking issue.
 
 Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously 
 glossed something over when I blathered about the "create the vfsmount on 
 the fly" thing above ;)

If it is created on the fly, it should be easy to destroy on the fly
using time-based expiry, i.e. a kernel daemon going over all of those
beasts every X seconds (X = 5 perhaps?) and doing something like:

for (each vfsmount) {
	lock_vfsmount(vfsmount);
	if (MOUNT_IS_BUSY(vfsmount)) {
		unlock_vfsmount(vfsmount);
		continue;
	}
	if (current_time() < (vfsmount->last_used_time +
			vfsmount->expire_after)) {
		unlock_vfsmount(vfsmount);
		continue;
	}
	destroy_locked_vfsmount(vfsmount);
}

Wouldn't that work?
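
A minimal userspace sketch of one pass of such a sweep, under stated assumptions: `struct demo_mount`, `demo_mount_is_busy()` and `demo_expire_sweep()` are hypothetical names made up for illustration, not kernel interfaces, and locking is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a vfsmount; not the kernel structure. */
struct demo_mount {
	int    refcount;       /* > 0 means somebody still uses the mount */
	time_t last_used_time; /* last time a lookup crossed this mount */
	time_t expire_after;   /* idle seconds allowed before destruction */
	bool   alive;          /* cleared once the sweep destroys the mount */
};

/* A mount counts as busy while anything still holds a reference. */
static bool demo_mount_is_busy(const struct demo_mount *m)
{
	return m->refcount > 0;
}

/*
 * One pass of the expiry daemon: destroy every mount that is not busy
 * and has been idle for at least expire_after seconds.
 * Returns the number of mounts destroyed in this pass.
 */
static size_t demo_expire_sweep(struct demo_mount *mounts, size_t n, time_t now)
{
	size_t destroyed = 0;

	assert(mounts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct demo_mount *m = &mounts[i];

		if (!m->alive || demo_mount_is_busy(m))
			continue;
		if (now < m->last_used_time + m->expire_after)
			continue; /* not idle for long enough yet */
		m->alive = false;
		destroyed++;
	}
	return destroyed;
}
```

Every pass walks all mounts, so the cost of a sweep grows linearly with their number.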

  10) how do we deal with directories, anyway?  Mixing attributes with
  normal directory contents is going to be fun, what with lseek() insanity.
 
 You couldn't get at the attributes that way anyway, so I think the point 
 is moot. The real directory always takes over.
 
 Crazy people could try to just use the regular xattrs interfaces if they 
 really want attributes on directories. You wouldn't ever be able to use 
 the easy one.

But that defeats the whole point of the hybrid objects!  We might as
well just keep the xattrs interface and throw away the new one if we
will have to keep the old one anyway so that we can do named
streams/attributes on directories.  Windows (and other OS?) certainly do
allow them and do use them on directories as well as files...

  11) if we go for your "here's stuff that belongs in device node viewed
  as directory", how would that play with fs metadata exporters?  Again,
  due to the insanity of lseek() on directories it's *very* hard to deal
  with unions, when parts of directory come from different chunks of code.
 
 Don't go there. See above. Directories would be just plain directories, 
 you could never see their metadata. Same goes for at least symlinks, and 
 possibly other filetypes too (ie at least initially, a block or character 
 special device will just take over the whole file_operations, which 
 includes readdir, so it's actually hard to have the filesystem do 
 anything about those).

Ah, but at least in other OS (Windows is the one I know about)
directories _also_ have the same semantics as files with respect to
named streams/attributes.

Best regards,

Anton
-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/



Re: [some sanity for a change] possible design issues for hybrids

2004-08-27 Thread Christoph Hellwig
On Fri, Aug 27, 2004 at 09:56:39AM +0100, Anton Altaparmakov wrote:
 If it is created on the fly, it should be easy to destroy on the fly
 using time-based expiry, i.e. a kernel daemon going over all of those
 beasts every X seconds (X = 5 perhaps?) and doing something like:
 
 for (each vfsmount) {
   lock_vfsmount(vfsmount);
   if (MOUNT_IS_BUSY(vfsmount)) {
     unlock_vfsmount(vfsmount);
     continue;
   }
   if (current_time() < (vfsmount->last_used_time +
       vfsmount->expire_after)) {
     unlock_vfsmount(vfsmount);
     continue;
   }
   destroy_locked_vfsmount(vfsmount);
 }
 
 Wouldn't that work?

That would work for a low number of them.  But with Hans' visions we'd
have a damn lot of them at which point this isn't really scalable.



Re: [some sanity for a change] possible design issues for hybrids

2004-08-26 Thread Linus Torvalds

[ This is quite possibly just impossible and buggy, but here's my
  implementation notes. You asked for them. ]

On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
 
 All right, let's see where that would take us.
 
 1) we would need to find all vfsmounts over given dentry.  Probably a cyclic
 list (we want to check if there are normal mounts/bindings among those and
 we want to dissolve them if there's none).

Not per-inode? dentries are a bit more memory-constrained than inodes, and 
we only need this for filesystems that want to support it, so we wouldn't 
need to put this information in each dentry.

Since the vfsmounts have back-pointers to the dentry they are mounted on,
you can still do a per-dentry traversal, by just doing the inode list
and checking the dentry pointer. No?

Alternatively, we could just put the list on an external hash-chain 
entirely, and hash off the dentry. It depends on how often we end up 
needing it.

Or we could just put it in the dentry itself. I'd hate to make it any 
bigger than it already is, but maybe it doesn't matter that much.

 2) we would need to do something about locking, since mount trees in other
 guys' namespaces are protected by semaphores of their own.

Ok, I'll admit that I don't know how to handle namespaces. These things 
should just go into a global namespace, and I was kind of assuming it 
would happen automatically in lookup_mnt() or something like that. A 
special case in lookup_mnt which says something like "if you didn't find a 
vfsmount, we create a new one for you".

It should be reasonably easy to create new ones on-the-fly, since we'd
have all the information (the parent vfsmount comes stated, and the
vfsmount we create would point to the same things that the base one
would).
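
A rough userspace sketch of that "create on a miss" idea, with loud assumptions: the `demo_*` types and functions below are hypothetical and only model the shape of the lookup (parent vfsmount plus dentry as the key). They are not the kernel's lookup_mnt(), and locking and reference counting are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_dentry { const char *name; };

/* Hypothetical, heavily simplified vfsmount. */
struct demo_vfsmount {
	struct demo_vfsmount *parent;     /* mount we are crossing from */
	struct demo_dentry   *mountpoint; /* dentry this mount sits on */
	struct demo_dentry   *root;       /* root of the mounted tree */
	struct demo_vfsmount *next;       /* chain of all known mounts */
};

static struct demo_vfsmount *demo_mounts; /* head of the global chain */

/* Plain lookup: find the mount attached to (parent, dentry), if any. */
static struct demo_vfsmount *demo_find_mnt(struct demo_vfsmount *parent,
					   struct demo_dentry *dentry)
{
	for (struct demo_vfsmount *m = demo_mounts; m != NULL; m = m->next)
		if (m->parent == parent && m->mountpoint == dentry)
			return m;
	return NULL;
}

/*
 * Lookup with the special case: on a miss, create a new vfsmount that
 * points at the same root as the base one and hangs off this parent.
 * Returns NULL only if allocation fails. Nothing is ever freed here.
 */
static struct demo_vfsmount *demo_lookup_mnt(struct demo_vfsmount *parent,
					     struct demo_dentry *dentry,
					     const struct demo_vfsmount *base)
{
	struct demo_vfsmount *m = demo_find_mnt(parent, dentry);

	assert(base != NULL);
	if (m != NULL)
		return m;
	m = malloc(sizeof(*m));
	if (m == NULL)
		return NULL;
	m->parent = parent;
	m->mountpoint = dentry;
	m->root = base->root;
	m->next = demo_mounts;
	demo_mounts = m;
	return m;
}
```

In this model a second namespace gets its own vfsmount the first time it crosses the dentry, and the same one on every later crossing.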

 3) what do we do on umount(2)?  We can get a bunch of vfsmounts hanging off
 it.  MNT_DETACH will have no problems, but normal umount() is a different
 story.  Note that it's not just hybrid-related problem - implementing the
 mount traps will cause the same kind of trouble,

Don't allow umount. It's not something the user can unmount - the mount is 
implied in the file. 

 4) OK, we have those hybrids and want to create vfsmounts when crossing a
 mountpoint.  When do they go away, anyway?  When we don't reference them
 anymore?  Right now attached to mount tree == +1 to refcount and detaching
 happens explicitly - outside of the dropping the final reference path.
 Might become a locking issue.

Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously 
glossed something over when I blathered about the "create the vfsmount on 
the fly" thing above ;)

 5) Creation of these vfsmounts: fs should somehow tell us whether it wants
 one or not (at the very least, we should stop *somewhere*).  Can we use
 the same dentry/inode?  I'm not sure and I really doubt that we'd like that.

Why not? When doing the ->lookup() operation, the filesystem would create
the vfsmount and bind it to the current vfsmount. That guarantees that it
has a vfsmount, and will mean that it will show up positive with the
d_mountpoint() query, which in turn will cause us to do the
lookup_mnt().

Which in turn will create the other vfsmounts as needed, if you have 
multiple namespaces.

So I _think_ creation is easy. Getting rid of the dang things might be 
harder.

 6) if it's a method, where should it live, *especially* if we want them on
 device nodes.  Note that inode_operations belongs to underlying fs, so it's
 not a particularly good place for device case.

Why not just let the existing .lookup method initialize the mount-point 
thing? After that, it's all in the VFS layer (I'd hate to have filesystems 
mess around with vfsmounts - they'll just get it wrong).

 7) automount folks want partially shared mount trees (well, mirrored,
 actually).

I don't think you can get partial sharing on one of these puppies. You'd 
always have one vfsmount per namespace (well, lazily created, so maybe in 
practice you'd see a lot fewer).

 8) what should happen when something is mounted on top of directory-over-file?
 How do we treat such beasts?  What are the implications?

Allow file-on-file mounts - it will just totally hide the thing (in that
namespace, at least). But don't allow the dir-on-file thing (that we
already don't allow).

 9) how do we recognize such mountpoints in the path lookups?  It *is* a
 hot path, so we should be careful in that area; the impact will be felt
 by everything in the system.

I don't think you'll have any special cases. Same d_mountpoint(), same 
lookup_mnt().

 10) how do we deal with directories, anyway?  Mixing attributes with
 normal directory contents is going to be fun, what with lseek() insanity.

You couldn't get at the attributes that way anyway, so I think the point 
is moot. The real directory always takes over.

Crazy people could try to just use the regular xattrs interfaces if they 
really want attributes on directories. You wouldn't ever be able to use 
the easy one.

Re: [some sanity for a change] possible design issues for hybrids

2004-08-26 Thread viro
On Thu, Aug 26, 2004 at 03:04:21PM -0700, Linus Torvalds wrote:
 
  2) we would need to do something about locking, since mount trees in other
  guys' namespaces are protected by semaphores of their own.
 
 Ok, I'll admit that I don't know how to handle namespaces. These things 
 should just go into a global namespace, and I was kind of assuming it 
 would happen automatically in lookup_mnt() or something like that. A 
 special case in lookup_mnt which says something like "if you didn't find a 
 vfsmount, we create a new one for you".
 
 It should be reasonably easy to create new ones on-the-fly, since we'd
 have all the information (the parent vfsmount comes stated, and the
 vfsmount we create would point to the same things that the base one
 would).

Erm...  What do we do upon unlink()?  I'm killing a file, fs it's in is
mounted in a dozen of places (no namespaces, just chroot jails, whatever).
We need to find all vfsmounts to be killed by that.

And BTW that's an argument against anchoring that list in inode - unlink()
on foo should not screw bar/... even if bar and foo are links to the same
file.  So we'll need to check for dentry match anyway.
 
  3) what do we do on umount(2)?  We can get a bunch of vfsmounts hanging off
  it.  MNT_DETACH will have no problems, but normal umount() is a different
  story.  Note that it's not just hybrid-related problem - implementing the
  mount traps will cause the same kind of trouble,
 
 Don't allow umount. It's not something the user can unmount - the mount is 
 implied in the file. 

See below.

  4) OK, we have those hybrids and want to create vfsmounts when crossing a
  mountpoint.  When do they go away, anyway?  When we don't reference them
  anymore?  Right now attached to mount tree == +1 to refcount and detaching
  happens explicitly - outside of the dropping the final reference path.
  Might become a locking issue.
 
 Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously 
 glossed something over when I blathered about the "create the vfsmount on 
 the fly" thing above ;)

  5) Creation of these vfsmounts: fs should somehow tell us whether it wants
  one or not (at the very least, we should stop *somewhere*).  Can we use
  the same dentry/inode?  I'm not sure and I really doubt that we'd like that.
 
 Why not? When doing the ->lookup() operation, the filesystem would create
 the vfsmount and bind it to the current vfsmount. That guarantees that it
 has a vfsmount, and will mean that it will show up positive with the
 d_mountpoint() query, which in turn will cause us to do the
 lookup_mnt().

Several paragraphs below you are saying that you don't like fs messing with
vfsmounts.  Use of ->lookup() would mean that we should not only create
and attach vfsmounts from within fs code, but would actually have to make
->lookup() return vfsmount+dentry, AFAICS.
 
  6) if it's a method, where should it live, *especially* if we want them on
  device nodes.  Note that inode_operations belongs to underlying fs, so it's
  not a particularly good place for device case.
 
 Why not just let the existing .lookup method initialize the mount-point 
 thing? After that, it's all in the VFS layer (I'd hate to have filesystems 
 mess around with vfsmounts - they'll just get it wrong).

 Allow file-on-file mounts - it will just totally hide the thing (in that
 namespace, at least). But don't allow the dir-on-file thing (that we
 already don't allow).

Err...  What about dir-on-dir-that-is-on-file?  I.e. mount on foo/. when foo
is a file?
 
  9) how do we recognize such mountpoints in the path lookups?  It *is* a
  hot path, so we should be careful in that area; the impact will be felt
  by everything in the system.

 I don't think you'll have any special cases. Same d_mountpoint(), same 
 lookup_mnt().

See above on the use of ->lookup().


Re: [some sanity for a change] possible design issues for hybrids

2004-08-26 Thread Linus Torvalds


On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
  
  It should be reasonably easy to create new ones on-the-fly, since we'd
  have all the information (the parent vfsmount comes stated, and the
  vfsmount we create would point to the same things that the base one
  would).
 
 Erm...  What do we do upon unlink()?  I'm killing a file, fs it's in is
 mounted in a dozen of places (no namespaces, just chroot jails, whatever).
 We need to find all vfsmounts to be killed by that.

But that should be trivial: that's what the per-inode vfsmount list was 
(your first question in the last email).

 And BTW that's an argument against anchoring that list in inode - unlink()
 on foo should not screw bar/... even if bar and foo are links to the same
 file.  So we'll need to check for dentry match anyway.

And again - I talked about this in the previous email. Even if you anchor 
the list in struct inode, or you do it with a totally external 
hash-list, you'll always have the vfsmount->mnt_mountpoint pointer to 
point to the dentry. So you can just iterate over the list, and 
cherry-pick the ones that point to the dentry you are removing.
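
A small userspace sketch of that cherry-picking, assuming a singly linked per-inode list: all `demo_*` names are hypothetical, and only the `mnt_mountpoint` back-pointer mirrors the field named above. Two hard links to one file share an inode, so the filter on the dentry is what keeps unlink() on foo from touching mounts over bar.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dentry { const char *name; };

/* Hypothetical, simplified vfsmount chained on a per-inode list. */
struct demo_vfsmount {
	struct demo_dentry   *mnt_mountpoint; /* dentry it is mounted on */
	struct demo_vfsmount *next_on_inode;  /* next mount over same inode */
};

struct demo_inode {
	struct demo_vfsmount *mounts; /* every mount over any link to us */
};

/*
 * Walk the per-inode list and unhook only the mounts whose back-pointer
 * matches the dentry being removed. Returns how many were unhooked.
 */
static size_t demo_detach_for_dentry(struct demo_inode *inode,
				     const struct demo_dentry *victim)
{
	struct demo_vfsmount **link;
	size_t detached = 0;

	assert(inode != NULL);
	link = &inode->mounts;
	while (*link != NULL) {
		struct demo_vfsmount *m = *link;

		if (m->mnt_mountpoint == victim) {
			*link = m->next_on_inode; /* unhook this one */
			m->next_on_inode = NULL;
			detached++;
		} else {
			link = &m->next_on_inode; /* keep it, move on */
		}
	}
	return detached;
}
```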

  
   3) what do we do on umount(2)?  We can get a bunch of vfsmounts hanging off
   it.  MNT_DETACH will have no problems, but normal umount() is a different
   story.  Note that it's not just hybrid-related problem - implementing the
   mount traps will cause the same kind of trouble,
  
  Don't allow umount. It's not something the user can unmount - the mount is 
  implied in the file. 
 
 See below.
 
   4) OK, we have those hybrids and want to create vfsmounts when crossing a
   mountpoint.  When do they go away, anyway?  When we don't reference them
   anymore?  Right now attached to mount tree == +1 to refcount and detaching
   happens explicitly - outside of the dropping the final reference path.
   Might become a locking issue.
  
  Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously 
  glossed something over when I blathered about the "create the vfsmount on 
  the fly" thing above ;)
 
   5) Creation of these vfsmounts: fs should somehow tell us whether it wants
   one or not (at the very least, we should stop *somewhere*).  Can we use
   the same dentry/inode?  I'm not sure and I really doubt that we'd like that.
  
  Why not? When doing the ->lookup() operation, the filesystem would create
  the vfsmount and bind it to the current vfsmount. That guarantees that it
  has a vfsmount, and will mean that it will show up positive with the
  d_mountpoint() query, which in turn will cause us to do the
  lookup_mnt().
 
 Several paragraphs below you are saying that you don't like fs messing with
 vfsmounts.  Use of ->lookup() would mean that we should not only create
 and attach vfsmounts from within fs code, but would actually have to make
 ->lookup() return vfsmount+dentry, AFAICS.

No, lookup would just return the dentry, but the dentry would already be 
filled in with the mount-point information.

And you can do that with a simple vfs helper function, ie the filesystem 
itself would just need to do

pseudo_mount(dentry, inode);

thing - which just fills in dentry->d_mountpoint with a new vfsmount
thing. It would allocate a new root dentry (for the pseudo-mount) and a
new vfsmount, and make dentry->d_mountpoint point to it.

IOW, the filesystem itself would never mess around with d_mountpoint 
itself.

 Err...  What about dir-on-dir-that-is-on-file?  I.e. mount on foo/. when foo
 is a file?

Hmm.. We might as well allow it, I suspect. It's not like it should hurt.  
We'd end up following the mount-chain twice, but we already have that
issue with multi-mount cases..

Linus


Re: [some sanity for a change] possible design issues for hybrids

2004-08-26 Thread viro
On Thu, Aug 26, 2004 at 03:45:09PM -0700, Linus Torvalds wrote:
 No, lookup would just return the dentry, but the dentry would already be 
 filled in with the mount-point information.
 
 And you can do that with a simple vfs helper function, ie the filesystem 
 itself would just need to do
 
   pseudo_mount(dentry, inode);
 
 thing - which just fills in dentry->d_mountpoint with a new vfsmount
 thing. It would allocate a new root dentry (for the pseudo-mount) and a
 new vfsmount, and make dentry->d_mountpoint point to it.

What dentry->d_mountpoint?  No such thing...

Note that we can't get vfsmount by dentry - that's the point of having these
guys in the first place.  So I'm not sure what you are trying to do here -
dentry + inode is definitely not enough to attach any vfsmounts anywhere.

That's not about namespaces - same fs mounted in several places will give
the same problem - one dentry, many vfsmounts.  And we obviously *can't*
have one vfsmount for all of them - if the same fs is mounted on /foo and
/bar, we will have the same dentry for /foo/splat and /bar/splat.  So
what should we get for /foo/splat/. and /bar/splat/.?  Same dentry *and*
same vfsmount?  I'd expect .. from the former to give /foo and from the
latter - /bar...
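
A toy userspace model of why the pair (vfsmount, dentry) is needed, under assumptions: every `demo_*` name is hypothetical, and the ".." walk is a simplification that only hops from a mount's root to its mountpoint in the parent mount. The same dentry reached through the mount on /foo and through the mount on /bar yields the same parent dentry, and only the vfsmount tells the two apart.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dentry {
	const char         *name;
	struct demo_dentry *d_parent; /* parent inside the same fs, or NULL */
};

/* Hypothetical, simplified vfsmount. */
struct demo_vfsmount {
	const char           *where;          /* label only, e.g. "/foo" */
	struct demo_vfsmount *mnt_parent;     /* mount we are attached to */
	struct demo_dentry   *mnt_root;       /* root dentry of this mount */
	struct demo_dentry   *mnt_mountpoint; /* dentry we are attached on */
};

/* A position in the tree is a (vfsmount, dentry) pair. */
struct demo_path {
	struct demo_vfsmount *mnt;
	struct demo_dentry   *dentry;
};

/* Resolve ".." from a given position. */
static struct demo_path demo_dotdot(struct demo_path p)
{
	assert(p.mnt != NULL && p.dentry != NULL);
	/* At the root of a mount: hop to its mountpoint in the parent. */
	while (p.dentry == p.mnt->mnt_root && p.mnt->mnt_parent != NULL) {
		p.dentry = p.mnt->mnt_mountpoint;
		p.mnt = p.mnt->mnt_parent;
	}
	/* Then step up inside the filesystem, if there is a parent. */
	if (p.dentry->d_parent != NULL)
		p.dentry = p.dentry->d_parent;
	return p;
}
```

With one filesystem mounted on both /foo and /bar and a pseudo-mount over splat in each, ".." from the two pseudo-mount roots ends on the same dentry but on different vfsmounts, so a single vfsmount shared between them could not answer both.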