Re: [some sanity for a change] possible design issues for hybrids
On Thu, 2004-08-26 at 23:04, Linus Torvalds wrote:
> [ This is quite possibly just impossible and buggy, but here's my implementation notes. You asked for them. ]
>
> On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
> >
> > All right, let's see where that would take us.
> >
> > 3) what do we do on umount(2)? We can get a bunch of vfsmounts hanging off it. MNT_DETACH will have no problems, but normal umount() is a different story. Note that it's not just a hybrid-related problem - implementing the mount traps will cause the same kind of trouble,
>
> Don't allow umount. It's not something the user can unmount - the mount is implied in the file.
>
> > 4) OK, we have those hybrids and want to create vfsmounts when crossing a mountpoint. When do they go away, anyway? When we don't reference them anymore? Right now "attached to mount tree" == +1 to refcount and detaching happens explicitly - outside of the "dropping the final reference" path. Might become a locking issue.
>
> Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously glossed something over when I blathered about the "create the vfsmount on the fly" thing above ;)

If it is created on the fly, it should be easy to destroy on the fly using time-based expiry, i.e. a kernel daemon going over all of those beasts every X seconds (X = 5 perhaps?) and doing something like:

	for (each vfsmount) {
		lock_vfsmount(vfsmount);
		if (MOUNT_IS_BUSY(vfsmount)) {
			unlock_vfsmount(vfsmount);
			continue;
		}
		if (current_time() < (vfsmount->last_used_time + vfsmount->expire_after)) {
			unlock_vfsmount(vfsmount);
			continue;
		}
		destroy_locked_vfsmount(vfsmount);
	}

Wouldn't that work?

> > 10) how do we deal with directories, anyway? Mixing attributes with normal directory contents is going to be fun, what with lseek() insanity.
>
> You couldn't get at the attributes that way anyway, so I think the point is moot. The real directory always takes over. Crazy people could try to just use the regular xattrs interfaces if they really want attributes on directories. You wouldn't ever be able to use the easy one.

But that defeats the whole point of the hybrid objects! We might as well just keep the xattrs interface and throw away the new one if we will have to keep the old one anyway so that we can do named streams/attributes on directories. Windows (and other OS?) certainly do allow them and do use them on directories as well as files...

> > 11) if we go for your "here's stuff that belongs in device node viewed as directory", how would that play with fs metadata exporters? Again, due to the insanity of lseek() on directories it's *very* hard to deal with unions, when parts of directory come from different chunks of code. Don't go there.
>
> See above. Directories would be just plain directories, you could never see their metadata.
>
> Same goes for at least symlinks, and possibly other filetypes too (ie at least initially, a block or character special device will just take over the whole file_operations, which includes readdir, so it's actually hard to have the filesystem do anything about those).

Ah, but at least in other OS (Windows is the one I know about) directories _also_ have the same semantics as files with respect to named streams/attributes.

Best regards,

	Anton
-- 
Anton Altaparmakov aia21 at cam.ac.uk (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/
Re: [some sanity for a change] possible design issues for hybrids
On Fri, Aug 27, 2004 at 09:56:39AM +0100, Anton Altaparmakov wrote:
> If it is created on the fly, it should be easy to destroy on the fly using time-based expiry, i.e. a kernel daemon going over all of those beasts every X seconds (X = 5 perhaps?) and doing something like:
>
> 	for (each vfsmount) {
> 		lock_vfsmount(vfsmount);
> 		if (MOUNT_IS_BUSY(vfsmount)) {
> 			unlock_vfsmount(vfsmount);
> 			continue;
> 		}
> 		if (current_time() < (vfsmount->last_used_time + vfsmount->expire_after)) {
> 			unlock_vfsmount(vfsmount);
> 			continue;
> 		}
> 		destroy_locked_vfsmount(vfsmount);
> 	}
>
> Wouldn't that work?

That would work for a low number of them. But with Hans' visions we'd have a damn lot of them, at which point this isn't really scalable.
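Anton's loop can be tried out in userspace. What follows is a minimal, single-threaded sketch under stated assumptions: `struct pmount`, `pmount_add()`, `expire_sweep()` and `pmount_count()` are hypothetical names, "busy" is reduced to a flag, and the per-mount lock is left out because nothing runs concurrently. It keeps the loop's two skip conditions and makes the cost visible: every pass touches every pseudo-mount, which is exactly the scalability worry raised in the reply.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an on-the-fly vfsmount; not a kernel struct. */
struct pmount {
	struct pmount *next;
	int busy;            /* nonzero while something still references it */
	long last_used_time; /* seconds, on a caller-supplied clock */
	long expire_after;   /* idle seconds before it may be destroyed */
};

/* Allocate a node and push it on the front of the list. */
static struct pmount *pmount_add(struct pmount **head, int busy,
				 long last_used, long expire_after)
{
	struct pmount *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->busy = busy;
	m->last_used_time = last_used;
	m->expire_after = expire_after;
	m->next = *head;
	*head = m;
	return m;
}

/*
 * One pass of the periodic sweep: skip busy mounts and mounts that have
 * not been idle long enough, free the rest. Returns how many were freed.
 * The cost is linear in the total number of pseudo-mounts on every pass.
 */
static int expire_sweep(struct pmount **head, long now)
{
	int freed = 0;
	struct pmount **link = head;

	while (*link) {
		struct pmount *m = *link;

		if (m->busy || now < m->last_used_time + m->expire_after) {
			link = &m->next; /* keep it, move on */
			continue;
		}
		*link = m->next;         /* unlink and destroy */
		free(m);
		freed++;
	}
	return freed;
}

static int pmount_count(const struct pmount *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

With one mount idle since t=0 and a five-second expiry, a sweep at t=4 frees nothing and a sweep at t=10 frees it, while a busy mount survives every pass. Avoiding the full scan would need a different structure (for example idle mounts kept on their own list ordered by expiry time); that is only a design option, not something proposed in the thread.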
Re: [some sanity for a change] possible design issues for hybrids
[ This is quite possibly just impossible and buggy, but here's my implementation notes. You asked for them. ]

On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
>
> All right, let's see where that would take us.
>
> 1) we would need to find all vfsmounts over given dentry. Probably a cyclic list (we want to check if there are normal mounts/bindings among those and we want to dissolve them if there's none).

Not per-inode? dentries are a bit more memory-constrained than inodes, and we only need this for filesystems that want to support it, so we wouldn't need to put this information in each dentry.

Since the vfsmounts have back-pointers to the dentry they are mounted on, you can still do a per-dentry traversal, by just doing the inode list and checking the dentry pointer. No?

Alternatively, we could just put the list on an external hash-chain entirely, and hash off the dentry. It depends on how often we end up needing it.

Or we could just put it in the dentry itself. I'd hate to make it any bigger than it already is, but maybe it doesn't matter that much.

> 2) we would need to do something about locking, since mount trees in other guys' namespaces are protected by semaphores of their own.

Ok, I'll admit that I don't know how to handle namespaces. These things should just go into a global namespace, and I was kind of assuming it would happen automatically in lookup_mnt() or something like that. A special case in lookup_mnt which says something like "if you didn't find a vfsmount, we create a new one for you".

It should be reasonably easy to create new ones on-the-fly, since we'd have all the information (the parent vfsmount comes stated, and the vfsmount we create would point to the same things that the base one would).

> 3) what do we do on umount(2)? We can get a bunch of vfsmounts hanging off it. MNT_DETACH will have no problems, but normal umount() is a different story. Note that it's not just a hybrid-related problem - implementing the mount traps will cause the same kind of trouble,

Don't allow umount. It's not something the user can unmount - the mount is implied in the file.

> 4) OK, we have those hybrids and want to create vfsmounts when crossing a mountpoint. When do they go away, anyway? When we don't reference them anymore? Right now "attached to mount tree" == +1 to refcount and detaching happens explicitly - outside of the "dropping the final reference" path. Might become a locking issue.

Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously glossed something over when I blathered about the "create the vfsmount on the fly" thing above ;)

> 5) Creation of these vfsmounts: fs should somehow tell us whether it wants one or not (at the very least, we should stop *somewhere*). Can we use the same dentry/inode? I'm not sure and I really doubt that we'd like that.

Why not? When doing the ->lookup() operation, the filesystem would create the vfsmount and bind it to the current vfsmount. That guarantees that it has a vfsmount, and will mean that it will show up positive with the d_mountpoint() query, which in turn will cause us to do the lookup_mnt(). Which in turn will create the other vfsmounts as needed, if you have multiple namespaces.

So I _think_ creation is easy. Getting rid of the dang things might be harder.

> 6) if it's a method, where should it live, *especially* if we want them on device nodes. Note that inode_operations belongs to underlying fs, so it's not a particularly good place for device case.

Why not just let the existing .lookup method initialize the mount-point thing? After that, it's all in the VFS layer (I'd hate to have filesystems mess around with vfsmounts - they'll just get it wrong).

> 7) automount folks want partially shared mount trees (well, mirrored, actually).

I don't think you can get partial sharing on one of these puppies. You'd always have one vfsmount per namespace (well, lazily created, so maybe in practice you'd see a lot fewer).

> 8) what should happen when something is mounted on top of directory-over-file? How do we treat such beasts? What are the implications?

Allow file-on-file mounts - it will just totally hide the thing (in that namespace, at least). But don't allow the dir-on-file thing (that we already don't allow).

> 9) how do we recognize such mountpoints in the path lookups? It *is* a hot path, so we should be careful in that area; the impact will be felt by everything in the system.

I don't think you'll have any special cases. Same d_mountpoint(), same lookup_mnt().

> 10) how do we deal with directories, anyway? Mixing attributes with normal directory contents is going to be fun, what with lseek() insanity.

You couldn't get at the attributes that way anyway, so I think the point is moot. The real directory always takes over. Crazy people could try to just use the regular xattrs interfaces if they really want attributes on directories. You wouldn't ever be able to use the easy one.
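The special case proposed for lookup_mnt(), "if you didn't find a vfsmount, we create a new one for you", amounts to a lookup with a create-on-miss fallback. A minimal sketch, assuming plain integer ids stand in for vfsmounts and dentries and a fixed-size array stands in for the mount hash; `struct pmount`, `struct pmount_table` and `lookup_or_create()` are hypothetical names, not kernel API. The `root` field stands for "the same things that the base one would" point to.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PMOUNTS 64

/* Hypothetical, heavily simplified: mounts and dentries are plain ids. */
struct pmount {
	int parent_mnt; /* vfsmount the lookup is coming from */
	int mountpoint; /* dentry the pseudo-mount sits on */
	int root;       /* what it points to; identical for every copy */
};

/* Stand-in for the mount hash: a fixed array searched linearly. */
struct pmount_table {
	struct pmount slot[MAX_PMOUNTS];
	size_t used;
};

/*
 * lookup_mnt()-style lookup with a create-on-miss fallback: if this
 * (parent, dentry) pair has no mount yet, instantiate one that shares
 * the given root. Returns NULL only when the table is full.
 */
static struct pmount *lookup_or_create(struct pmount_table *t,
				       int parent_mnt, int dentry, int root)
{
	for (size_t i = 0; i < t->used; i++) {
		struct pmount *m = &t->slot[i];

		if (m->parent_mnt == parent_mnt && m->mountpoint == dentry)
			return m; /* already instantiated under this parent */
	}
	if (t->used == MAX_PMOUNTS)
		return NULL;

	struct pmount *m = &t->slot[t->used++];
	m->parent_mnt = parent_mnt;
	m->mountpoint = dentry;
	m->root = root;
	return m;
}
```

Looking up the pair (1, 42) twice returns the same entry, while the same dentry reached from a second parent gets its own entry sharing the same root: one copy per parent, created only when first needed.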
Re: [some sanity for a change] possible design issues for hybrids
On Thu, Aug 26, 2004 at 03:04:21PM -0700, Linus Torvalds wrote:
> > 2) we would need to do something about locking, since mount trees in other guys' namespaces are protected by semaphores of their own.
>
> Ok, I'll admit that I don't know how to handle namespaces. These things should just go into a global namespace, and I was kind of assuming it would happen automatically in lookup_mnt() or something like that. A special case in lookup_mnt which says something like "if you didn't find a vfsmount, we create a new one for you".
>
> It should be reasonably easy to create new ones on-the-fly, since we'd have all the information (the parent vfsmount comes stated, and the vfsmount we create would point to the same things that the base one would).

Erm... What do we do upon unlink()? I'm killing a file, fs it's in is mounted in a dozen places (no namespaces, just chroot jails, whatever). We need to find all vfsmounts to be killed by that.

And BTW that's an argument against anchoring that list in inode - unlink() on foo should not screw bar/... even if bar and foo are links to the same file. So we'll need to check for dentry match anyway.

> > 3) what do we do on umount(2)? We can get a bunch of vfsmounts hanging off it. MNT_DETACH will have no problems, but normal umount() is a different story. Note that it's not just a hybrid-related problem - implementing the mount traps will cause the same kind of trouble,
>
> Don't allow umount. It's not something the user can unmount - the mount is implied in the file.

See below.

> > 4) OK, we have those hybrids and want to create vfsmounts when crossing a mountpoint. When do they go away, anyway? When we don't reference them anymore? Right now "attached to mount tree" == +1 to refcount and detaching happens explicitly - outside of the "dropping the final reference" path. Might become a locking issue.
>
> Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously glossed something over when I blathered about the "create the vfsmount on the fly" thing above ;)
>
> > 5) Creation of these vfsmounts: fs should somehow tell us whether it wants one or not (at the very least, we should stop *somewhere*). Can we use the same dentry/inode? I'm not sure and I really doubt that we'd like that.
>
> Why not? When doing the ->lookup() operation, the filesystem would create the vfsmount and bind it to the current vfsmount. That guarantees that it has a vfsmount, and will mean that it will show up positive with the d_mountpoint() query, which in turn will cause us to do the lookup_mnt().

Several paragraphs below you are saying that you don't like fs messing with vfsmounts. Use of ->lookup() would mean that we should not only create and attach vfsmounts from within fs code, but would actually have to make ->lookup() return vfsmount+dentry, AFAICS.

> > 6) if it's a method, where should it live, *especially* if we want them on device nodes. Note that inode_operations belongs to underlying fs, so it's not a particularly good place for device case.
>
> Why not just let the existing .lookup method initialize the mount-point thing? After that, it's all in the VFS layer (I'd hate to have filesystems mess around with vfsmounts - they'll just get it wrong).
>
> Allow file-on-file mounts - it will just totally hide the thing (in that namespace, at least). But don't allow the dir-on-file thing (that we already don't allow).

Err... What about dir-on-dir-that-is-on-file? I.e. mount on foo/. when foo is a file?

> > 9) how do we recognize such mountpoints in the path lookups? It *is* a hot path, so we should be careful in that area; the impact will be felt by everything in the system.
>
> I don't think you'll have any special cases. Same d_mountpoint(), same lookup_mnt().

See above on the use of ->lookup().
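The hard-link objection can be made concrete with a per-inode list whose entries carry a back-pointer to the covered dentry, as suggested in the message being answered. A minimal sketch: `struct pmount`, the `i_next` link, the `detached` flag and `detach_mounts_on()` are hypothetical, and only the `mnt_mountpoint` name mirrors a real `struct vfsmount` field. Two names, `foo` and `bar`, share one inode; unlinking `foo` must leave the mount over `bar` alone, so the walk filters on the dentry rather than stopping at the inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified objects; not the kernel's definitions. */
struct dentry {
	const char *name;
};

struct pmount {
	struct pmount *i_next;         /* link in the per-inode list */
	struct dentry *mnt_mountpoint; /* back-pointer to the covered dentry */
	int detached;
};

struct inode {
	struct pmount *mounts;         /* every pseudo-mount over any alias */
};

/*
 * On unlink() of one name, walk the inode-wide list but only detach the
 * mounts whose back-pointer is that exact dentry; mounts over other hard
 * links to the same inode are left alone. Returns the number detached.
 */
static int detach_mounts_on(struct inode *inode, const struct dentry *victim)
{
	int n = 0;

	for (struct pmount *m = inode->mounts; m; m = m->i_next) {
		if (m->mnt_mountpoint == victim && !m->detached) {
			m->detached = 1;
			n++;
		}
	}
	return n;
}
```

Here two mounts cover `foo` (say, the same filesystem seen from two chroot jails) and one covers `bar`; unlinking `foo` detaches exactly the first two.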
Re: [some sanity for a change] possible design issues for hybrids
On Thu, 26 Aug 2004 [EMAIL PROTECTED] wrote:
> >
> > It should be reasonably easy to create new ones on-the-fly, since we'd have all the information (the parent vfsmount comes stated, and the vfsmount we create would point to the same things that the base one would).
>
> Erm... What do we do upon unlink()? I'm killing a file, fs it's in is mounted in a dozen places (no namespaces, just chroot jails, whatever). We need to find all vfsmounts to be killed by that.

But that should be trivial: that's what the per-inode vfsmount list was (your first question in the last email).

> And BTW that's an argument against anchoring that list in inode - unlink() on foo should not screw bar/... even if bar and foo are links to the same file. So we'll need to check for dentry match anyway.

And again - I talked about this in the previous email. Even if you anchor the list in struct inode, or you do it with a totally external hash-list, you'll always have the vfsmount->mnt_mountpoint pointer to point to the dentry. So you can just iterate over the list, and cherry-pick the ones that point to the dentry you are removing.

> > > 3) what do we do on umount(2)? We can get a bunch of vfsmounts hanging off it. MNT_DETACH will have no problems, but normal umount() is a different story. Note that it's not just a hybrid-related problem - implementing the mount traps will cause the same kind of trouble,
> >
> > Don't allow umount. It's not something the user can unmount - the mount is implied in the file.
>
> See below.
>
> > > 4) OK, we have those hybrids and want to create vfsmounts when crossing a mountpoint. When do they go away, anyway? When we don't reference them anymore? Right now "attached to mount tree" == +1 to refcount and detaching happens explicitly - outside of the "dropping the final reference" path. Might become a locking issue.
> >
> > Ahh. Umm.. Yes. I think this might be the real problem. Unless I seriously glossed something over when I blathered about the "create the vfsmount on the fly" thing above ;)
> >
> > > 5) Creation of these vfsmounts: fs should somehow tell us whether it wants one or not (at the very least, we should stop *somewhere*). Can we use the same dentry/inode? I'm not sure and I really doubt that we'd like that.
> >
> > Why not? When doing the ->lookup() operation, the filesystem would create the vfsmount and bind it to the current vfsmount. That guarantees that it has a vfsmount, and will mean that it will show up positive with the d_mountpoint() query, which in turn will cause us to do the lookup_mnt().
>
> Several paragraphs below you are saying that you don't like fs messing with vfsmounts. Use of ->lookup() would mean that we should not only create and attach vfsmounts from within fs code, but would actually have to make ->lookup() return vfsmount+dentry, AFAICS.

No, lookup would just return the dentry, but the dentry would already be filled in with the mount-point information.

And you can do that with a simple vfs helper function, ie the filesystem itself would just need to do a

	pseudo_mount(dentry, inode);

thing - which just fills in dentry->d_mountpoint with a new vfsmount thing. It would allocate a new root dentry (for the pseudo-mount) and a new vfsmount, and make dentry->d_mountpoint point to it.

IOW, the filesystem itself would never mess around with d_mountpoint itself.

> Err... What about dir-on-dir-that-is-on-file? I.e. mount on foo/. when foo is a file?

Hmm.. We might as well allow it, I suspect. It's not like it should hurt. We'd end up following the mount-chain twice, but we already have that issue with multi-mount cases..

		Linus
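A minimal userspace sketch of the `pseudo_mount(dentry, inode)` helper as described in this message. It assumes a `d_mountpoint` pointer inside `struct dentry`; that field belongs to the proposal, not to the real structure, and the simplified `struct inode`, `struct dentry` and `struct vfsmount` here are stand-ins. Only the `pseudo_mount` name comes from the message. The point of the shape is that the filesystem's `->lookup()` makes one call and never touches the vfsmount itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct inode {
	int ino;
};

struct vfsmount;

/* Hypothetical dentry carrying the d_mountpoint pointer the proposal assumes. */
struct dentry {
	struct inode *d_inode;
	struct vfsmount *d_mountpoint; /* assumed field, not in the real struct dentry */
};

struct vfsmount {
	struct dentry *mnt_root;       /* root dentry of the pseudo-mount */
	struct dentry *mnt_mountpoint; /* dentry it covers */
};

/*
 * VFS-side helper called from the filesystem's ->lookup(): allocate a new
 * root dentry for the same inode plus a new vfsmount, then hang the mount
 * off the looked-up dentry. Returns 0 or -ENOMEM.
 */
static int pseudo_mount(struct dentry *dentry, struct inode *inode)
{
	struct dentry *root = calloc(1, sizeof(*root));
	struct vfsmount *mnt = calloc(1, sizeof(*mnt));

	if (!root || !mnt) {
		free(root);
		free(mnt);
		return -ENOMEM;
	}
	root->d_inode = inode;
	mnt->mnt_root = root;
	mnt->mnt_mountpoint = dentry;
	dentry->d_mountpoint = mnt;
	return 0;
}
```

A single pointer per dentry can record only one mount, no matter which vfsmount the dentry was reached through.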
Re: [some sanity for a change] possible design issues for hybrids
On Thu, Aug 26, 2004 at 03:45:09PM -0700, Linus Torvalds wrote:
> No, lookup would just return the dentry, but the dentry would already be filled in with the mount-point information.
>
> And you can do that with a simple vfs helper function, ie the filesystem itself would just need to do a
>
> 	pseudo_mount(dentry, inode);
>
> thing - which just fills in dentry->d_mountpoint with a new vfsmount thing. It would allocate a new root dentry (for the pseudo-mount) and a new vfsmount, and make dentry->d_mountpoint point to it.

What dentry->d_mountpoint? No such thing... Note that we can't get vfsmount by dentry - that's the point of having these guys in the first place. So I'm not sure what you are trying to do here - dentry + inode is definitely not enough to attach any vfsmounts anywhere.

That's not about namespaces - same fs mounted in several places will give the same problem - one dentry, many vfsmounts. And we obviously *can't* have one vfsmount for all of them - if the same fs is mounted on /foo and /bar, we will have the same dentry for /foo/splat and /bar/splat. So what should we get for /foo/splat/. and /bar/splat/.? Same dentry *and* same vfsmount? I'd expect .. from the former to give /foo and from the latter - /bar...
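The /foo/splat versus /bar/splat case can be made concrete. Both paths reach the same dentry, so a single per-dentry pointer cannot tell them apart; the child mount has to be found from the pair (parent vfsmount, dentry), which is the pair the kernel's `lookup_mnt()` is called with. A minimal sketch with simplified structures: `struct mount_edge`, `lookup_child()` and the `path` label are illustrative only, and the child's `mnt_parent` is what `..` from the root of that mount would lead back to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dentry {
	const char *name;
};

/* Simplified mount: just enough to answer "where does .. go?" */
struct vfsmount {
	struct vfsmount *mnt_parent;   /* mount we are attached under */
	struct dentry *mnt_mountpoint; /* dentry in mnt_parent that we cover */
	const char *path;              /* label for illustration only */
};

/* One entry per (parent mount, covered dentry) pair. */
struct mount_edge {
	struct vfsmount *parent;
	struct dentry *dentry;
	struct vfsmount *child;
};

/* Find the mount over `dentry` as seen through `parent`, or NULL. */
static struct vfsmount *lookup_child(const struct mount_edge *edges, size_t n,
				     const struct vfsmount *parent,
				     const struct dentry *dentry)
{
	for (size_t i = 0; i < n; i++)
		if (edges[i].parent == parent && edges[i].dentry == dentry)
			return edges[i].child;
	return NULL;
}
```

With the same filesystem mounted on /foo and /bar, one `splat` dentry yields two distinct child mounts, and their parents lead `..` back to /foo and /bar respectively.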