Re: fs changes in 2.3
On Tue, 2 May 2000, Chris Mason wrote:

> So the reiserfs team has to fix a problem NFS only has when interacting
> with reiserfs. Yes, it would have been nice if someone else had the
> chance to do it for us, but to expect them to, and to be mad when
> they don't, is more than a little silly. It's time to move on Hans, we
> have more important things to worry about.

Sigh... Folks, if the knfsd problem is the only one left - you have my vote for temporary inclusion of the *@!#^* ->read_inode2(). AFAICS it's the least of anyone's problems. It can be fixed, the fix is not going to be reiserfs-dependent and is unlikely to require additional code in reiserfs. IWBNI Hans would understand the code in fs/reiserfs/* and existing problems with it, but frankly, I don't see how it really matters.

As for the "cooperation" bit - huh? What, Hans can contribute something to discussing the technical side of things? Then he was successfully hiding that fact for years. I definitely have no problems with the rest of your team and my problems with Hans... let's keep them separate, OK? I'll start worrying about that when I see a single technical posting from Hans that would not consist of forwarding most of the questions to other people.
announcing stackable file system templates and code generator
It is my pleasure to announce fistgen-0.0.1, the first release of the FiST code generator, used to create stackable file systems out of templates and a high-level language. This package comes with stackable file system templates for Linux, Solaris, and FreeBSD. It also contains several sample file systems built using the FiST language: an encryption file system, a compression file system, and more --- all of which are written as portable stackable file systems. Linux 2.3 folks: my stackable templates now support Size Changing Algorithms (SCAs) such as compression, uuencoding, etc. See specific papers and sample file systems for more details. For more information, software, and papers, see the FiST home page: http://www.cs.columbia.edu/~ezk/research/fist/ Happy stacking. Erez Zadok. --- Columbia University Department of Computer Science. EMail: [EMAIL PROTECTED] Web: http://www.cs.columbia.edu/~ezk
Re: fs changes in 2.3
On Wed, May 03, 2000 at 08:54:54AM +0200, Alexander Viro wrote:

[flame snipped -- hopefully everybody can go back to normal work now]

> ObNFS: weird as it may sound to you, I actually write stuff - not
> "subcontract" to somebody else. So I'm afraid that I have slightly less
> free time than you do. FWIC, in Reiserfs context nfsd is a non-issue.
> Current kludge is not too lovely, but it's well-isolated and can be
> replaced fast. So ->read_inode2() is ugly, but in my opinion it's not an
> obstacle. If other problems will be resolved and by that time
> ->fh_to_dentry() interface will not be in place - count on my vote for
> temporary adding ->read_inode2().

In the long run some generic support for 64bit inodes will be needed anyways -- other file systems depend on that too (e.g. XFS). So fh_to_dentry only saves you temporarily. I think adding read_inode2 early is the best.

-Andi
Re: Reiserfs and NFS
On Tue, May 02, 2000 at 11:20:10PM +0200, Steve Dodd wrote:

> On Tue, May 02, 2000 at 01:50:16PM -0700, Chris Mason wrote:
> >
> > ReiserFS has unique inode numbers, but they aren't enough to actually find
> > the inode on disk. That requires the inode number, and another 32 bits of
> > information we call the packing locality. The packing locality starts as
> > the parent directory inode number, but does not change across renames.
> >
> > So, we need to add a fh_to_dentry lookup operation for knfsd to use, and
> > perhaps a dentry_to_fh operation as well (but _fh_update in pre6 looks ok
> > for us).
>
> First off, could we call them "inode labels" or something less confusing?
> "file" outside of NFS has a different meaning (semantic namespace collision)
> Also, I don't see how a "fh_to_dentry" (or ilbl_to_dentry) is going to
> work - (think hardlinks, etc.). You do need an iget_by_label or something

NFS file handles are always a kind of hard link; in traditional Unix they refer directly to inodes. Linux knfsd uses dentries only because the VFS semantics require it, not because of any NFS requirements.

> similar though. Details that need to be worked out would be max label size
> and how they're passed around (void * ptr and a length?)

iget_by_label() is already implemented in 2.3 -- see iget4(). Unfortunately it is a bit inefficient to search inodes this way [because you cannot index on the additional information], but not too bad.

> Also, what are the size constraints imposed by NFS? What about other network
> filesystems?

NFSv2 has 2GB limits for files.

-Andi
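[Editor's sketch] To make the point about the packing locality concrete: if finding an inode takes both the inode number and a further 32 bits, then whatever handle knfsd hands to clients has to carry both values. The user-space C below is only an illustration under stated assumptions. The names `demo_key`, `demo_key_to_fh` and `demo_fh_to_key` are hypothetical, the 8-byte big-endian layout is invented for the example, and none of this is the reiserfs or knfsd code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: both halves are needed to locate the inode on disk. */
struct demo_key {
    uint32_t objectid;         /* what user space sees as the inode number */
    uint32_t packing_locality; /* starts out as the parent directory's number */
};

#define DEMO_FH_LEN 8 /* assumed size of the fs-private part of a handle */

/* Store a 32-bit value big-endian so the handle does not depend on the host. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode the key into an opaque handle.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t demo_key_to_fh(const struct demo_key *key, unsigned char *fh, size_t fh_max)
{
    assert(key != NULL && fh != NULL);
    if (fh_max < DEMO_FH_LEN)
        return 0;
    put_be32(fh, key->objectid);
    put_be32(fh + 4, key->packing_locality);
    return DEMO_FH_LEN;
}

/* Decode a handle back into a key.
 * Returns 0 on success, -1 if the handle is too short to hold both values. */
int demo_fh_to_key(const unsigned char *fh, size_t fh_len, struct demo_key *key)
{
    assert(key != NULL && fh != NULL);
    if (fh_len < DEMO_FH_LEN)
        return -1;
    key->objectid = get_be32(fh);
    key->packing_locality = get_be32(fh + 4);
    return 0;
}
```

The design point is only that the handle is opaque to the NFS side: the filesystem decides what goes in, and a lookup driven by the handle gets both values back without the caller knowing what they mean.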
Re: fs changes in 2.3
On Wed, 3 May 2000, Andi Kleen wrote:

> On Wed, May 03, 2000 at 08:54:54AM +0200, Alexander Viro wrote:
> >
> > ObNFS: weird as it may sound to you, I actually write stuff - not
> > "subcontract" to somebody else. So I'm afraid that I have slightly less
> > free time than you do. FWIC, in Reiserfs context nfsd is a non-issue.
> > Current kludge is not too lovely, but it's well-isolated and can be
> > replaced fast. So ->read_inode2() is ugly, but in my opinion it's not an
> > obstacle. If other problems will be resolved and by that time
> > ->fh_to_dentry() interface will not be in place - count on my vote for
> > temporary adding ->read_inode2().
>
> In the long run some generic support for 64bit inodes will be needed
> anyways -- other file systems depend on that too (e.g. XFS). So
> fh_to_dentry only saves you temporarily. I think adding read_inode2
> early is the best.

The problem is the read_inode2 callers need to know what to send, so we need the opaque inode label idea, and an easy way to pass it around. Or, we can just decide 64 bits is enough, and add a 64 bit inode number into struct inode, keeping the 32 bit one for compatibility.

-chris
Re: fs changes in 2.3
"Dunlap, Randy" wrote:

> The thing to do is one of the things that Linus does
> best IMO, which is to lead by example. Show us the
> code, or in this case, show us the docs.

I am not sure you heard what I said precisely. I am saying to my programmers "code not suck", and then saying Viro is worse than useless, and sucking hasn't gotten Chris NFS fixes, so that is even more reason to code not suck.

Andi Kleen found a race condition for us, and told us where it was. Andi is cool. Viro found unnecessary checks in our code, and then made it sound like something big. He does this kind of thing a lot to everyone. He discourages every contributor he can for the betterment of his ego. He is damage in the Linux community to route around.

Hans
[RFC] Possible design for "mount traps"
Folks, I've tried to describe the stuff that may IMO become useful for autofs/devfs/portalfs/etc. Comments are more than welcome.

Current problems:

1. autofs would be much simpler if we had some way to distinguish between the real negative dentries and mountpoints-to-be-triggered. Especially autofs4.

2. It would be nice if we could treat all mounts in the tree created by autofs the same way. IOW, get the expiry stuff independently for all of them, not just for the layer closest to autofs. The main problem with that is the way we trigger such mounts - it's done in autofs4 ->lookup() and that means that once we got a mount deep in the tree expired there will be no way to get it back. Not nice. Thus the need to scan the whole tree from the autofs code, play the games with remounting stuff if expiry fails in the middle (somebody went into /mnt/net/foo while we were umounting /mnt/net/foo/bar and that made umount /mnt/net/foo fail; have to remount everything). Thus the problems with handling the policy stuff in the kernel (nobody should mount by hand anywhere under autofs), etc.

3. In principle, it would be nice if the daemon told where the potential mountpoints are from the very beginning. However, we lack the object to hold such information.

4. All the stuff with triggering mounts/expiry attempts looks like VFS fodder. For one thing, it's localized. It's a node in the unified tree, not the whole chunk. autofs just happens to carry such points, but we would really win a lot if we allowed them anywhere.

5. Any schemes with automount-like stuff in devfs require (union-)mount being triggered if lookup brings negative in all components already mounted. IOW, if the search gets to the last component of union-mount.

6. portalfs is another example of notifications sent to userland upon a lookup reaching the mountpoint (the rest of the name being passed to the daemon, which should return the object).

So what about the following trick: let's allow vfsmounts without associated superblock and allow to "mount" them even on the negative dentries? Notice that the latter will not break walk_name() - checks for dentry being negative are done after we try to follow mounts.

Notice also that once we mount something atop of such vfsmount it becomes completely invisible - it's wedged between two real objects and following mounts will walk through it without stopping.

So the only case when these beasts count is when they are "mounted", but nothing is mounted atop of them. But that's precisely the class of situations we are interested in. In case of autofs we want follow_down() into such animal to trigger mounting, in case of portalfs - passing the rest of pathname to daemon, in case of devfs-with-automount we want to kick devfsd. So let them have a method that would be called upon such follow_down() (i.e. one when we have nothing mounted atop of us). And that's it.

These objects are not filesystems - they rather look like traps set in the unified tree. Notice that they do not waste anon device like "one node autofs" would do.

That way if autofs daemon mounted /mnt/net/foo it would not follow up with /mnt/net/foo/bar - it would just set the trap in /mnt/net/foo/bar and let the actual lookups trigger further mounts. Notice that we can also remove a lot of special-case code from autofs: just make the lookup in mount() _not_ follow mounts on the last step and there we are - mount(8) can be called by autofs daemon without the black magic currently needed to prevent recursively invoking the mount upon an attempt to look the mountpoint up.

The bottom line: I propose to

a) remove ->mnt_dev from struct vfsmount (not used by any code in the tree, it's a rudiment of the old scheme where we used to search by device number).

b) add a method (say, ->mnt_trigger()) that would be called when follow_down() finds vfsmount with NULL ->mnt_sb and nothing mounted on that vfsmount.

c) add a way (flag to mount, whatever) for mounting such "traps". Removal is probably best done by umount().

d) add LOOKUP_NO_DOWN - takes effect only on the last component, prevents calls of follow_down(). Used by lookup of mountpoint in mount().

e) see what can be simplified in the autofs4 and possibly devfs if we use such "traps".

I can do that and I suspect that it may seriously simplify the life. If anyone has objections/sees obvious holes in the scheme - I'd like to hear about that.

Jeremy, would you be OK with keeping the information about difference between regular negatives and mountpoints-to-be that way?

Linus, what do you think about such beast? It's a seriously different way of doing autofs-like stuff, but I think that it may be the Right Thing(tm). Would you accept such objects?

Cheers,
Al
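[Editor's sketch] The core of the proposal is small enough to model in user space: a mount with no superblock does nothing while something is mounted on top of it, and fires a callback when a lookup walks down into it and finds it uncovered. The C below is only a toy model under stated assumptions. `demo_mount`, `demo_follow_down`, `demo_automount` and `DEMO_LOOKUP_NO_DOWN` are hypothetical stand-ins for struct vfsmount, follow_down(), a ->mnt_trigger() implementation and LOOKUP_NO_DOWN; the real kernel structures and locking are not represented.

```c
#include <assert.h>
#include <stddef.h>

struct demo_mount;

/* Hypothetical trap method, loosely modelled on the proposed ->mnt_trigger(). */
typedef void (*demo_trigger_t)(struct demo_mount *trap);

struct demo_mount {
    void *sb;                    /* NULL for a trap: no superblock behind it */
    struct demo_mount *covering; /* what is mounted on top of us, or NULL */
    demo_trigger_t trigger;      /* called only when an uncovered trap is hit */
    int fired;                   /* demo bookkeeping: how often the trap fired */
};

/* Stand-in for the proposed flag. In the proposal the caller would simply
 * skip follow_down() on the last component; the check is folded into the
 * function here only to keep the example short. */
#define DEMO_LOOKUP_NO_DOWN 1

/* Walk down through whatever is stacked on mnt and return the topmost mount.
 * A covered trap is passed through silently; an uncovered trap fires once
 * and, if the trigger mounted something, the walk continues into it. */
struct demo_mount *demo_follow_down(struct demo_mount *mnt, int flags)
{
    assert(mnt != NULL);
    if (flags & DEMO_LOOKUP_NO_DOWN)
        return mnt;
    for (;;) {
        if (mnt->covering != NULL) {
            mnt = mnt->covering;
            continue;
        }
        if (mnt->sb == NULL && mnt->trigger != NULL) {
            mnt->trigger(mnt);
            if (mnt->covering != NULL)
                continue; /* the trigger mounted something on top */
        }
        return mnt;
    }
}

/* Demo trigger: pretend a daemon mounted a real filesystem over the trap. */
static int demo_fake_sb; /* placeholder for a real superblock */
struct demo_mount demo_real = { &demo_fake_sb, NULL, NULL, 0 };

void demo_automount(struct demo_mount *trap)
{
    trap->fired++;
    trap->covering = &demo_real;
}
```

In this model a lookup with the no-down flag sees the trap itself, which is what mount(8) run by the daemon would need; an ordinary lookup fires the trap once and afterwards walks straight through it to the real mount.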
RE: [RFC] Possible design for "mount traps"
On 03-May-2000 Alexander Viro wrote:

> Thus the need to
> scan the whole tree from the autofs code, play the games with remounting
> stuff if expiry fails in the middle (somebody went into /mnt/net/foo while
> we were umounting /mnt/net/foo/bar and that made umount /mnt/net/foo
> fail; have to remount everything).

This doesn't happen because the kernel code makes sure the umount is always possible once it told the daemon about it. There's some of the recovery code in the daemon, but it never gets run.

> So what about the following trick: let's allow vfsmounts without
> associated superblock and allow to "mount" them even on the negative
> dentries? Notice that the latter will not break walk_name() - checks for
> dentry being negative are done after we try to follow mounts.
> Notice also that once we mount something atop of such vfsmount it
> becomes completely invisible - it's wedged between two real objects and
> following mounts will walk through it without stopping.

This would be broadly useful for autofs, since it's pretty much what's required to implement direct mounts. This would allow us to do incremental mount and expiry of individual filesystems in the tree without having to do them en masse like autofs4 currently does. It also means we don't need a special autofs4 filesystem mounted, because we can garnish the namespace without it. It doesn't hurt that direct mounts are about the #1 requested feature.

I know hpa has been thinking about how to stack dentries. How does this compare?

BTW, what happens if you umount a filesystem which has these scattered about its namespace? Do they get cleaned up as part of the umount (appropriate callback, etc), or do you need to clear them out before the umount? I prefer the former.

Also, what happens if you attach one to a non-directory? Could you use it to put arbitrary "special files" into the namespace without having to do anything special? It would make things like Pavel's podfuk more useful without having to do horrible namespace hacks as he does now.

Also, when one is inserted between two real filesystems, it still needs to be able to mediate namespace lookups. Autofs may need this to block access to a filesystem while the daemon is umounting it.

> These objects are not filesystems - they rather look like traps
> set in the unified tree. Notice that they do not waste anon device like
> "one node autofs" would do.

That's not a huge issue, since you run out pretty quickly with NFS's consumption.

> Jeremy, would you be OK with keeping the information about
> difference between regular negatives and mountpoints-to-be that way?

I like it. I've been thinking about something pretty similar, so I pretty much know how I'd use it.

J
Re: fs changes in 2.3
Hans Reiser wrote:

> Andi Kleen found a race condition for us, and told us where it was. Andi is
> cool. Viro found unnecessary checks in our code, and then made it sound like
> something big. He does this kind of thing a lot to everyone. He discourages
> every contributor he can for the betterment of his ego. He is damage in the
> Linux community to route around.

Here's a lesson in life: if you don't want to deal with somebody, don't deal with them.

Al Viro isn't the only one allowed to contribute VFS patches.

Jeff

--
Jeff Garzik         | Nothing cures insomnia like the
Building 1024       | realization that it's time to get up.
MandrakeSoft, Inc.  |        -- random fortune
Re: Reiserfs and NFS
On Wed, May 03, 2000 at 11:27:34AM +0200, Andi Kleen wrote:

> On Tue, May 02, 2000 at 11:20:10PM +0200, Steve Dodd wrote:
> > First off, could we call them "inode labels" or something less confusing?
> > "file" outside of NFS has a different meaning (semantic namespace collision)
> > Also, I don't see how a "fh_to_dentry" (or ilbl_to_dentry) is going to
> > work - (think hardlinks, etc.). You do need an iget_by_label or something
>
> NFS file handles are always a kind of hard link, in traditional Unix
> they refer directly to inodes. Linux knfsd uses dentries only because the
> VFS semantics require it, not because of any NFS requirements.

What I meant was, you can't have an "ilabel_to_dentry" (or fh_to_dentry for that matter) function because there may well be more than one dentry pointing to the inode. As for NFS's use of dentries, I'm still not sure I understand all the details. Without having read the specs, I would expect it to be operating mostly on inodes, but I'm sure there are good reasons why it doesn't.

[..]

> iget_by_label() is already implemented in 2.3 -- see iget4(). Unfortunately
> it is a bit inefficient to search inodes this way [because you cannot index
> on the additional information], but not too bad.

iget4 isn't quite the same -- you need to supply a "find actor" to compare the other parts of the inode identifier, which are fs-specific. knfsd wouldn't be able to supply a find actor for the underlying filesystem it was serving.

> > Also, what are the size constraints imposed by NFS? What about other network
> > filesystems?
>
> NFSv2 has 2GB limits for files.

Sorry, I was thinking more of limits imposed on the size of the "file handle" / inode identifier..
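[Editor's sketch] The "find actor" objection is easier to see with the mechanism written out. A generic inode cache can index only on the number it understands, so the filesystem passes in a callback that checks the rest of the identifier. The user-space C below is a toy model of that pattern, assuming a simple linked list instead of a hash table. `demo_inode`, `demo_iget` and `demo_locality_actor` are hypothetical names, not the 2.3 iget4() interface, and the two entries sharing one number in the usage are an assumption of the demo rather than a claim about any particular filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_inode {
    unsigned long ino;       /* generic part: all the cache can index on */
    uint32_t locality;       /* fs-private part, opaque to generic code */
    struct demo_inode *next; /* next entry in the cache list */
};

/* Callback supplied by the filesystem: nonzero means "this is the one". */
typedef int (*demo_find_actor_t)(const struct demo_inode *inode,
                                 unsigned long ino, void *opaque);

/* Generic lookup: match on the number first, then let the actor (if any)
 * decide using information only the filesystem can interpret. */
struct demo_inode *demo_iget(struct demo_inode *head, unsigned long ino,
                             demo_find_actor_t actor, void *opaque)
{
    struct demo_inode *i;

    for (i = head; i != NULL; i = i->next) {
        if (i->ino != ino)
            continue;
        if (actor != NULL && !actor(i, ino, opaque))
            continue;
        return i;
    }
    return NULL;
}

/* Filesystem-side actor: compares the private locality field. */
int demo_locality_actor(const struct demo_inode *inode, unsigned long ino,
                        void *opaque)
{
    (void)ino; /* already matched by the generic code */
    assert(opaque != NULL);
    return inode->locality == *(const uint32_t *)opaque;
}
```

The comparison lives in filesystem code, which is the difficulty raised above: a caller such as knfsd holds only the opaque bytes and has no actor of its own to pass, so some filesystem-provided entry point still has to turn the handle into a lookup.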
RE: [RFC] Possible design for "mount traps"
On Wed, 3 May 2000, Jeremy Fitzhardinge wrote:

> I know hpa has been thinking about how to stack dentries. How does this
> compare?

Orthogonal. IIRC, hpa wanted them as a way to do loopbacks. Well, as soon as tree scanning in autofs4 switches to new linkage/goes away[1] we are getting a much cheaper way to do loopbacks without mucking with dentries. So the only stacking of any kind is in the mountpoint and you hardly can get out without that... There may be other applications of dentry stacking, but that's a completely different story - these things are independent and stacking would be a serious overhead for autofs* needs.

[1] I would really prefer the latter, but if it will be hard to do fast - fine, it will be switch to new linkage; I have that code.

> BTW, what happens if you umount a filesystem which has these scattered about
> its namespace? Do they get cleaned up as part of the umount (appropriate
> callback, etc), or do you need to clear them out before the umount? I prefer
> the former.

Hrrrmmm... Probably the former, but I can argue it both ways ;-)

> Also, what happens if you attach one to a non-directory? Could you use it to
> put arbitrary "special files" into the namespace without having to do anything
> special? It would make things like Pavel's podfuk more useful without having
> to do horrible namespace hacks as he does now.

Ummm... I'm not sure that I like the idea. Reason: I'm very suspicious of the situations when file turns into directory and back. I've never seen it done right and in all cases when it had been done it was full of nasty special cases, kludges, etc. Mostly on the userland side of things, BTW. If you can do it in a clean way and nothing will break I'll be only glad about that. Mechanism itself doesn't care for the type of that stuff, so I have no objections on that side. Just a nasty gut feeling...

> Also, when one is inserted between two real filesystems, it still needs to be
> able to mediate namespace lookups. Autofs may need this to block access to a
> filesystem while the daemon is umounting it.

It depends. How much are you going to do with the filesystem before umount(8)?

> > These objects are not filesystems - they rather look like traps
> > set in the unified tree. Notice that they do not waste anon device like
> > "one node autofs" would do.
>
> That's not a huge issue, since you run out pretty quickly with NFS's
> consumption.

Two times slower. At least something...
Re: fs changes in 2.3
Alexander Viro wrote:

> I
> have no problems with people who actually wrote it. The fact that they got
> a slimeball as a business manager has nothing with the technical side of
> story. I find the fact that you claim credit for others' work and
> especially the way you do it disgusting at extreme, but it doesn't make
> said work worse.

Viro, throughout the whole initial development of ReiserFS I spent 15-20 hours a week arguing over every algorithm we used (and then I worked 40 hours a week earning the salary that paid everyone in Russia working for me). I didn't always get every algorithm done my way, and half the time my way was the wrong way, and surely I learned, and surely I was completely unqualified (we all were), but if you think I played no technical role in the design of reiserfs you couldn't be more wrong.

The idea to do the filesystem was mine. The idea to aggregate small files was mine. The idea to use B+trees rather than B-trees was mine, and was shoved down someone's protesting throat by me. Going on farther would be silly. We ALL threw every idea we had into the debate, so the list of ideas that weren't mine is also long.

The guys who left left in part because Vladimir outcoded them all, and I told the head of the research center that if he didn't work as hard as Vladimir he wasn't going to get paid as much. You should understand that Vladimir was the most junior member of the research team, and this was a PhD who thought a PhD was something really impressive. That wasn't the major reason though. The larger part of it was that he wanted me to accept his algorithms and I told him that he would use mine or not get paid. It really bothered him to work under the direction of an American with no PhD who wanted to do all these things that weren't in any textbook and were surely wrong therefore. Hee Hee. It still makes me smile.
If it concerns you that I don't credit them by name, I guess you weren't listening on the phone when their Swedish VC backer who wanted me to sell ReiserFS to them and who was in the protective services business here in Russia, suggested they might hire a hundred researchers to swear in Russian Court that I had no role in creating the filesystem. That was the day the name changed from treefs. You certainly weren't there when they tried to make it very difficult for me to continue ReiserFS without selling it to them by forbidding Vladimir to help me on weekends with commenting their execrable code to the point that I could find the bugs. (He told them he'd leave the job they gave him in America and go back to Russia. They gave in, but eventually he went back to Russia anyway, to work on our project.)

I really don't think that persons who leave a project and then do all in their power to choke it out of existence deserve any credit at all. In V4 all of their remaining code will be tossed, we have been getting rid of it in pieces, but it is time to rip the heart out of it.

By the way, the Swedish protective services guy then proceeded to lose all of the money of the Russian investors backing him (they were vodka factory and casino money). I think the major part of the money ($1 million if I remember right) went to some guys in the secret police who claimed to have an algorithm proving that P=NP, but would not disclose the algorithm because it was so valuable it needed to be kept secret. Said PhD working for me, who had some specialization in this area, did a formal evaluation, and encouraged the investment by the investor. I still wonder if some money went to him to help shape his opinion, but I'll never really know this.

In sum, ReiserFS would have been completed faster if I had never met them. Debugging and tweaking their code rather than scrapping it and recoding by myself was a serious mistake of mine.
It was one bug away from working for a very long time, and the performance was deeply depressing for a long time. Fortunately the thing that really affected performance was block allocation policy, and that was Vladimir's code so it was extremely easy to work with.

The only good thing that came out of that experience was meeting Vladimir. That's a pretty damn good thing though, and maybe that alone was worth all of it.

Hans

PS I don't think any of the people I see you flame are guilty of doing anything more than trying to contribute to Linux. You could surely tell them what their bugs are without discouraging them from making more contributions in the manner that you do. Think about that. I hope this thread dies soon though.
RE: [RFC] Possible design for "mount traps"
On 03-May-2000 Alexander Viro wrote:

> as tree scanning in autofs4 switches to new linkage/goes away[1] we are
>
> [1] I would really prefer the latter, but if it will be hard to do fast -
> fine, it will be switch to new linkage; I have that code.

I'll happily get rid of the tree scanning if there's a better way of doing the same thing. I don't want to change the basic mechanism of autofs4 right at the moment though.

>> BTW, what happens if you umount a filesystem which has these scattered about
>> its namespace? Do they get cleaned up as part of the umount (appropriate
>> callback, etc), or do you need to clear them out before the umount? I
>> prefer the former.
>
> Hrrrmmm... Probably the former, but I can argue it both ways ;-)

Well, you could get the latter behaviour from the former simply by holding an extra reference and preventing the umount, but you can't simulate the former from the latter.

>> Also, what happens if you attach one to a non-directory? Could you use it
>> to put arbitrary "special files" into the namespace without having to do
>> anything special? It would make things like Pavel's podfuk more useful
>> without having to do horrible namespace hacks as he does now.
>
> Ummm... I'm not sure that I like the idea. Reason: I'm very suspicious of
> the situations when file turns into directory and back. I never seen it
> done right and in all cases when it had been done it was full of nasty
> special cases, kludges, etc. Mostly on the userland side of things, BTW.
> If you can do it in clean way and nothing will break I'll be only glad
> about that. Mechanism itself doesn't care for the type of that stuff, so I
> have no objections on that side. Just a nasty gut feeling...

Well, I don't see a good reason not to make a file respond to readdir. chdir and chroot currently prevent any non-directory from being current, so you need to have a more general notion of directory-ness to make them work on magic files.

The other approach is to have a file act like a symlink under some circumstances, but I haven't thought that through properly. That's essentially what podfuk does at the moment, with its magic mapping to the /overlay tree.

Then there are the cases where all you want is some ordinary-looking files with dynamic content. That doesn't involve overlaying any incompatible semantics; it just means you have a file in the namespace which isn't on the filesystem (I guess you could get the same effect with a filesystem which has a file as the top-level dentry, but I seem to remember that didn't work very well last time I tried it).

>> Also, when one is inserted between two real filesystems, it still needs to
>> be able to mediate namespace lookups. Autofs may need this to block access
>> to a filesystem while the daemon is umounting it.
>
> It depends. How much are you going to do with the filesystem before
> umount(8)?

Probably not a lot, but I was thinking of something a bit more general than autofs. Actually, it would be nice to see transitions just to get a sense of how much use a filesystem is getting.

> two times slower. At least something...

I suppose. It would be nice to find some replacement for the fake block devices for blockless filesystems.

J
Re: [RFC] Possible design for "mount traps"
Alexander Viro writes:

> Folks, I've tried to describe the stuff that may IMO become useful
> for autofs/devfs/portalfs/etc. Comments are more than welcome.
>
> Current problems:
> 5. Any schemes with automount-like stuff in devfs require
> (union-)mount being triggered if lookup brings negative in all
> components already mounted. IOW, if the search gets to the last
> component of union-mount.

I think you're referring here to a "split" devfs, where each driver exports a mini-devfs. In such an environment, your mount traps would probably be good.

However, I don't think the mini-devfs idea is a good approach. There are good reasons for having a unified tree. For one thing, there is the issue of mounting /. For another, some drivers (i.e. cdrom) need linkages (not just symlinks) into other parts of the devfs namespace. Also, it would be hard (or impossible) for related drivers to share the same directory (i.e. SCSI subsystem). At the least, there would have to be more co-operation between drivers. Compare this to the current devfs implementation where things are fairly modular and independent.

> So what about the following trick: let's allow vfsmounts without
> associated superblock and allow to "mount" them even on the negative
> dentries? Notice that the latter will not break walk_name() - checks for
> dentry being negative are done after we try to follow mounts.
> Notice also that once we mount something atop of such vfsmount it
> becomes completely invisible - it's wedged between two real objects and
> following mounts will walk through it without stopping.
> So the only case when these beasts count is when they are
> "mounted", but nothing is mounted atop of them. But that's precisely the
> class of situations we are interested in. In case of autofs we want
> follow_down() into such animal to trigger mounting, in case of portalfs -
> passing the rest of pathname to daemon, in case of devfs-with-automount
> we want to kick devfsd. So let them have a method that would be called
> upon such follow_down() (i.e. one when we have nothing mounted atop of
> us). And that's it.
> These objects are not filesystems - they rather look like
> traps set in the unified tree. Notice that they do not waste anon
> device like "one node autofs" would do.

This sounds a lot like the fake inodes I proposed a couple of years ago to solve the autofs direct mount problem.

Anyway, while these mount traps are a good thing, particularly for autofs, I don't think they're going to help simplify devfs (without castrating devfs and probably breaking the Linus-mandated namespace;-).

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current: [EMAIL PROTECTED]
Re: Looking for cfs - caching filesystem
If anyone has information about cfs, please mail me: [EMAIL PROTECTED]