Re: Proposal for "proper" durable fsync() and fdatasync()
Jamie Lokier wrote: Jeff Garzik wrote: Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It seems far easier to make sync_blkdev() Do The Right Thing, and magically make all filesystems data-safe. Well, you need ordered metadata writes, barriers _and_ flushes with some filesystems. Merely writing all the data pages then issuing a drive cache flush won't Do The Right Thing with those filesystems - someone already mentioned Btrfs, where it won't. Oh certainly. That's why we have a VFS :) fsync for NFS will look quite different, too. But I agree that your suggestion would make a superb default, for filesystems which don't provide their own function. Yep. That would immediately cover a bunch of filesystems. It's not optimal even then. Devices: On a software RAID, you ideally don't want to issue flushes to all drives if your database did a 1 block commit entry. (But they probably use O_DIRECT anyway, changing the rules again). But all that can be optimised in generic VFS code eventually. It doesn't need filesystem assistance in most cases. My own idea is that we create a FLUSH command for blkdev request queues, to exist alongside READ, WRITE, and the current barrier implementation. Then FLUSH could be passed down through MD or DM. Jeff - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposal for "proper" durable fsync() and fdatasync()
Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It seems far easier to make sync_blkdev() Do The Right Thing, and magically make all filesystems data-safe. Jeff
Re: Proposal for "proper" durable fsync() and fdatasync()
Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that journalling filesystems can force correct write ordering, using disk flush cache commands. After all that effort, I was very surprised to notice that Linux 2.6.x doesn't use that capability to ensure fsync() flushes the disk cache onto stable storage. It's surprising you are surprised, given that this [lame] fsync behavior has remained consistently lame throughout Linux's history. [snip huge long proposal] Rather than invent new APIs, we should fix the existing ones to _really_ flush data to physical media. Linux should default to SAFE data storage, and permit users to retain the older unsafe behavior via an option. It's completely ridiculous that we default to an unsafe fsync. And [anticipating a common response from others] it is completely irrelevant that POSIX fsync(2) permits Linux's current behavior. The current behavior is unsafe. Safety before performance -- ESPECIALLY when it comes to storing user data. Regards, Jeff (Linux ATA driver dude)
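For readers following along, here is a minimal userspace sketch of the write-then-fsync pattern that the thread wants to be durable. The helper name write_durably() and the file path used to exercise it are assumptions made for illustration, not anything from the original posts. Whether the data actually reaches stable media after fsync() returns depends on the kernel, the filesystem and the drive's write cache, which is exactly the complaint above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then ask the kernel to
 * commit them with fsync(2). Returns 0 on success, -1 on failure. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() reports success once the kernel considers the data
     * committed; it does not by itself prove the drive cache was
     * flushed to the medium. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would treat any non-zero return as "the commit may not have happened" and must not acknowledge the transaction in that case.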
Re: BTRFS partition usage...
David Miller wrote: From: Chris Mason <[EMAIL PROTECTED]> Date: Tue, 12 Feb 2008 09:08:59 -0500 I've had requests to move the super down to 64k to make room for bootloaders, which may not matter for sparc, but I don't really plan on different locations for different arches. The Sun disk label sits in the first 512 bytes and the boot loader block sits in the second 512 bytes. I think leaving even more space is a good idea for several reasons. Yep. I chose 32K unused space in the prototype filesystem I wrote [1, 2.4 era]. I'm pretty sure I got that number from some other filesystem, maybe even some NTFS incarnation. It's just good practice to avoid the first and last "chunks" of a partition, FSVO chunk. Jeff [1] http://kernel.org/pub/linux/kernel/people/jgarzik/ibu/
Re: [PATCH 1/2] Make cramfs little endian only
Linus Torvalds wrote: On Tue, 4 Dec 2007, Andi Drebes wrote: Perhaps I'm missing something, but I think for cramfs, unfortunately, there has to be this statement. The bitfields in the cramfs_inode structure cause some problems. I agree that bitfields can be painful, but they should likely be just rewritten to be accesses using actual masks and shifts. The thing is, bitfields aren't actually endianness safe *anyway*, in that a compiler may end up using a *different* bit order than the byte order. So you cannot really use bitfields reliably on things like that (although Linux has a notion of a "__[BIG|LITTLE]_ENDIAN_BITFIELD", if you really want to).

Bitfields also generate lower-quality assembly than masks&shifts (typically more instructions using additional temporaries to accomplish the same thing), based on my own informal gcc testing. You would think gcc would be advanced enough to turn bitfield use into masks and shifts under the hood, but for whatever reason that often is not the case in kernel code.

Due to the way they're used, bitfields make more difficult the common code pattern of setting several flags at once (assuming 'foo', 'bar' and 'baz' are bitfields in a struct):

    pdev->foo = 1;
    pdev->bar = 0;
    pdev->baz = 1;

versus

    flag_foo = (1 << 0);
    flag_bar = (1 << 1);
    flag_baz = (1 << 2);
    ...
    pdev->flags = flag_foo | flag_baz;

And getting back on topic, I think "pdev->flags = cpu_to_le32(flag1|flag2)" is nicer than dealing with bitfields, when your data structures are fixed-endian. Jeff
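The mask-and-shift approach with a fixed-endian field can be sketched in portable userspace C as follows. Everything named here (FLAG_FOO and friends, struct disk_rec, to_le32(), from_le32()) is a hypothetical stand-in chosen for illustration; in kernel code the conversions would be cpu_to_le32() and le32_to_cpu() rather than these helpers. The flags set are FOO and BAZ, matching the bitfield assignments foo = 1, bar = 0, baz = 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, named after the foo/bar/baz example. */
#define FLAG_FOO (1u << 0)
#define FLAG_BAR (1u << 1)
#define FLAG_BAZ (1u << 2)

/* Userspace stand-in for cpu_to_le32(): returns a value whose in-memory
 * bytes are little-endian regardless of host byte order. */
static uint32_t to_le32(uint32_t v)
{
    union { uint8_t b[4]; uint32_t u; } out;
    out.b[0] = (uint8_t)(v);
    out.b[1] = (uint8_t)(v >> 8);
    out.b[2] = (uint8_t)(v >> 16);
    out.b[3] = (uint8_t)(v >> 24);
    return out.u;
}

/* Userspace stand-in for le32_to_cpu(). */
static uint32_t from_le32(uint32_t v)
{
    union { uint8_t b[4]; uint32_t u; } in;
    in.u = v;
    return (uint32_t)in.b[0] | ((uint32_t)in.b[1] << 8) |
           ((uint32_t)in.b[2] << 16) | ((uint32_t)in.b[3] << 24);
}

/* Hypothetical on-disk record; flags is stored little-endian. */
struct disk_rec {
    uint32_t flags;
};

/* Set several flags in one store instead of one bitfield at a time. */
static void set_foo_and_baz(struct disk_rec *r)
{
    r->flags = to_le32(FLAG_FOO | FLAG_BAZ);
}

/* Test a flag after converting back to host order. */
static int has_flag(const struct disk_rec *r, uint32_t flag)
{
    return (from_le32(r->flags) & flag) != 0;
}
```

Because the byte order is pinned by the conversion helpers rather than by the compiler's bitfield layout, the first byte of the stored field is the same on every host.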
Re: Distributed storage. Move away from char device ioctls.
Robin Humble wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: It is my hope that you will put your skills towards a distributed filesystem :) Of the current solutions, GFS (currently in kernel) scales poorly, and NFS v4.1 is amazingly bloated and overly complex. I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. it's called Lustre. works well, scales well, is widely used, is GPL. sadly it's not in mainline. Lustre is tilted far too much towards high-priced storage, and needs improvement before it could be considered for mainline. Jeff
Re: Distributed storage. Move away from char device ioctls.
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 06:32:11PM -0400, Jeff Garzik wrote: J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: NFSv4.1 adds to the fun, by throwing interoperability completely out the window. What parts are you worried about in particular? I'm not worried; I'm stating facts as they exist today (draft 13): NFS v4.1 does something completely without precedent in the history of NFS: the specification is defined such that interoperability is -impossible- to guarantee. pNFS permits private and unspecified layout types. This means it is impossible to guarantee that one NFSv4.1 implementation will be able to talk to another NFSv4.1 implementation. No, servers are required to support ordinary nfs operations to the metadata server. At least, that's the way it was last I heard, which was a while ago. I agree that it'd stink (for any number of reasons) if you ever *had* to get a layout to access some file. Was that your main concern? I just sorta assumed you could fall back to the NFSv4.0 mode of operation, going through the metadata server for all data accesses. But look at that choice in practice: you can either ditch pNFS completely, or use a proprietary solution. The market incentives are CLEARLY tilted in favor of makers of proprietary solutions. But it's a poor choice (really little choice at all). Overall, my main concern is that NFSv4.1 is no longer an open architecture solution. The "no-pNFS or proprietary platform" choice merely illustrates one of many negative aspects of this architecture. One of NFS's biggest value propositions is its interoperability. To quote some Wall Street guys, "NFS is like crack. It Just Works. We love it." Now, for the first time in NFS's history (AFAIK), the protocol is no longer completely specified, completely known. No longer a "closed loop."
Private layout types mean that it is _highly_ unlikely that any OS or appliance or implementation will be able to claim "full NFS compatibility." And when the proprietary portion of the spec involves something as basic as accessing one's own data, I consider that a fundamental flaw. NFS is no longer completely open. Jeff
Re: Distributed storage. Move away from char device ioctls.
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote: J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. What exactly do you mean by "POSIX-only"? Don't bother supporting attributes, file modes, and other details not supported by POSIX. The prime example being NFSv4, which is larded down with Windows features. I am sympathetic. Cutting those out may still leave you with something pretty complicated, though. Far less complicated than NFSv4.1 though (which is easy :)) NFSv4.1 adds to the fun, by throwing interoperability completely out the window. What parts are you worried about in particular? I'm not worried; I'm stating facts as they exist today (draft 13): NFS v4.1 does something completely without precedent in the history of NFS: the specification is defined such that interoperability is -impossible- to guarantee. pNFS permits private and unspecified layout types. This means it is impossible to guarantee that one NFSv4.1 implementation will be able to talk to another NFSv4.1 implementation. Even if Linux supports the entire NFSv4.1 RFC (as it stands in draft 13 anyway), there is no guarantee at all that Linux will be able to store and retrieve data, since it's entirely possible that a proprietary protocol is required to access your data. NFSv4.1 is no longer a completely open architecture. Jeff
Re: Distributed storage. Move away from char device ioctls.
J. Bruce Fields wrote: On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote: I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. What exactly do you mean by "POSIX-only"? Don't bother supporting attributes, file modes, and other details not supported by POSIX. The prime example being NFSv4, which is larded down with Windows features. NFSv4.1 adds to the fun, by throwing interoperability completely out the window. Jeff
Re: Distributed storage. Move away from char device ioctls.
Evgeniy Polyakov wrote: Hi. I'm pleased to announce fourth release of the distributed storage subsystem, which allows to form a storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages. This release includes new configuration interface (kernel connector over netlink socket) and number of fixes of various bugs found during move to it (in error path). Further TODO list includes: * implement optional saving of mirroring/linear information on the remote nodes (simple) * new redundancy algorithm (complex) * some thoughts about distributed filesystem tightly connected to DST (far-far plans so far) Homepage: http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> My thoughts. But first a disclaimer: Perhaps you will recall me as one of the people who really reads all your patches, and examines your code and proposals closely. So, with that in mind... I question the value of distributed block services (DBS), whether it's your version or the others out there. DBS are not very useful, because they still rely on a useful filesystem sitting on top of the DBS. It devolves into one of two cases: (1) multi-path much like today's SCSI, with distributed filesystem arbitration to ensure coherency, or (2) the filesystem running on top of the DBS is on a single host, and thus, a single point of failure (SPOF). It is quite logical to extend the concepts of RAID across the network, but ultimately you are still bound by the inflexibility and simplicity of the block device. In contrast, a distributed filesystem offers far more scalability, eliminates single points of failure, and offers more room for optimization and redundancy across the cluster.
A distributed filesystem is also much more complex, which is why distributed block devices are so appealing :) With a redundant, distributed filesystem, you simply do not need any complexity at all at the block device level. You don't even need RAID. It is my hope that you will put your skills towards a distributed filesystem :) Of the current solutions, GFS (currently in kernel) scales poorly, and NFS v4.1 is amazingly bloated and overly complex. I've been waiting for years for a smart person to come along and write a POSIX-only distributed filesystem. Jeff
Re: [RFC 18/26] FS: ExtX filesystem defrag
Please add 'slab' to the title, otherwise you conflict with a feature of the same name...
Re: [RFC] basic delayed allocation in VFS
Alex Tomas wrote: So without the ability to attach specific I/O completions to bios or support for unwritten extents directly in __mpage_writepage, there is no way XFS can use this "generic" delayed allocation code. I didn't say "generic", see Subject: :) Well, it shouldn't even be in the VFS layer if it's only usable by one filesystem. Jeff
Re: [RFC] basic delayed allocation in VFS
Alex Tomas wrote: Jeff Garzik wrote: Is this based on Christoph's work? Christoph, or some other XFS hacker, already did generic delalloc, modeled on the XFS delalloc code. nope, this one is simple (something I'd prefer for ext4). The XFS one is proven and the work was already completed. What were the specific technical issues that made it unsuitable for ext4? I would rather not reinvent the wheel, particularly if the reinvention is less capable than the existing work. Jeff
Re: [RFC] basic delayed allocation in VFS
Alex Tomas wrote: Good day, please review ... thanks, Alex basic delayed allocation in VFS: * block_prepare_write() can be passed special ->get_block() which doesn't allocate blocks, but reserve them and mark bh delayed * a filesystem can use mpage_da_writepages() with other ->get_block() which doesn't defer allocation. mpage_da_writepages() finds all non-allocated blocks and try to allocate them with minimal calls to ->get_block(), then submit IO using __mpage_writepage() Signed-off-by: Alex Tomas <[EMAIL PROTECTED]> Is this based on Christoph's work? Christoph, or some other XFS hacker, already did generic delalloc, modeled on the XFS delalloc code. Jeff
new ext4 build warnings
It seems jbd_debug() might need modification:

fs/ext4/inode.c: In function ‘ext4_write_inode’:
fs/ext4/inode.c:2906: warning: comparison is always true due to limited range of data type
fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’:
fs/jbd2/recovery.c:254: warning: comparison is always true due to limited range of data type
fs/jbd2/recovery.c:257: warning: comparison is always true due to limited range of data type
fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’:
fs/jbd2/recovery.c:301: warning: comparison is always true due to limited range of data type

I'm surprised this was not noticed in a test build before pushing upstream. Jeff
Re: *at syscalls for xattrs?
H. Peter Anvin wrote: Jeff Garzik wrote: What the *at() interfaces really do is fix/paper over a longstanding wart in Unix: the cwd really should have been a standard file descriptor (like stdin/stdout/stderr) instead of a magic piece of state maintained in kernel space. It's more than a wart, IMO. *at() allows one to close races (with potential security implications) that are otherwise impossible to close, in directory traversal. *at() permits a userspace program to hold proper references to all objects during a directory traversal, with all that implies. Well, as Jeremy pointed out, in the absence of threads you can do the same thing with fchdir(), however, that's much more of a hack. My posixutils project (coreutils replacement) used fchdir(2), but that still doesn't get you 100% race-free. It gets you close, yes. Jeff
Re: *at syscalls for xattrs?
H. Peter Anvin wrote: Miklos Szeredi wrote: The *at() thing basically gives you the advantages of a CWD without the disadvantages. For example it could be useful to implement the functionality of find(1) as a library interface. What the *at() interfaces really do is fix/paper over a longstanding wart in Unix: the cwd really should have been a standard file descriptor (like stdin/stdout/stderr) instead of a magic piece of state maintained in kernel space. It's more than a wart, IMO. *at() allows one to close races (with potential security implications) that are otherwise impossible to close, in directory traversal. *at() permits a userspace program to hold proper references to all objects during a directory traversal, with all that implies. Jeff
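To make the "hold a reference to the directory" point concrete, here is a small sketch using the POSIX openat(2) and fstatat(2) calls. The helper names create_in_dir() and size_in_dir() are hypothetical, introduced only for this illustration. Both resolve the name relative to a directory descriptor the caller already holds, so a concurrent rename of a parent directory, or a chdir() in another thread, cannot redirect the lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `name` inside the directory referred to by
 * dirfd and write len bytes to it. Returns 0 on success, -1 on failure.
 * O_NOFOLLOW refuses to open a symlink in the final path component. */
static int create_in_dir(int dirfd, const char *name,
                         const void *buf, size_t len)
{
    int fd = openat(dirfd, name,
                    O_WRONLY | O_CREAT | O_TRUNC | O_NOFOLLOW, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: size of `name` inside dirfd, or -1 on error.
 * The lookup is relative to the held descriptor, not to the cwd. */
static long long size_in_dir(int dirfd, const char *name)
{
    struct stat st;
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) < 0)
        return -1;
    return (long long)st.st_size;
}
```

A directory walker would open each subdirectory with openat() on its parent's descriptor and pass that descriptor down, rather than building path strings and re-resolving them from the root or from the cwd.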
Re: [ANNOUNCE] util-linux-ng 2.13-rc1
Gerd Hoffmann wrote: Jeff Garzik wrote: Christoph Hellwig wrote: And this is really dumb. autotools is a complete pain in the ass and not useful at all for linux-only tools. A myth. It is quite useful for packagers, because of the high Just Works(tm) factor. After porting an entire across several revisions of a distro, the autotools-based packages are the ones that work out of the box 90% of the time. And the 10% where it doesn't work it is a real pain to figure what goes wrong due to the completely unreadable Makefiles generated by autotools. After all they are not Makefiles, they are shellscripts embedded into Makefiles. The other 90% of _my_ time comes from annoying people who roll their own Makefile/build solution, which the packager has to then learn. Well, it's not *that* hard to write makefiles which follow the usual gnuish conventions, so stuff like "make DESTDIR=/tmp/buildroot install" works just fine. That isn't a reason to use autotools. Especially as people get that wrong *even with* autotools from time to time ... It's not _just_ makefiles, though. Packaging systems know what to do with configure scripts, and automatically plug that into their systems, e.g. with rpm's %configure, %make_install, etc. Having ported an entire distro, the time savings with autotools [OR ANOTHER STANDARD BUILD/CONFIGURE SYSTEM] are very real. Similarly, the time sink with each project doing its own home-rolled build/configure system is also very real. Jeff
Re: [ANNOUNCE] util-linux-ng 2.13-rc1
Christoph Hellwig wrote: On Wed, Jul 04, 2007 at 12:11:56AM +0200, Karel Zak wrote: The package build system is now based on autotools. The build system supports separate CFLAGS and LDFLAGS for suid programs (SUID_CFLAGS, SUID_LDFLAGS). For more details see the README file And this is really dumb. autotools is a complete pain in the ass and not useful at all for linux-only tools. A myth. It is quite useful for packagers, because of the high Just Works(tm) factor. After porting an entire across several revisions of a distro, the autotools-based packages are the ones that work out of the box 90% of the time. The other 90% of _my_ time comes from annoying people who roll their own Makefile/build solution, which the packager has to then learn. It's just not scalable for people to keep building their own build solutions. Jeff
Re: [RFC] fsblock
Christoph Hellwig wrote: On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote: - In line with the above item, filesystem block allocation is performed before a page is dirtied. In the buffer layer, mmap writes can dirty a page with no backing blocks which is a problem if the filesystem is ENOSPC (patches exist for buffer.c for this). This raises an eyebrow... The handling of ENOSPC prior to mmap write is more an ABI behavior, so I don't see how this can be fixed with internal changes, yet without changing behavior currently exported to userland (and thus affecting code based on such assumptions). Not really, the current behaviour is a bug. And it's not actually buffer layer specific - XFS now has a fix for that bug and it's generic enough that everyone could use it. I'm not sure I follow. If you require block allocation at mmap(2) time, rather than when a page is actually dirtied, you are denying userspace the ability to do sparse files with mmap. A quick Google readily turns up people who have built upon the mmap-sparse-file assumption, and I don't think we want to break those assumptions as a "bug fix." Where is the bug? Jeff
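For reference, the mmap-sparse-file usage pattern under discussion looks roughly like the sketch below. The helper name touch_sparse() and its parameters are assumptions made for illustration. The file is extended with ftruncate(2) without writing data, mapped shared, and a single byte is stored; on filesystems with sparse-file support only the touched block needs backing store. How an out-of-space condition gets reported for that store is the ABI question raised in the message above, and this sketch does not attempt to answer it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file of `size` bytes with no data written,
 * map it MAP_SHARED, and dirty one byte at `offset`.
 * Returns 0 on success, -1 on failure. */
static int touch_sparse(const char *path, off_t size, off_t offset, char value)
{
    if (size <= 0 || offset < 0 || offset >= size)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Extend the file without writing: the new range is a hole where
     * the filesystem supports sparse files. */
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* This store dirties a single page of the mapping. */
    map[offset] = value;

    /* Ask for the dirty page to be written back before unmapping. */
    int rc = msync(map, (size_t)size, MS_SYNC);
    if (munmap(map, (size_t)size) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```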
Re: [PATCH 0/6][TAKE5] fallocate system call
Theodore Tso wrote: I don't think we have a problem here. What we have now is fine, and It's fine for ext4, but not the wider world. This is a common problem created by parallel development when code dependencies exist. In any case, the plan is to push all of the core bits into Linus tree for 2.6.22 once it opens up, which should be Real Soon Now, it looks like. Presumably you mean 2.6.23. Jeff
Re: [PATCH 0/6][TAKE5] fallocate system call
Andrew Morton wrote: b) We do what we normally don't do and reserve the syscall slots in mainline. If everyone agrees it's going to happen... why not? Jeff
Re: [RFC] fsblock
Nick Piggin wrote: - No deadlocks (hopefully). The buffer layer is technically deadlocky by design, because it can require memory allocations at page writeout-time. It also has one path that cannot tolerate memory allocation failures. No such problems for fsblock, which keeps fsblock metadata around for as long as a page is dirty (this still has problems vs get_user_pages, but that's going to require an audit of all get_user_pages sites. Phew). - In line with the above item, filesystem block allocation is performed before a page is dirtied. In the buffer layer, mmap writes can dirty a page with no backing blocks which is a problem if the filesystem is ENOSPC (patches exist for buffer.c for this). This raises an eyebrow... The handling of ENOSPC prior to mmap write is more an ABI behavior, so I don't see how this can be fixed with internal changes, yet without changing behavior currently exported to userland (and thus affecting code based on such assumptions). - An inode's metadata must be tracked per-inode in order for fsync to work correctly. buffer contains helpers to do this for basic filesystems, but any block can be only the metadata for a single inode. This is not really correct for things like inode descriptor blocks. fsblock can track multiple inodes per block. (This is non trivial, and it may be overkill so it could be reverted to a simpler scheme like buffer). hrm; no specific comment but this seems like an idea/area that needs to be fleshed out more, by converting some of the more advanced filesystems. - Large block support. I can mount and run an 8K block size minix3 fs on my 4K page system and it didn't require anything special in the fs. We can go up to about 32MB blocks now, and gigabyte+ blocks would only require one more bit in the fsblock flags. fsblock_superpage blocks are > PAGE_CACHE_SIZE, midpage ==, and subpage <. definitely useful, especially if I rewrite my ibu filesystem for 2.6.x, like I've been planning. So. Comments? 
Is this something we want? If yes, then how would we transition from buffer.c to fsblock.c? Your work is definitely interesting, but I think it will be even more interesting once ext2 (w/ dir in pagecache) and ext3 (journalling) are converted. My gut feeling is that there are several problem areas you haven't hit yet, with the new code. Also, once things are converted, the question of transitioning from buffer.c will undoubtedly answer itself. That's the way several of us handle transitions: finish all the work, then look with fresh eyes and conceive a path from the current code to your enhanced code. Jeff
Re: [PATCH 4/5] ext4: fallocate support in ext4
Andreas Dilger wrote: My comment was just that the extent doesn't have to be explicitly zero filled on the disk, by virtue of the fact that the uninitialized flag will cause reads to return zero. Agreed, thanks for the clarification. Jeff
Re: [PATCH 4/5] ext4: fallocate support in ext4
Andreas Dilger wrote: On May 07, 2007 13:58 -0700, Andrew Morton wrote: Final point: it's fairly disappointing that the present implementation is ext4-only, and extent-only. I do think we should be aiming at an ext4 bitmap-based implementation and an ext3 implementation. Actually, this is a non-issue. The reason that it is handled for extent-only is that this is the only way to allocate space in the filesystem without doing the explicit zeroing. For other filesystems (including ext3 and Precisely /how/ do you avoid the zeroing issue, for extents? If I posix_fallocate() 20GB on ext4, it damn well better be zeroed, otherwise the implementation is broken. Jeff
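The expectation stated above, that a preallocated range reads back as zeros, can be checked from userspace with a sketch like the following. The helper name preallocate_and_check_zero() is hypothetical. It calls posix_fallocate(3) on a new file and then reads the whole range back; it says nothing about how the filesystem achieves this (explicit zeroing or an uninitialized-extent flag), only about what a reader observes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: preallocate `len` bytes in a new file at `path`
 * and verify that every byte reads back as zero.
 * Returns 0 if all bytes are zero, 1 if a non-zero byte was found,
 * and -1 on any I/O or allocation error. */
static int preallocate_and_check_zero(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number directly, not -1/errno. */
    if (posix_fallocate(fd, 0, len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char buf[4096];
    off_t off = 0;
    while (off < len) {
        off_t left = len - off;
        size_t want = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t n = pread(fd, buf, want, off);
        if (n <= 0) {           /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != 0) {  /* stale data leaked into the range */
                close(fd);
                return 1;
            }
        }
        off += n;
    }
    return close(fd) < 0 ? -1 : 0;
}
```

A return value of 1 would correspond to the "implementation is broken" case described in the message.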
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
[EMAIL PROTECTED] wrote: YOU GUYS WILL LAUGH ABOUT THIS: Yes, we are laughing at you. You keep using bonnie++ after being told it's a poor benchmark. Jeff
Re: Reiser4. BEST FILESYSTEM EVER.
David H. Lynch Jr wrote: Jeff Garzik wrote: David H. Lynch Jr wrote: I'm arguing against circular logic: the claim that one cannot determine reiser4's true usefulness unless it's in the tree. The better method is to get a distro to add reiser4, _then_ if it proves worthy add it to the kernel tree. Not the other way around. And is that how other filesystems made it into the tree ? In the case of most major filesystems, yes. Distros are a proving ground for new stuff, not the upstream kernel. I regularly see drivers with very little in the way of testing go nearly straight into the tree - without even getting tagged as experimental. Hardware drivers are vastly different from filesystem drivers. Jeff
Re: REISER4 FOR INCLUSION IN THE LINUX KERNEL.
[EMAIL PROTECTED] wrote: REISER4 FOR INCLUSION IN THE LINUX KERNEL. Dave Lynch takes a reasoned approach to REISER4. Dave Lynch wrote: Jeff Garzik wrote: If the compelling reason is that it needs a test, I'd say it's not ready. Can you please elaborate ? I am not sure I understand what you are arguing ? Jeff Garzik is "saying" that he wants REISER4 to stay out of the main kernel, for reasons he is not willing to tell you. False. I have told you the reasons. I for one would at least play with it if it were in the distribution tree. I AM SURE THERE ARE A HUGE NUMBER OF PEOPLE WHO WOULD GIVE IT A TRY. You can download it now. Nobody is stopping you, or anyone else. As far as I could tell pretty much everything else that was demanded Hans eventually caved and provided - albeit with much pissing and moaning, and holier than thou rhetoric. It was not his pissing and moaning, etc,... these were just excuses to keep REISER4 from succeeding. The truth is, that any excuse would do. The real reasons are financial and backed by big money (sometimes, big egos). Put down the conspiracy crackpipe. Jeff
Re: Reiser4. BEST FILESYSTEM EVER.
David H. Lynch Jr wrote:
> Jeff Garzik wrote:
>> If the compelling reason is that it needs a test, I'd say it's not ready.
> Can you please elaborate? I am not sure I understand what you are arguing?
>
> Despite his substantially less than polite rhetoric, I have read Hans's post from months if not years ago. Aside from the pissing contests - which were not entirely one-sided - I actually believe that Hans made a reasonable case that Reiser4 had gone about as far as it could reasonably go with regard to testing, robustness, ... without the broader base of use that even an experimental filesystem in the distribution tree would get.
>
> I for one would at least play with it if it were in the distribution tree.
>
> As far as I could tell pretty much everything else that was demanded Hans eventually caved and provided - albeit with much pissing and moaning, and holier-than-thou rhetoric.
>
> The argument that anything that needs testing can't get into the distribution trees is specious. There is a lot of poorly tested crap in the distribution trees.

I'm arguing against circular logic: the claim that one cannot determine reiser4's true usefulness unless it's in the tree.

The better method is to get a distro to add reiser4, _then_ if it proves worthy add it to the kernel tree. Not the other way around.

Jeff
Re: Reiser4. BEST FILESYSTEM EVER.
David H. Lynch Jr wrote:
> I do care about getting Reiser4 into the kernel so that it can actually get a real test, and frankly do not see any compelling reason that should not happen.

If the compelling reason is that it needs a test, I'd say it's not ready.

Jeff
Re: impact of 4k sector size on the IO & FS stack
Douglas Gilbert wrote:
> Bryan Henderson wrote:
>> What is an odd-aligned disk?
> s/disk/partition/ ?

Example: An odd-aligned disk in the 512-b logical / 1K-physical scenario is where odd LBAs indicate the start of a 1K physical sector. An even-aligned disk is where even LBAs indicate the start of a 1K physical sector.

In order to avoid too many RMW cycles, partition software SHOULD (using IETF language) be aware of the underlying physical sector size alignment, in order to align partitions for optimal performance.

Jeff
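The odd/even alignment rule above can be sketched in a few lines of Python for the 512-byte logical / 1K physical case. The function name and the `alignment` string parameter are illustrative assumptions; they are not part of any ATA or kernel interface.

```python
LOGICAL = 512                  # logical (ATA interface) sector size in bytes
PHYSICAL = 1024                # physical (platter) sector size in bytes
RATIO = PHYSICAL // LOGICAL    # logical sectors per physical sector


def starts_physical_sector(lba: int, alignment: str) -> bool:
    """Return True if `lba` is the first logical sector of a physical sector.

    alignment="odd"  means odd LBAs begin a 1K physical sector.
    alignment="even" means even LBAs begin a 1K physical sector.
    """
    if alignment not in ("odd", "even"):
        raise ValueError("alignment must be 'odd' or 'even'")
    phase = 1 if alignment == "odd" else 0
    return lba % RATIO == phase
```

With this model, a partition starting at LBA 63 (the traditional DOS start) sits on a physical sector boundary only on an odd-aligned disk, while one starting at LBA 64 fits an even-aligned disk.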
Re: impact of 4k sector size on the IO & FS stack
Christoph Hellwig wrote:
> the occasional 2k sector SCSI MO device as well. It would be nice to get samples of large sector size ATA devices into the hands of developers to do real world testing of the whole stack.

"hands of developers" meaning you specifically? :)

I've had a 512b-logical/1K-physical ATA test drive for a few months now, and another couple arrived today. Hopefully people can parse what I've been posting, since I cannot give out raw numbers or data at this time.

Of course, with RMW drives that leave the 512-b logical interface untouched, I had expected that they would Just Work(tm) and that is pretty much what happened.

Jeff
Re: impact of 4k sector size on the IO & FS stack
Jan Engelhardt wrote:
> On Mar 11 2007 22:45, Ric Wheeler wrote:
>> Jan Engelhardt wrote:
>>> On Mar 11 2007 18:51, Ric Wheeler wrote:
>>>> During the recent IO/FS workshop, we spoke briefly about the coming change to a 4k sector size for disks on linux. If I recall correctly, the general feeling was that the impact was not significant since we already do most file system IO in 4k page sizes and should be fine as long as we partition drives correctly and avoid non-4k aligned partitions.
>>> Sorry about jumping right in, but what about an 'old-style' partition table that relies on 512 as a unit?
>> I think that the normal case would involve new drives which would need to be partitioned in 4k aligned partitions.
> Shouldn't that work regardless of the unit used in the partition table? Assume this partition table on my current HD:
>
> Disk /dev/hdc: 251.0 GB, 251000193024 bytes
> 255 heads, 63 sectors/track, 30515 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device  Start    End     Blocks  Id  System
> /dev/hdc1      1     33     265041  82  Linux swap / Solaris
> /dev/hdc2     34  30515  244846665   5  Extended
>
> That is, 255 * 63 * 30515 * 512 == roughly 251 GB. Now, if this disk was copied byte per byte (/bin/dd) to a 4096-based disk, and Linux would start using a sector size of 4096, then I would suddenly have 255 * 63 * 30515 * 4096 == 2 TB
>
> Although I would not mind the 2 TB, the partition table would read quite differently (note the Blocks column which is multiplied by 4 (512x4=4096))

At this level, for RMW drives, nothing changes. The partition software, ATA driver, and all other bits continue to think that sector size == 512 bytes. The partition software /hopefully/ becomes smart enough to understand the alignment necessary, but that is not a requirement.

This is the key to understanding the difference between a physical (==platters) sector size change without a logical (==ATA interface) sector size change.

>    Device  Start    End     Blocks  Id  System
> /dev/hdc1      1     33    1060164  82  Linux swap / Solaris
> /dev/hdc2     34  30515  979386660   5  Extended
>
> Which would mean that the swap partition reaches into the real data partition and would corrupt it.

For RMW drives, RMW cycles would occur but not corruption.

For non-RMW drives, this just wouldn't occur.

Jeff
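The arithmetic in the quoted partition table can be checked with a short sketch. The geometry constants come from the fdisk output above; the `skipped_sectors` parameter (63 sectors left unused ahead of the first partition) is an assumption that happens to reproduce the Blocks value shown for /dev/hdc1. Note that going from 512 to 4096 bytes is a factor of 8, which matches the "2 TB" figure, so a purely sector-size-driven change would scale the 1K Blocks column by 8 rather than the 4 used in the second quoted table.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
CYLINDERS = 30515
SECTORS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK  # 16065


def capacity_bytes(sector_size: int) -> int:
    """Capacity implied by the CHS geometry for a given sector size."""
    return SECTORS_PER_CYLINDER * CYLINDERS * sector_size


def partition_blocks(start_cyl: int, end_cyl: int, sector_size: int,
                     skipped_sectors: int = 0) -> int:
    """Size of a partition in 1K blocks, as in fdisk's Blocks column.

    Cylinder numbers are inclusive; `skipped_sectors` is an assumed gap
    at the front of the partition (for example the first track).
    """
    cylinders = end_cyl - start_cyl + 1
    sectors = cylinders * SECTORS_PER_CYLINDER - skipped_sectors
    return sectors * sector_size // 1024
```

For example, `partition_blocks(1, 33, 512, skipped_sectors=63)` gives 265041 and `partition_blocks(34, 30515, 512)` gives 244846665, the two values in the first table.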
Re: impact of 4k sector size on the IO & FS stack
Jan Engelhardt wrote:
> On Mar 11 2007 18:51, Ric Wheeler wrote:
>> During the recent IO/FS workshop, we spoke briefly about the coming change to a 4k sector size for disks on linux. If I recall correctly, the general feeling was that the impact was not significant since we already do most file system IO in 4k page sizes and should be fine as long as we partition drives correctly and avoid non-4k aligned partitions.
> Sorry about jumping right in, but what about an 'old-style' partition table that relies on 512 as a unit?

For 1K/4K physical sector size, where logical sector size remains 512-b, nothing changes. DOS partitions start partitions on odd-numbered sectors, so presuming you have odd-aligned disks, life is good.

For 1K/4K logical sector sizes, who knows. EFI? Certainly seems incompatible with the current popular DOS partition format.

Jeff
Re: impact of 4k sector size on the IO & FS stack
Alan Cox wrote:
>> First generation of 1K sector drives will continue to use the same 512-byte ATA sector size you are familiar with. A single 512-byte write will cause the drive to perform a read-modify-write cycle. This configuration is physical 1K sector, logical 512b sector.
> The problem case is "read-modify-screwup"
>
> At that point we've trashed the block we were writing (a well studied recovery case), and we've blasted some previously sane, totally unrelated sector of data out of existence. That's why we need to know ideally if they are doing the write to a different physical block when they do this, so that we don't lose the old data. My guess is they won't as it'll be hard.

Strict ATA command set answer: you will have no idea what goes on under the hood. The current 512-b interface stays /exactly/ the same, save for a word or two in IDENTIFY DEVICE telling you the "secret" physical sector size. If all your I/Os are aligned properly, then you need not worry about RMW cycles, as they will not occur.

Intuition answer: they will use their firmware-internal standard code for scheduling reads and writes, and will only reallocate sectors as needed by media failure or similar events. The "M" part of the modify cycle happens in disk RAM. So from the disk's point of view, a single 512-b write would require reading a single 1K hard sector, updating the contents in cache RAM, and then writing a single 1K hard sector. The reading of the unknown half of the sector can be scheduled well in advance, usually, since writeback caching gives the drive plenty of time (relatively speaking) to optimize things.

Overall, it definitely adds a few more points of failure, but we can't do much at all about those points of failure.

In my own experiments on my own Fedora workstation, ~66% of IOs in Linux start on an odd sector, and ~33% started on even-numbered sectors. For a 1K-sector drive with 'odd' alignment, the configuration Microsoft will likely want, that means the majority of disk transactions will avoid a RMW cycle, but a still-numerous minority will not.

I did not test transfer length, to see how many transfers /ended/ on an odd sector, thus determining how many RMW cycles the tail of an average I/O requires.

>> A future configuration will change the logical ATA interface away from 512-byte sectors to 1K or 4K. Here, it is impossible to read a quantity smaller than 1K or 4K, whatever the sector size is.
> That one I'm not worried about - other than "guess how Redmond decide to make partition tables work" that one is mostly easy (be fun to see how many controllers simply can't cope with the command formats)

Indeed...

Jeff
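The read-modify-write cycle described above (read the 1K hard sector, update half of it in RAM, write the 1K sector back) can be modelled with a toy Python class. Everything here is an illustrative assumption: the class name, the even-aligned layout (LBA 0 begins a physical sector), and the `rmw_cycles` counter are not taken from any drive firmware.

```python
LOGICAL = 512     # bytes per logical (ATA interface) sector
PHYSICAL = 1024   # bytes per physical (platter) sector


class RmwDisk:
    """Toy model of a 512-b logical interface on 1K physical sectors."""

    def __init__(self, physical_sectors: int) -> None:
        self.media = bytearray(physical_sectors * PHYSICAL)
        self.rmw_cycles = 0

    def write(self, lba: int, data: bytes) -> None:
        if not data or len(data) % LOGICAL:
            raise ValueError("data must be a whole number of logical sectors")
        start = lba * LOGICAL
        end = start + len(data)
        if lba < 0 or end > len(self.media):
            raise ValueError("write is outside the device")
        # Visit every physical sector touched by this write.
        for phys in range(start // PHYSICAL, (end - 1) // PHYSICAL + 1):
            p_start = phys * PHYSICAL
            p_end = p_start + PHYSICAL
            lo, hi = max(start, p_start), min(end, p_end)
            chunk = data[lo - start:hi - start]
            if hi - lo < PHYSICAL:
                # Partial coverage: read, modify in RAM, write back.
                self.rmw_cycles += 1
                sector = bytearray(self.media[p_start:p_end])
                sector[lo - p_start:hi - p_start] = chunk
                self.media[p_start:p_end] = sector
            else:
                # Full physical sector: plain overwrite, no RMW needed.
                self.media[p_start:p_end] = chunk
```

In this model an aligned 1K write costs no RMW cycle, a single 512-byte write costs one, and a 1K write that starts on the wrong half costs two (one at the head and one at the tail), which is the head/tail effect mentioned in the transfer-length remark.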
Re: impact of 4k sector size on the IO & FS stack
Alan Cox wrote:
> I would be interested to know what the disk vendors intend to use as their strategy when (with ATA) they have a 512 byte write from an older file system/setup into a 4K block. The case where errors magically appear

Well, you have logical and physical sector size changes.

First generation of 1K sector drives will continue to use the same 512-byte ATA sector size you are familiar with. A single 512-byte write will cause the drive to perform a read-modify-write cycle. This configuration is physical 1K sector, logical 512b sector.

A future configuration will change the logical ATA interface away from 512-byte sectors to 1K or 4K. Here, it is impossible to read a quantity smaller than 1K or 4K, whatever the sector size is.

Jeff
Re: [RFC] Heads up on sys_fallocate()
Amit K. Arora wrote:
> This is to give a heads up on a few patches that we will soon be coming up with. These patches implement a new system call sys_fallocate() and a new inode operation "fallocate", for persistent preallocation. The new system call, as Andrew suggested, will look like:
>
> asmlinkage long sys_fallocate(int fd, loff_t offset, loff_t len);
>
> As we are developing and testing the required patches, we decided to post a preliminary patch and get inputs from the community to give it a right direction and shape. First, a little description of the feature.
>
> Persistent preallocation is a file system feature with which an application (say, a relational database server) can explicitly preallocate blocks to a particular file. This feature can be used to reserve space for a file to get mainly the following benefits:
> 1> contiguity - less fragmentation and thus faster access speed, and
> 2> guarantee for a minimum space availability (depending on how many blocks were preallocated) for the file, even if the filesystem becomes full.
>
> XFS already has an implementation for this, using an ioctl interface. And, ext4 is now coming up with this feature. In coming time we may see a few more file systems implementing this. Thus, it makes sense to have a more standard interface for this, like this new system call.
>
> Here is the initial and incomplete version of the patch, which can be used for the discussion, till we come up with a set of more complete patches.
>
> ---
>  arch/i386/kernel/syscall_table.S |    1 +
>  fs/ext4/file.c                   |    1 +
>  fs/open.c                        |   18 ++
>  include/asm-i386/unistd.h        |    3 ++-
>  include/linux/fs.h               |    1 +
>  include/linux/syscalls.h         |    1 +
>  6 files changed, 24 insertions(+), 1 deletion(-)

I certainly agree that we want something like this.

posix_fallocate() is the glibc interface we want to be compatible with (which your definition is, AFAICS).

Jeff
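For readers who want to see what preallocation looks like from userspace, here is a minimal sketch using the existing posix_fallocate() interface mentioned in the reply, through Python's `os.posix_fallocate(fd, offset, len)` wrapper. The helper name `preallocate` is made up for illustration, and the sketch assumes a Unix platform where that call is available; it says nothing about how the proposed sys_fallocate() is implemented.

```python
import os
import tempfile


def preallocate(path: str, length: int, offset: int = 0) -> None:
    """Ensure disk space is allocated for `length` bytes at `offset`.

    After this returns, writes into the range should not fail for lack
    of space, and the file is at least offset + length bytes long.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, offset, length)
    finally:
        os.close(fd)
```

A database-style caller might run `preallocate("/var/lib/db/journal", 64 << 20)` once at startup (the path is hypothetical) so that later commits into the journal cannot hit ENOSPC.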
Re: end to end error recovery musings
Theodore Tso wrote:
> Can someone with knowledge of current disk drive behavior confirm that for all drives that support bad block sparing, if an attempt to write to a particular spot on disk results in an error due to bad media at that spot, the disk drive will automatically rewrite the sector to a sector in its spare pool, and automatically redirect that sector to the new location. I believe this should be always true, so presumably with all modern disk drives a write error should mean something very serious has happened.

This is what will /probably/ happen. The drive should indeed find a spare sector and remap it, if the write attempt encounters a bad spot on the media.

However, with a large enough write, large enough bad-spot-on-media, and a firmware programmed to never take more than X seconds to complete their enterprise customers' I/O, it might just fail.

IMO, somewhere in the kernel, when we receive a read-op or write-op media error, we should immediately try to plaster that area with small writes. Sure, if it's a read-op you lost data, but this method will maximize the chance that you can refresh/reuse the logical sectors in question.

Jeff
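The "plaster that area with small writes" idea can be illustrated with a toy simulation. The drive model below is purely an assumption made for the sketch: it pretends the firmware can remap at most one bad sector per command and fails any command that would need more. Real drive behaviour is not specified in this thread, and `ToyDrive` and `rewrite_with_small_writes` are hypothetical names.

```python
class ToyDrive:
    """Toy drive: remaps bad sectors on write, but only a few per command."""

    def __init__(self, bad_sectors, max_remaps_per_command: int = 1) -> None:
        self.bad = set(bad_sectors)
        self.remapped = set()
        self.max_remaps = max_remaps_per_command

    def write(self, lba: int, count: int) -> None:
        hit = [s for s in range(lba, lba + count) if s in self.bad]
        if len(hit) > self.max_remaps:
            # Assumed stand-in for "firmware gave up within its time limit".
            raise OSError("media error")
        for sector in hit:
            self.bad.discard(sector)
            self.remapped.add(sector)


def rewrite_with_small_writes(drive: ToyDrive, lba: int, count: int) -> list:
    """After a media error on [lba, lba + count), retry sector by sector.

    Returns the sectors that still failed even as single-sector writes.
    """
    still_failing = []
    for sector in range(lba, lba + count):
        try:
            drive.write(sector, 1)
        except OSError:
            still_failing.append(sector)
    return still_failing
```

In this model a single 8-sector write across three bad sectors fails, while eight 1-sector writes over the same range each give the drive a chance to remap, after which the original large write succeeds.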
[git patch, resend] remove JFFS v1
[just sent this upstream; obvious file-removal patch snipped for size]

(resend)

Why: Unmaintained for years, superceded by JFFS2 for years.

Please pull from 'kill-jffs' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git kill-jffs

to receive the following updates:

 Documentation/feature-removal-schedule.txt |    7 -
 fs/Kconfig                                 |   26 -
 fs/Makefile                                |    1 -
 fs/jffs/Makefile                           |   11 -
 fs/jffs/inode-v23.c                        | 1847 ---
 fs/jffs/intrep.c                           | 3449
 fs/jffs/intrep.h                           |   58 -
 fs/jffs/jffs_fm.c                          |  798 ---
 fs/jffs/jffs_fm.h                          |  149 --
 fs/jffs/jffs_proc.c                        |  261 ---
 fs/jffs/jffs_proc.h                        |   28 -
 include/linux/jffs.h                       |  224 --
 12 files changed, 0 insertions(+), 6859 deletions(-)
 delete mode 100644 fs/jffs/Makefile
 delete mode 100644 fs/jffs/inode-v23.c
 delete mode 100644 fs/jffs/intrep.c
 delete mode 100644 fs/jffs/intrep.h
 delete mode 100644 fs/jffs/jffs_fm.c
 delete mode 100644 fs/jffs/jffs_fm.h
 delete mode 100644 fs/jffs/jffs_proc.c
 delete mode 100644 fs/jffs/jffs_proc.h
 delete mode 100644 include/linux/jffs.h

Jeff Garzik (1):
      Remove JFFS (version 1), as scheduled.

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index c585aa8..e1bc0c5 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -306,13 +306,6 @@ Who: Len Brown <[EMAIL PROTECTED]>
 ---
-What: JFFS (version 1)
-When: 2.6.21
-Why: Unmaintained for years, superceded by JFFS2 for years.
-Who: Jeff Garzik <[EMAIL PROTECTED]>
-
-
 What: sk98lin network driver
 When: July 2007
 Why: In kernel tree version of driver is unmaintained. Sk98lin driver
diff --git a/fs/Kconfig b/fs/Kconfig
index a722b5a..3c4886b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1189,32 +1189,6 @@ config EFS_FS
 	  To compile the EFS file system support as a module, choose M here: the
 	  module will be called efs.
 
-config JFFS_FS
-	tristate "Journalling Flash File System (JFFS) support"
-	depends on MTD && BLOCK && BROKEN
-	help
-	  JFFS is the Journalling Flash File System developed by Axis
-	  Communications in Sweden, aimed at providing a crash/powerdown-safe
-	  file system for disk-less embedded devices. Further information is
-	  available at (<http://developer.axis.com/software/jffs/>).
-
-	  NOTE: This filesystem is deprecated and is scheduled for removal in
-	  2.6.21. See Documentation/feature-removal-schedule.txt
-
-config JFFS_FS_VERBOSE
-	int "JFFS debugging verbosity (0 = quiet, 3 = noisy)"
-	depends on JFFS_FS
-	default "0"
-	help
-	  Determines the verbosity level of the JFFS debugging messages.
-
-config JFFS_PROC_FS
-	bool "JFFS stats available in /proc filesystem"
-	depends on JFFS_FS && PROC_FS
-	help
-	  Enabling this option will cause statistics from mounted JFFS file systems
-	  to be made available to the user in the /proc/fs/jffs/ directory.
-
 config JFFS2_FS
 	tristate "Journalling Flash File System v2 (JFFS2) support"
 	select CRC32
diff --git a/fs/Makefile b/fs/Makefile
index b9ffa63..9edf411 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -94,7 +94,6 @@ obj-$(CONFIG_HPFS_FS) += hpfs/
 obj-$(CONFIG_NTFS_FS) += ntfs/
 obj-$(CONFIG_UFS_FS) += ufs/
 obj-$(CONFIG_EFS_FS) += efs/
-obj-$(CONFIG_JFFS_FS) += jffs/
 obj-$(CONFIG_JFFS2_FS) += jffs2/
 obj-$(CONFIG_AFFS_FS) += affs/
 obj-$(CONFIG_ROMFS_FS) += romfs/

[snip file deletion patch]
[git patch] remove jffs (v1)
Please pull from 'kill-jffs' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git kill-jffs

to receive the following updates:

 Documentation/feature-removal-schedule.txt |    7 -
 fs/Kconfig                                 |   26 -
 fs/Makefile                                |    1 -
 fs/jffs/Makefile                           |   11 -
 fs/jffs/inode-v23.c                        | 1847 ---
 fs/jffs/intrep.c                           | 3449
 fs/jffs/intrep.h                           |   58 -
 fs/jffs/jffs_fm.c                          |  798 ---
 fs/jffs/jffs_fm.h                          |  149 --
 fs/jffs/jffs_proc.c                        |  261 ---
 fs/jffs/jffs_proc.h                        |   28 -
 include/linux/jffs.h                       |  224 --
 12 files changed, 0 insertions(+), 6859 deletions(-)
 delete mode 100644 fs/jffs/Makefile
 delete mode 100644 fs/jffs/inode-v23.c
 delete mode 100644 fs/jffs/intrep.c
 delete mode 100644 fs/jffs/intrep.h
 delete mode 100644 fs/jffs/jffs_fm.c
 delete mode 100644 fs/jffs/jffs_fm.h
 delete mode 100644 fs/jffs/jffs_proc.c
 delete mode 100644 fs/jffs/jffs_proc.h
 delete mode 100644 include/linux/jffs.h

Jeff Garzik (1):
      Delete JFFS (version 1), as scheduled.

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0ba6af0..fc53239 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -318,10 +318,3 @@ Why: /proc/acpi/button has been replaced by events to the input layer
 Who: Len Brown <[EMAIL PROTECTED]>
 ---
-
-What: JFFS (version 1)
-When: 2.6.21
-Why: Unmaintained for years, superceded by JFFS2 for years.
-Who: Jeff Garzik <[EMAIL PROTECTED]>
-
diff --git a/fs/Kconfig b/fs/Kconfig
index 8cd2417..67a50c9 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1196,32 +1196,6 @@ config EFS_FS
 	  To compile the EFS file system support as a module, choose M here: the
 	  module will be called efs.
 
-config JFFS_FS
-	tristate "Journalling Flash File System (JFFS) support"
-	depends on MTD && BLOCK && BROKEN
-	help
-	  JFFS is the Journalling Flash File System developed by Axis
-	  Communications in Sweden, aimed at providing a crash/powerdown-safe
-	  file system for disk-less embedded devices. Further information is
-	  available at (<http://developer.axis.com/software/jffs/>).
-
-	  NOTE: This filesystem is deprecated and is scheduled for removal in
-	  2.6.21. See Documentation/feature-removal-schedule.txt
-
-config JFFS_FS_VERBOSE
-	int "JFFS debugging verbosity (0 = quiet, 3 = noisy)"
-	depends on JFFS_FS
-	default "0"
-	help
-	  Determines the verbosity level of the JFFS debugging messages.
-
-config JFFS_PROC_FS
-	bool "JFFS stats available in /proc filesystem"
-	depends on JFFS_FS && PROC_FS
-	help
-	  Enabling this option will cause statistics from mounted JFFS file systems
-	  to be made available to the user in the /proc/fs/jffs/ directory.
-
 config JFFS2_FS
 	tristate "Journalling Flash File System v2 (JFFS2) support"
 	select CRC32
diff --git a/fs/Makefile b/fs/Makefile
index b9ffa63..9edf411 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -94,7 +94,6 @@ obj-$(CONFIG_HPFS_FS) += hpfs/
 obj-$(CONFIG_NTFS_FS) += ntfs/
 obj-$(CONFIG_UFS_FS) += ufs/
 obj-$(CONFIG_EFS_FS) += efs/
-obj-$(CONFIG_JFFS_FS) += jffs/
 obj-$(CONFIG_JFFS2_FS) += jffs2/
 obj-$(CONFIG_AFFS_FS) += affs/
 obj-$(CONFIG_ROMFS_FS) += romfs/

[snip obvious diff that deletes fs/jffs/* and include/linux/jffs.h]
[git patch] mention JFFS impending death
JFFS is already marked CONFIG_BROKEN in fs/Kconfig, with a note that it's going away in 2.6.21, but the corresponding update to feature-removal-schedule.txt was accidentally omitted. Fixed.

Please pull from 'kill-jffs-prep' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git kill-jffs-prep

to receive the following updates:

 Documentation/feature-removal-schedule.txt |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

Jeff Garzik (1):
      Note that JFFS (v1) is to be deleted, in feature-removal-schedule.txt

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index fc53239..0ba6af0 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -318,3 +318,10 @@ Why: /proc/acpi/button has been replaced by events to the input layer
 Who: Len Brown <[EMAIL PROTECTED]>
 ---
+
+What: JFFS (version 1)
+When: 2.6.21
+Why: Unmaintained for years, superceded by JFFS2 for years.
+Who: Jeff Garzik <[EMAIL PROTECTED]>
+
+---
Re: [take32 0/10] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> On Wed, Jan 10, 2007 at 06:11:26AM -0500, Jeff Garzik ([EMAIL PROTECTED]) wrote:
>> Once the rate of change slows, Andrew should IMO definitely pick this up.
> There are _tons_ of ideas to implement with kevent - so if we want, rate will not slow down. As you can see, from take26 I only send new features: signals, posix timers, AIO, userspace notifications, various flags and the like. I test it on my machines (recently one of them died, so only amd64 right now (running kernel) and i386 compile-only) and some bug-fixes without any additional feature requests (almost, Ingo asked for AIO before New Year), but broader testing is welcome indeed.

If the rate doesn't slow (if only artificially), people are discouraged from reviewing, because it becomes a moving target.

>> If you wanted to make this process automatic, create a git branch that Andrew and others can pull.
> Exported git tree would be good, but I do not have enough disk space on

Request an account on http://www.foo-projects.org/ which supports git. The Intel guys use it to send me e1000/ixgb changes, for example.

> web-site, and do you really want to read comments written in bad English with Russian transliterated indecent words?

The only thing exported to -mm is the code changes, as a patch. git merely automates the process, so that Andrew doesn't have to spend time [that he doesn't have] tracking a project with a high rate of change.

>> I like the direction so far, and think it should be in -mm for wider testing and review.
> It was there, but Andrew dropped it somewhere about take25 :)

Probably because it was a moving target with a high rate of change, requiring time that Andrew did not have just to keep in sync and fix build conflicts with other -mm patches.

Jeff
Re: [take32 0/10] kevent: Generic event handling mechanism.
Evgeniy Polyakov wrote:
> Generic event handling mechanism.
>
> Kevent is a generic subsystem which allows handling of event notifications. It supports both level and edge triggered events. It is similar to poll/epoll in some cases, but it is more scalable, it is faster and allows working with essentially any kind of event.
>
> Events are provided into the kernel through a control syscall and can be read back through a ring buffer or using usual syscalls. Kevent update (i.e. readiness switching) happens directly from the internals of the appropriate state machine of the underlying subsystem (like network, filesystem, timer or any other).
>
> Homepage: http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
> Documentation page: http://linux-net.osdl.org/index.php/Kevent
>
> Consider for inclusion.
>
> With this release I start a 3-day resending timeout - i.e. each third day I will send either a new version (if something new was requested and agreed to be implemented) or a resend with a back counter started from three. When the back counter hits zero after three resendings I consider there is no interest in the subsystem and I will stop further sending.
>
> I really doubt it is a good way to tell the world about my work, and I bet you are all tired of those pathos words, but I really would like to get some feedback, since I want to start to work on network AIO, but sending mails into an unfeedbackable 'destination' really does not motivate me for that.
>
> Thanks for understanding and your time.

Once the rate of change slows, Andrew should IMO definitely pick this up.

If you wanted to make this process automatic, create a git branch that Andrew and others can pull.

I like the direction so far, and think it should be in -mm for wider testing and review.

Jeff
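The announcement mentions level- and edge-triggered events. Since kevent's own interface is not shown in this thread, the difference is sketched here with epoll, which the announcement compares kevent to, via Python's `select.epoll` (Linux only). The helper name `ready_counts` is an illustrative assumption, and none of this is the kevent API.

```python
import os
import select


def ready_counts(flags: int, polls: int = 2) -> list:
    """Write one byte into a pipe, never read it, and poll `polls` times.

    Returns how many ready events each non-blocking poll reported.
    """
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        ep.register(read_fd, flags)
        os.write(write_fd, b"x")
        # timeout=0 makes each poll return immediately.
        return [len(ep.poll(0)) for _ in range(polls)]
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)
```

With `select.EPOLLIN` alone (level-triggered) the unread byte is reported on every poll, so the result is `[1, 1]`. With `select.EPOLLIN | select.EPOLLET` (edge-triggered) only the transition to readable is reported, so the result is `[1, 0]`.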
Re: [PATCH/RFC] Delete JFFS (version 1)
Bill Nottingham wrote:
> Jeff Garzik ([EMAIL PROTECTED]) said:
>> It's always been the case that we remove Linux kernel code when the number of users (and more importantly, developers) drops to near-nil.
> So, drivers/net/3c501.c?

Depends on how motivated Alan remains ;-) Historically, if the developer is active, we have occasionally ignored the minuscule userbase.

Jeff
Re: [PATCH/RFC] Delete JFFS (version 1)
Jeff Garzik wrote:
> When it's more likely to get struck by lightning than encounter filesystem X on a random hard drive in the field, filesystem X need not be in the kernel.

As people are already poking me :) I of course meant "flash device" not "hard drive".

SATA maintainer's curse, I suppose, to think of all storage devices as hard drives, no matter how incorrect that might be :)

Jeff
Re: [PATCH/RFC] Delete JFFS (version 1)
Josh Boyer wrote:
> On 12/12/06, Jeff Garzik <[EMAIL PROTECTED]> wrote:
>> I have created the 'kill-jffs' branch of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git that removes fs/jffs.
>>
>> I argue that you can count the users (who aren't on 2.4) on one hand, and developers don't seem to have cared for it in ages.
>>
>> People are already talking about jffs2 replacements, so I propose we zap jffs in 2.6.21.
> I'm usually all for killing broken code, but JFFS isn't really broken is it? Is there some burden it's causing by being in the kernel at the moment?

It's always been the case that we remove Linux kernel code when the number of users (and more importantly, developers) drops to near-nil.

Every line of code is one more place you have to audit when code changes, one more place to update each time the VFS API is touched.

When it's more likely to get struck by lightning than encounter filesystem X on a random hard drive in the field, filesystem X need not be in the kernel.

IMO, of course :)

Jeff
[PATCH/RFC] Delete JFFS (version 1)
I have created the 'kill-jffs' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git
that removes fs/jffs.

I argue that you can count the users (who aren't on 2.4) on one hand, and developers don't seem to have cared for it in ages.

People are already talking about jffs2 replacements, so I propose we zap jffs in 2.6.21.

Jeff

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 46f2a55..c008303 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -270,3 +270,10 @@ Why: The new layer 3 independant connection tracking replaces the old
 Who: Patrick McHardy <[EMAIL PROTECTED]>
 ---
+
+What: JFFS (version 1) filesystem
+When: 2.6.21
+Why: No users or developers
+Who: Jeff Garzik <[EMAIL PROTECTED]>
+
+---
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Here's a dumb question, and I apologize if I am questioning computer science dogma...

Why are LVM and EVMS (competing LVM project) needed at all? Surely the same can be accomplished with
* md
* snapshot blkdev (attached in previous e-mail)
* giving partitions and blkdevs the ability to grow and shrink
* giving filesystems the ability to grow and shrink

On-line optimization (defrag, etc) shouldn't be hard once you have the ability to move blocks and files around, which would come with the ability to grow and shrink blkdevs and fs's.

--
Jeff Garzik      | "Do you have to make light of everything?!"
Building 1024    | "I'm extremely serious about nailing your
MandrakeSoft     | step-daughter, but other than that, yes."
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Linus Torvalds wrote:
> There are some strong arguments that we should have filesystem
> "backdoors" for maintenance purposes, including backup.

I think I agree with something Al said over IRC, that fs-level snapshots are preferred over block level snapshots.

fs-level snapshots should become easy if you have a generic transaction layer. The OS spits out file ops, which get processed into a set of fs transactions. (remember that fs-level stuff like "change this block bitmap" is also a transaction, just like the more generic "update this inode's mtime")

Also, I think there should be generic block allocation strategies that fs's can use. Implementing fs-specific strategies such as ext2's readahead or XFS's delayed allocation is not a solution, IMHO, but working towards solving the real problem.

> You can, of course, do parts of this on a LVM level, and doing backups
> with "disk snapshots" may be a valid approach. However, even that is
> debatable: there is very little that says that the disk image has to be
> up-to-date at any particular point in time, so even with a disk snapshot
> capability (which is not necessarily reasonable under all circumstances)
> there are arguments for maintenance interfaces.

I've been hacking on the attached, a snapshot block device driver, which doesn't require LVM at all. (warning: compiled and updated per outside review, but very alpha... do not apply)

The point of the driver is to provide a sync point at snapshot time, at which all metadata and data is flushed to the block device. My question... is there a fundamental flaw in this plan? Ideally when userspace says "start snapshot", the fsync_dev occurs [a simplification]. At that point, userspace can safely run dump or tar or whatever on the virtual snapshot device.

--
Jeff Garzik      | "Do you have to make light of everything?!"
Building 1024    | "I'm extremely serious about nailing your
MandrakeSoft     | step-daughter, but other than that, yes."
Index: linux_2_4/drivers/block/Config.in
diff -u linux_2_4/drivers/block/Config.in:1.1.1.44 linux_2_4/drivers/block/Config.in:1.1.1.44.4.1
--- linux_2_4/drivers/block/Config.in:1.1.1.44	Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Config.in	Wed May 16 15:44:59 2001
@@ -46,4 +46,6 @@
 fi
 dep_bool ' Initial RAM disk (initrd) support' CONFIG_BLK_DEV_INITRD $CONFIG_BLK_DEV_RAM
+tristate 'Snapshot device support' CONFIG_BLK_DEV_SNAP
+
 endmenu
Index: linux_2_4/drivers/block/Makefile
diff -u linux_2_4/drivers/block/Makefile:1.1.1.46 linux_2_4/drivers/block/Makefile:1.1.1.46.4.1
--- linux_2_4/drivers/block/Makefile:1.1.1.46	Tue May 15 04:43:24 2001
+++ linux_2_4/drivers/block/Makefile	Wed May 16 15:44:59 2001
@@ -31,6 +31,7 @@
 obj-$(CONFIG_BLK_DEV_DAC960) += DAC960.o
 obj-$(CONFIG_BLK_DEV_NBD) += nbd.o
+obj-$(CONFIG_BLK_DEV_SNAP) += snap.o
 subdir-$(CONFIG_PARIDE) += paride
Index: linux_2_4/drivers/block/snap.c
diff -u /dev/null linux_2_4/drivers/block/snap.c:1.1.6.10
--- /dev/null	Sat May 19 17:36:30 2001
+++ linux_2_4/drivers/block/snap.c	Thu May 17 11:48:54 2001
@@ -0,0 +1,1055 @@
+/*
+  Copyright 2001 Jeff Garzik <[EMAIL PROTECTED]>
+  Copyright (C) 2000 Jens Axboe <[EMAIL PROTECTED]>
+
+  May be copied or modified under the terms of the GNU General Public
+  License. See linux/COPYING for more information.
+
+  Several ideas and some code taken from Jens Axboe's pktcdvd.c 0.0.2j.
+
+  To-Do list:
+  * Write support. It's easy, and might be useful in isolated circumstances.
+  * Convert MAX_SNAPDEVS to a module parameter.
+  * Wrap use of "%" operator, to prepare for 64-bit-sized blockdevs on
+    32-bit processors
+
+ */
+
+#define VERSION_CODE "v0.5.0-take6 17 May 2001 Jeff Garzik <[EMAIL PROTECTED]>"
+#define MODNAME	"snap"
+#define PFX	MODNAME ": "
+#define MAX_SNAPDEVS 16
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static int *snap_sizes;
+static int *snap_blksize;
+static int *snap_readahead;
+static struct snap_device *snap_devs;
+static int snap_major = -1;
+static spinlock_t snap_lock = SPIN_LOCK_UNLOCKED;
+
+
+/*
+ * a bit of a kludge, but we want to be able to pass source, log,
+ * or snap dev and get the right one.
+ */
+static struct snap_device *snap_find_dev(kdev_t dev)
+{
+	int i, j;
+	struct snap_device *sd;
+
+	spin_lock(&snap_lock);
+
+	for (i = 0; i < MAX_SNAPDEVS; i++) {
+		sd = &snap_devs[i];
+		if ((sd->src.dev == dev) || (sd->snap_dev == dev))
+			goto out;
+		for (j = 0; j < sd->n_logs; j++)
+			if (sd->
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Jeff Garzik wrote:
> Notice also a "metadata miscdev" solves the problem of passing options
> on open -- just pass those options to the miscdev before you open it...

to be more clear, "it" == the data device, not the metadata miscdev

-- 
Jeff Garzik      | "Do you have to make light of everything?!"
Building 1024    | "I'm extremely serious about nailing your
MandrakeSoft     | step-daughter, but other than that, yes."
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH]device arguments from lookup)
Are we talking about device arguments just for chrdevs and blkdevs? (ie. drivers) or for regular files too?

Speaking about drivers specifically, a controlling miscdev, one per device or one per group of devices depending on your needs, is a much more clean solution for passing ioctl-type data. You are free to come up with whatever method of communication with the driver is most efficient for your needs -- without perverting open(2).

Notice also a "metadata miscdev" solves the problem of passing options on open -- just pass those options to the miscdev before you open it...

metadata miscdevs are a clean solution to what procfs hacks and ioctls are trying to accomplish.

	Jeff

-- 
Jeff Garzik      | "Do you have to make light of everything?!"
Building 1024    | "I'm extremely serious about nailing your
MandrakeSoft     | step-daughter, but other than that, yes."
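From userspace, the "pass the options to the miscdev, then open the data device" sequence might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the helper name open_with_options, the idea that the control node accepts a plain-text option string via write(2), and whatever paths the caller passes in. No existing driver interface is being described.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write an option string to a control ("metadata")
 * device node, then open the data device with an ordinary open(2).
 * The paths and the option syntax are assumptions for illustration.
 * Returns the data device fd, or -1 on error.
 */
int open_with_options(const char *ctl_path, const char *data_path,
		      const char *opts)
{
	size_t len;
	int ctl;

	assert(ctl_path && data_path && opts);
	len = strlen(opts);

	ctl = open(ctl_path, O_WRONLY);
	if (ctl < 0)
		return -1;

	/* Options travel over the control node, not through open(2). */
	if (write(ctl, opts, len) != (ssize_t)len) {
		close(ctl);
		return -1;
	}
	close(ctl);

	/* open(2) on the data device carries no extra arguments. */
	return open(data_path, O_RDONLY);
}
```

The point of the split is that open(2) keeps its usual meaning; anything device-specific is negotiated on a separate descriptor beforehand.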
Re: ext3 for 2.4
AFAIK the original stated intention of ext3 was

	cd linux/fs
	cp -a ext2 ext3
	# hack on ext3

That leaves ext2 in ultra-stability, no-patches-unless-absolutely-necessary mode.

IMHO prove a new feature, like directories in page cache, journaling, etc. in ext3 first. Then maybe after a year of testing, if people actually care, backport those features to ext2.

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |
PATCH 2.4.0.11.1: ramfs fix for highmem
ramfs calls memset(page_address(page),...) on a page which might be in highmem. This had been mentioned before on lkml, I noticed, but it never made it into the kernel. I noticed and changed the same thing when I was hacking on tmpfs, so might as well make sure this gets into the kernel.

There is also another patch on lkml for ramfs, one which adds resource limits. Ug, it can be done so much better with mount options. Anyway... I'm straying off topic.

-- 
Jeff Garzik      | "When I do this, my computer freezes."
Building 1024    |          -user
MandrakeSoft     | "Don't do that."
                 |          -level 1

Index: fs/ramfs/inode.c
===================================================================
RCS file: /cvsroot/gkernel/linux_2_4/fs/ramfs/inode.c,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 inode.c
--- fs/ramfs/inode.c	2000/10/22 21:52:44	1.1.1.5
+++ fs/ramfs/inode.c	2000/11/08 17:28:33
@@ -65,7 +65,8 @@
 static int ramfs_readpage(struct file *file, struct page * page)
 {
 	if (!Page_Uptodate(page)) {
-		memset(page_address(page), 0, PAGE_CACHE_SIZE);
+		memset(kmap(page), 0, PAGE_CACHE_SIZE);
+		kunmap(page);
 		flush_dcache_page(page);
 		SetPageUptodate(page);
 	}
tmpfs update...
Attached is another shot at tmpfs. I use my own vm_ops, where the only member initialized is nopage (==filemap_nopage). In particular, swapout==NULL, so that try_to_swap_out will swap out pages for us.

Of course, it's still broken, with pretty much the same behavior as before -- things don't seem to be getting swapped out correctly, so once physical RAM is exhausted, things break.

Note for reading -- the code is now pretty much the same as ramfs again, with the exception that we use our own mmap function to hook in the custom vm_ops.

Comments appreciated,

	Jeff

-- 
Jeff Garzik      | "When I do this, my computer freezes."
Building 1024    |          -user
MandrakeSoft     | "Don't do that."
                 |          -level 1

/*
 * Resizable simple ram filesystem for Linux.
 * Hacked into tmpfs by Jeff Garzik
 *
 * Copyright (C) 2000 Linus Torvalds.
 *               2000 Transmeta Corp.
 *
 * ramfs->tmpfs hacks by Jeff Garzik <[EMAIL PROTECTED]>
 *
 * This file is released under the GPL.
 */

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

/* some random number */
#define TMPFS_MAGIC	0xBEDAC0ED

static struct super_operations tmpfs_ops;
static struct address_space_operations tmpfs_aops;
static struct file_operations tmpfs_dir_operations;
static struct file_operations tmpfs_file_operations;
static struct inode_operations tmpfs_dir_inode_operations;

static int tmpfs_statfs(struct super_block *sb, struct statfs *buf)
{
	buf->f_type = TMPFS_MAGIC;
	buf->f_bsize = PAGE_CACHE_SIZE;
	buf->f_namelen = 255;
	return 0;
}

/*
 * Lookup the data. This is trivial - if the dentry didn't already
 * exist, we know it is negative.
 */
static struct dentry * tmpfs_lookup(struct inode *dir, struct dentry *dentry)
{
	d_add(dentry, NULL);
	return NULL;
}

/*
 * Read a page. Again trivial. If it didn't already exist
 * in the page cache, it is zero-filled.
 */
static int tmpfs_readpage(struct file *file, struct page * page)
{
	if (!PageActive(page))
		BUG();
	if (!Page_Uptodate(page)) {
		void *addr = (void*) kmap(page);
		memset(addr, 0, PAGE_CACHE_SIZE);
		kunmap(page);
		flush_dcache_page(page);
		SetPageUptodate(page);
	}
	SetPageDirty(page);
	UnlockPage(page);
	return 0;
}

static int tmpfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
{
	void *addr;

	addr = (void *) kmap(page);
	if (!Page_Uptodate(page)) {
		memset(addr, 0, PAGE_CACHE_SIZE);
		flush_dcache_page(page);
		SetPageUptodate(page);
	}
	SetPageDirty(page);
	return 0;
}

static int tmpfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
{
	struct inode *inode = (struct inode*)page->mapping->host;
	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;

	kunmap(page);
	if (pos > inode->i_size)
		inode->i_size = pos;
	return 0;
}

static struct vm_operations_struct tmpfs_mmap_ops = {
	nopage:		filemap_nopage,
};

/* This is used for a general mmap of a disk file */
static int tmpfs_file_mmap(struct file * file, struct vm_area_struct * vma)
{
	struct vm_operations_struct * ops;
	struct inode *inode = file->f_dentry->d_inode;

	ops = &tmpfs_mmap_ops;
	if (!inode->i_sb || !S_ISREG(inode->i_mode))
		return -EACCES;
	if (!inode->i_mapping->a_ops->readpage)
		return -ENOEXEC;
	UPDATE_ATIME(inode);
	vma->vm_ops = ops;
	return 0;
}

static struct inode *tmpfs_get_inode(struct super_block *sb, int mode, int dev)
{
	struct inode * inode = get_empty_inode();

	if (inode) {
		inode->i_sb = sb;
		inode->i_dev = sb->s_dev;
		inode->i_mode = mode;
		inode->i_uid = current->fsuid;
		inode->i_gid = current->fsgid;
		inode->i_size = 0;
		inode->i_blksize = PAGE_CACHE_SIZE;
		inode->i_blocks = 0;
		inode->i_rdev = to_kdev_t(dev);
		inode->i_nlink = 1;
		inode->i_op = NULL;
		inode->i_fop = NULL;
		inode->i_mapping->a_ops = &tmpfs_aops;
		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
		inode->u.generic_ip = NULL;
		switch (mode & S_IFMT) {
		default:
			init_special_inode(inode, mode, dev);
			break;
PATCH: tmpfs
Here's a quick one-night hack of ramfs to make it swap... ie. tmpfs. If some of the VM gurus could look over it, that would be great. It works great until physical RAM is exhausted, then... infinite swap :)

My current approach is to swap out pages "manually" in address_space::writepage, and read them back in when ::readpage is called. A red-black tree of swapped-out pages is maintained for each inode in RAM. metadata is never swapped out, only data.

Alternative approach: using a custom vm_operations, set swapout==NULL. This forces try_to_swap_out() to swap the page out. ::writepage becomes very simple then, but ::readpage becomes more complex.

Comments welcome... I think something -simple- like this can be used to create tmpfs. I looked at the "shmfs" code, and it was huge compared to this...

	Jeff, the VM newbie

-- 
Jeff Garzik      | "When I do this, my computer freezes."
Building 1024    |          -user
MandrakeSoft     | "Don't do that."
                 |          -level 1

Index: linux_2_4/fs/Config.in
diff -u linux_2_4/fs/Config.in:1.1.1.5 linux_2_4/fs/Config.in:1.1.1.5.18.3
--- linux_2_4/fs/Config.in:1.1.1.5	Sun Oct 22 14:51:49 2000
+++ linux_2_4/fs/Config.in	Mon Nov 6 23:39:52 2000
@@ -30,6 +30,7 @@
 fi
 tristate 'Compressed ROM file system support' CONFIG_CRAMFS
 tristate 'Simple RAM-based file system support' CONFIG_RAMFS
+dep_bool 'Simple VM-backed, RAM-based file system support' CONFIG_TMPFS $CONFIG_EXPERIMENTAL
 tristate 'ISO 9660 CDROM file system support' CONFIG_ISO9660_FS
 dep_mbool ' Microsoft Joliet CDROM extensions' CONFIG_JOLIET $CONFIG_ISO9660_FS
Index: linux_2_4/fs/Makefile
diff -u linux_2_4/fs/Makefile:1.1.1.5 linux_2_4/fs/Makefile:1.1.1.5.18.2
--- linux_2_4/fs/Makefile:1.1.1.5	Sun Oct 22 14:51:44 2000
+++ linux_2_4/fs/Makefile	Mon Nov 6 20:25:40 2000
@@ -29,6 +29,7 @@
 subdir-$(CONFIG_EXT2_FS) += ext2
 subdir-$(CONFIG_CRAMFS) += cramfs
 subdir-$(CONFIG_RAMFS) += ramfs
+subdir-$(CONFIG_TMPFS) += tmpfs
 subdir-$(CONFIG_CODA_FS) += coda
 subdir-$(CONFIG_MINIX_FS) += minix
 subdir-$(CONFIG_FAT_FS) += fat
Index: linux_2_4/fs/tmpfs/Makefile
diff -u /dev/null linux_2_4/fs/tmpfs/Makefile:1.1.2.1
--- /dev/null	Tue Nov 7 00:36:29 2000
+++ linux_2_4/fs/tmpfs/Makefile	Mon Nov 6 20:25:40 2000
@@ -0,0 +1,11 @@
+#
+# Makefile for the linux tmpfs routines.
+#
+
+O_TARGET := tmpfs.o
+
+O_OBJS := inode.o
+
+M_OBJS := $(O_TARGET)
+
+include $(TOPDIR)/Rules.make
Index: linux_2_4/fs/tmpfs/inode.c
diff -u /dev/null linux_2_4/fs/tmpfs/inode.c:1.1.2.4
--- /dev/null	Tue Nov 7 00:36:29 2000
+++ linux_2_4/fs/tmpfs/inode.c	Tue Nov 7 00:25:07 2000
@@ -0,0 +1,552 @@
+/*
+ * Resizable simple ram filesystem for Linux.
+ * Hacked into tmpfs by Jeff Garzik
+ *
+ * Copyright (C) 2000 Linus Torvalds.
+ *               2000 Transmeta Corp.
+ *
+ * ramfs->tmpfs hacks by Jeff Garzik <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPL.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "rbtree.h"
+#include "rbtree.c"
+
+/* some random number */
+#define TMPFS_MAGIC	0xBEDAC0ED
+
+#define tmpfs_ent_g(n) list_entry(n, struct tmpfs_swap_ent, node)
+#define tmpfs_for_each_ent(ent) \
+	for(ent = tmpfs_ent_g(ti->swap_entries.next); \
+	    ent != tmpfs_ent_g(&ti->swap_entries); \
+	    ent = tmpfs_ent_g(ent->node.next))
+
+static struct super_operations tmpfs_ops;
+static struct address_space_operations tmpfs_aops;
+static struct file_operations tmpfs_dir_operations;
+static struct file_operations tmpfs_file_operations;
+static struct inode_operations tmpfs_dir_inode_operations;
+static kmem_cache_t *swap_ent_cache;
+
+
+struct tmpfs_swap_ent {
+	rb_node_t node;
+	swp_entry_t ent;
+	struct page *page;
+};
+
+
+static inline struct tmpfs_swap_ent *
+rb_search_page_cache(struct inode *inode, struct page *page)
+{
+	rb_node_t * n = inode->u.generic_ip;
+	struct tmpfs_swap_ent *ent;
+
+	while (n)
+	{
+		ent = rb_entry(n, struct tmpfs_swap_ent, node);
+
+		if (((unsigned long)page) < ((unsigned long)ent->page))
+			n = n->rb_left;
+		else if (((unsigned long)page) > ((unsigned long)ent->page))
+			n = n->rb_right;
+		else {
+			rb_erase(n, (rb_root_t*) &inode->u.generic_ip);
+			return ent;
+		}
+	}
+	return NULL;
+}
+
+
+static inline struct tmpfs_swap_ent *
+__rb_insert_page_cache(struct inode *inode, struct page *page, rb_node_t *node)
+{
+	rb_node_t ** p = (rb_node_t **) &inode->u.gen