quota tools (was Re: ext3 for 2.4)
hi,

On May 17, 9:20pm, Andrew Morton wrote:
> Subject: ext3 for 2.4
> ...
> - quotas appear to work OK. I'll leave them turned on
> as I test things, and watch out for oddities.
>
> It's hard to find working quota tools. Most of them
> either don't want to compile and/or don't understand
> ext3. Jan Kara is maintaining a set of quota tools
> at http://www.sourceforge.net/projects/linuxquota/ which
> work well.

Yes, that's your best bet for working quota tools - they are being maintained by both Jan and Marco.

> The current CVS tree from there seems to be
> under XFS development at present and needs a couple of

That's not quite correct - the XFS development here was done a while ago now, and it has been bug fixes only for some time.

> patches to work against ext3 (and even ext2). I can send them
> to whoever needs.

Send them to Jan and/or Marco (both cc'd) and they'll very quickly show up in cvs.

cheers.

-- 
Nathan

- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
Re: [Ext2-devel] Re: ext3 for 2.4
On Thu, May 17, 2001 at 03:00:28PM -0400, Jeff Garzik wrote:
> AFAIK the original stated intention of ext3 was
>
> cd linux/fs
> cp -a ext2 ext3
> # hack on ext3
>
> That leaves ext2 in ultra-stability,
> no-patches-unless-absolutely-necessary mode.
>
> IMHO prove a new feature, like directories in page cache, journaling,
> etc. in ext3 first. Then maybe after a year of testing, if people
> actually care, backport those features to ext2.

Alternatively, once we get ext3 with just journaling stable (and with an option to not do journaling at all), simply do something like this:

	cd linux/fs
	rm -rf ext2
	mv ext3 ext2
	cp -r ext2 ext3
	# hack hack hack on ext3 and add even more features

So ext3 is always the "development version", and ext2 is the "stable version".

- Ted
Re: ext3 for 2.4
AFAIK the original stated intention of ext3 was

	cd linux/fs
	cp -a ext2 ext3
	# hack on ext3

That leaves ext2 in ultra-stability, no-patches-unless-absolutely-necessary mode.

IMHO prove a new feature, like directories in page cache, journaling, etc. in ext3 first. Then maybe after a year of testing, if people actually care, backport those features to ext2.

-- 
Jeff Garzik      | Game called on account of naked chick
Building 1024    |
MandrakeSoft     |
Re: ext3 for 2.4
On Thursday 17 May 2001 17:53, Andrew Morton wrote:
> It's probably worth thinking about adding a fourth journalling
> mode: `journal=none'. Roll it all up into a single codebase
> and call it ext4.

Or ext5 (= ext2 + ext3).

> It rather depends on where the buffercache ends up. ext3 is
> a client of JBD (nee JFS). JBD does *block* level journalling.
> Any major change at that level will take rather some adjusting
> to.

Well, if you look at how I did the index, it works with blocks and buffers while still staying entirely in the page cache. This was Stephen's suggestion, and it integrates reliably with Al's page-oriented code. So I'm mixing pages and blocks together and it's working pretty well. BTW, the parts of Al's patch that I converted from pages to blocks got shorter and easier to read.

I'm now working on some code to handle non-data blocks in a similar way, so if this works out it could make the conversion an awful lot less painful for you.

-- 
Daniel
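Working with blocks while "staying entirely in the page cache" comes down to locating each filesystem block inside the page that caches it. The sketch below shows only that arithmetic and is not Daniel's patch. It assumes 4096-byte pages (PAGE_SIZE_BYTES is a hypothetical constant, since the real page size is architecture dependent) and a block size that divides the page size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; the real value is per-architecture. */
#define PAGE_SIZE_BYTES 4096UL

struct block_location {
    unsigned long page_index;   /* which page of the file's page cache */
    size_t offset_in_page;      /* byte offset of the block inside that page */
};

/* Map a logical block number to (page, offset), assuming blocksize
 * divides the page size (e.g. 1 KiB blocks in 4 KiB pages). */
static struct block_location locate_block(unsigned long block,
                                          size_t blocksize)
{
    struct block_location loc;
    unsigned long blocks_per_page;

    assert(blocksize > 0 && PAGE_SIZE_BYTES % blocksize == 0);
    blocks_per_page = PAGE_SIZE_BYTES / blocksize;
    loc.page_index = block / blocks_per_page;
    loc.offset_in_page = (size_t)(block % blocks_per_page) * blocksize;
    return loc;
}
```

With 1 KiB blocks and 4 KiB pages, block 5 lives in page 1 at byte offset 1024, so a buffer for that block can point into the cached page rather than into a separate buffer-cache copy.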
Re: ext3 for 2.4
Andrew writes:
> It's probably worth thinking about adding a fourth journalling
> mode: `journal=none'.

Yes, I had added this (at least in skeleton form) in my ext3 tree. If only I could keep up with you and Daniel for both the ext3 and indexed directory stuff, I might be able to submit it...

Basically, I wrapped all of the ext3 journal operations in an inline ext3_ and did nothing if handle was NULL (except for the journal_dirty_metadata, which went to mark_buffer_dirty()). You could also just make them (mostly) no-ops if CONFIG_EXT3_FS was not defined and we were living in the ext2 tree.

We may want to keep the orphan list handling even for ext2, because e2fsck does orphan cleanup regardless of whether the filesystem has a journal, and it would reduce the amount of output from e2fsck. Doesn't matter much either way.

> Roll it all up into a single codebase and call it ext4.

I'd rather just stick with ext3 for now. Don't want to confuse the issue even more. If we can get the ext3 code mounting ext2 filesystems (i.e. without a journal), then we can slowly merge the changes back to stock ext2 surrounded by CONFIG_EXT3_FS (or not, as Linus dictates).

> It rather depends on where the buffercache ends up. ext3 is a
> client of JBD (nee JFS). JBD does *block* level journalling. Any
> major change at that level will take rather some adjusting to.

Well, Daniel's part of the code still uses buffer_heads, but they are backed by the page cache and not the buffer cache. This is the direction Linus wants to go (AFAICS): we use the page cache for caching, and buffer_heads for I/O handles only. Al's page-cache directory stuff does not use Daniel's buffer_head abstraction, so it may need to get changed a bit to work with JBD.

At this point, it is probably worth doing a global search-and-replace for all of the jfs_* functions, renaming them jbd_*, to avoid conflicts with IBM JFS.
Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
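The NULL-handle idea can be illustrated as follows: every journal operation goes through a wrapper, and a NULL handle means "no journal". The wrappers then do nothing, except for dirtying metadata, which falls back to ordinary write-back the way ext2 does. This is a minimal userspace sketch, assuming mock types. struct mock_handle, struct mock_buffer and the sketch_* / mock_* functions are hypothetical stand-ins, not the real JBD or buffer_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real handle_t / struct buffer_head. */
struct mock_handle {
    int journalled_buffers;     /* buffers logged in this transaction */
    int write_access_calls;     /* get-write-access requests seen */
};

struct mock_buffer {
    int dirty;                  /* queued for ordinary write-back */
    int journalled;             /* logged through the journal */
};

/* Stand-in for logging a metadata buffer through the journal. */
static void mock_journal_dirty_metadata(struct mock_handle *h,
                                        struct mock_buffer *bh)
{
    bh->journalled = 1;
    h->journalled_buffers++;
}

/* Stand-in for plain, unjournalled write-back marking. */
static void mock_mark_buffer_dirty(struct mock_buffer *bh)
{
    bh->dirty = 1;
}

/* With no journal (NULL handle) there is nothing to record. */
static inline int sketch_get_write_access(struct mock_handle *h,
                                          struct mock_buffer *bh)
{
    assert(bh != NULL);
    if (h != NULL)
        h->write_access_calls++;
    return 0;
}

/* NULL handle: fall back to ext2-style dirtying; otherwise journal it. */
static inline void sketch_dirty_metadata(struct mock_handle *h,
                                         struct mock_buffer *bh)
{
    assert(bh != NULL);
    if (h == NULL)
        mock_mark_buffer_dirty(bh);
    else
        mock_journal_dirty_metadata(h, bh);
}
```

The same call sites then serve both a journalled mount and a `journal=none` style mount. Only the handle passed in differs.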
Re: ext3 for 2.4
Daniel Phillips wrote:
>
> And the third is a combination of two patches:
>
> ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz
> http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6
>

These changes have a very low impact on the journalling code, and vice versa. A few days' effort to merge them once ext3/2.4 is steady. And once the pcache stuff is steady: the journalling code is pretty complex.

It's probably worth thinking about adding a fourth journalling mode: `journal=none'. Roll it all up into a single codebase and call it ext4.

It rather depends on where the buffercache ends up. ext3 is a client of JBD (nee JFS). JBD does *block* level journalling. Any major change at that level will take rather some adjusting to.
Re: ext3 for 2.4
Andrew Morton wrote:
>
> The tree is based on the porting work which Peter Braam did. It's
> in cvs in Jeff Garzik's home on sourceforge. Info on CVS is at
> http://sourceforge.net/cvs/?group_id=3242 - the module name
> is `ext3'.

That was a bit cryptic.

	cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/gkernel login
	cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/gkernel co ext3

Also, there's a silly bug which crashes things if the patch is applied but you select CONFIG_EXT3_FS=n. Don't do that - just back everything out.
Re: ext3 for 2.4
On Thursday 17 May 2001 13:20, Andrew Morton wrote:
> Summary: ext3 works, page_launder() doesn't :)
>
> The tree is based on the porting work which Peter Braam did. It's
> in cvs in Jeff Garzik's home on sourceforge. Info on CVS is at
> http://sourceforge.net/cvs/?group_id=3242 - the module name
> is `ext3'. There's a README there which describes how to
> apply the patchset.

Congratulations to all. Naturally, Ext3 will need a fast directory index, and quickly too, before people start running benchmarks against ReiserFS and XFS. :-)

Could you take a look at my indexing patch and see what the journalling issues are (if any)? I have three flavors for you to choose from:

  1) Good old buffer cache
  2) Page cache, block oriented
  3) Page cache, blocks and pages

The first two are from the same patch, with a compilation option:

  http://nl.linux.org/~phillips/htree/dx.testme-2.4.4

And the third is a combination of two patches:

  ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz
  http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-6

Please take a look and see which style fits best. The pcache patch is the forward-looking one; it's preferred.

-- 
Daniel
ext3 for 2.4
Summary: ext3 works, page_launder() doesn't :)

The tree is based on the porting work which Peter Braam did. It's in cvs in Jeff Garzik's home on sourceforge. Info on CVS is at http://sourceforge.net/cvs/?group_id=3242 - the module name is `ext3'. There's a README there which describes how to apply the patchset.

Current status is: quite solid. Stress testing on x86/SMP passes and performance in ordered data and writeback data mode is good. Journalled data performance is, of course, so-so. The only big issue of which I am aware is a VM livelock on SMP, discussed below.

The patch is against 2.4.4-ac9.

Today's changes:

- quotas appear to work OK. I'll leave them turned on as I test things, and watch out for oddities.

  It's hard to find working quota tools. Most of them either don't want to compile and/or don't understand ext3. Jan Kara is maintaining a set of quota tools at http://www.sourceforge.net/projects/linuxquota/ which work well. The current CVS tree from there seems to be under XFS development at present and needs a couple of patches to work against ext3 (and even ext2). I can send them to whoever needs them.

- Recovery works fine now. The bug was that I was splicing new blocks into a file in ext3_splice_branch() *before* doing a journal_get_write_access() on its parent's buffer. Duh.

- Four debugging fields have been removed from buffer_head: b_alloc_transaction, etc. These were debug fields which I couldn't find a use for in 2.4. In 2.2, these were set in ext3_new_block() when we do a getblk() on the new block. In 2.4, we don't do the getblk() any more...

- Some tightening of the way commit feeds buffers into the request queues. At present, 256 buffers are fed into ll_rw_block() before we run tq_disk. I *was* pushing thousands down. It doesn't seem to make much difference. Overall throughput with some benchmarks in ordered data mode has been significantly improved by this change.
  ext3 in general seems faster in 2.4 than in 2.2, presumably because of better request merging. Much more work needs to go into benchmarking and performance tuning.

- There's an issue with page_launder():

    ext3_file_write()
      -> generic_file_write()
        -> __alloc_pages()
          -> page_launder()
            -> ext3_writepage()

  This is bad. It will cause ext3 to be reentered while it has a transaction open against a different fs. This will corrupt filesystems and can deadlock.

  Making ext3_file_write() set PF_MEMALLOC wasn't suitable. It easily causes 0-order allocation failures within generic_file_write().

  The current approach to this is, in ext3_writepage(), to detect when ext3 is being reentered and to simply *return* without writing the page at all. This is kludgy but should work - the only place where the fs can be reentered via writepage() is from page_launder(), and page_launder() doesn't wait on the page.

  Quotas don't use writepage(), and reentry there is OK.

  If Marcelo's `priority' argument to writepage() goes in, this can be used in a more sensible manner.

  Note that this return-if-reentered code is not related to the VM livelock. It has a big printk in it at present.

- Some new test tools:

  To simulate crashes I have added a new mount option:

    mount /dev/foo /mnt/bar -t ext3 -o ro-after=NNN

  When the fs is mounted this way a timer will fire after NNN jiffies and will turn the underlying device immutable. It does this by setting a flag which is tested in submit_bh(). For WRITE requests submit_bh() will simply call bh_end_io(uptodate=1) and return.

  There's a new ext3 ioctl() which will block the caller until the device has gone readonly.
  I semi-randomly chose

    #define EXT3_WAIT_FOR_READONLY _IOR('w', 1, long)

  The intent here is that a controlling script will:

    1: Mount the fs with ro-after=1000 (ten seconds)
    2: Start a test script (eg: dbench)
    3: Block on the wait-for-readonly ioctl
    4: Wake up when the disk has "crashed"
    5: Kill off the test script
    6: Unmount the fs
    7: Mount the fs (let recovery run)
    8: Unmount the fs
    9: Run e2fsck to check that the fs is sane
    10: Modify the ro-after parameter
    11: Do it all again

  Scripts which do all this are in the testing/ and tools/ directories. I've been happily simulating crashes in the middle of `dbench 12' runs for an hour now. All is well.

  I think this covers everything except for verifying that the data content of the files is sane. That can be handled with test tools.

  Special code will probably be needed to simulate crashes during truncate - with this shotgun approach the fs tends to go immutable before *any* of the truncate has committed, and it's as if nothing ever happened.

  The `ro-after' code and submit_bh() changes are conditional on CONFIG_JBD_DEBUG.
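The recovery fix is an ordering rule. A metadata buffer (here the parent's indirect block) must be handed to the journal before it is modified, so that the journal knows about the buffer before any change is made to it. Below is a minimal sketch of that rule, assuming a toy model. struct model_indirect, model_get_write_access() and model_splice_branch() are hypothetical names, not the real ext3_splice_branch() or journal_get_write_access().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PTRS 4

/* Hypothetical model of a parent indirect block. */
struct model_indirect {
    bool write_access;                  /* journal has been told about it */
    unsigned long ptr[MODEL_PTRS];      /* block pointers in the parent */
};

/* Stand-in for asking the journal for write access to the buffer. */
static void model_get_write_access(struct model_indirect *parent)
{
    parent->write_access = true;
}

/* Refuses to splice unless write access was taken first; splicing
 * before getting write access was the bug described above. */
static bool model_splice_branch(struct model_indirect *parent,
                                unsigned slot, unsigned long new_block)
{
    assert(slot < MODEL_PTRS);
    if (!parent->write_access)
        return false;
    parent->ptr[slot] = new_block;
    return true;
}
```

In this model the wrong order is rejected outright. In the real code nothing stopped it, which is how the bug got in.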
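The commit-path change feeds buffers to the block layer in fixed-size groups (256) and kicks the disk queue after each group, rather than pushing thousands at once. Here is a sketch of that batching loop, assuming counters in place of real I/O. mock_submit() and mock_unplug() are hypothetical stand-ins for ll_rw_block() and running tq_disk, and MODEL_BATCH mirrors the 256 from the message.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BATCH 256     /* buffers per group, as in the message */

struct io_stats {
    size_t submitted;       /* buffers handed to the (mock) block layer */
    size_t unplugs;         /* times the (mock) disk queue was kicked */
};

static void mock_submit(struct io_stats *st, size_t n)
{
    st->submitted += n;
}

static void mock_unplug(struct io_stats *st)
{
    st->unplugs++;
}

/* Submit nr_buffers in groups of at most MODEL_BATCH, unplugging after
 * each group so the device starts on work before the next group. */
static void commit_buffers(struct io_stats *st, size_t nr_buffers)
{
    size_t done = 0;

    while (done < nr_buffers) {
        size_t n = nr_buffers - done;

        if (n > MODEL_BATCH)
            n = MODEL_BATCH;
        mock_submit(st, n);
        mock_unplug(st);
        done += n;
    }
    assert(st->submitted >= nr_buffers);
}
```

For example, 1000 buffers go out as 256 + 256 + 256 + 232, with four unplugs.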
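The ro-after crash simulator works like this: once the timer flag is set, every WRITE is completed as if it succeeded (uptodate = 1) but never reaches the disk. Reads still work. Below is a userspace model of that behaviour, assuming a tiny in-memory "disk". model_submit_bh(), model_end_io() and model_ro_timer_fires() are hypothetical stand-ins, not the CONFIG_JBD_DEBUG code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NBLOCKS 8

enum model_rw { MODEL_READ, MODEL_WRITE };

struct model_dev {
    bool readonly;              /* set when the ro-after timer fires */
    int disk[MODEL_NBLOCKS];    /* "persistent" contents */
};

struct model_bh {
    unsigned blocknr;
    int data;
    int uptodate;               /* completion status seen by the caller */
};

static void model_end_io(struct model_bh *bh, int uptodate)
{
    bh->uptodate = uptodate;
}

static void model_submit_bh(struct model_dev *dev, enum model_rw rw,
                            struct model_bh *bh)
{
    assert(bh->blocknr < MODEL_NBLOCKS);
    if (rw == MODEL_WRITE && dev->readonly) {
        model_end_io(bh, 1);    /* report success, silently drop the write */
        return;
    }
    if (rw == MODEL_WRITE)
        dev->disk[bh->blocknr] = bh->data;
    else
        bh->data = dev->disk[bh->blocknr];
    model_end_io(bh, 1);
}

/* Stand-in for the timer firing after NNN jiffies. */
static void model_ro_timer_fires(struct model_dev *dev)
{
    dev->readonly = true;
}
```

From the filesystem's point of view the disk has simply stopped persisting anything at the moment the timer fired. A later mount then sees the same state it would see after a real crash at that instant.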
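Step 3 of the test procedure (block on the wait-for-readonly ioctl) might look like the following userspace helper, using the request number quoted above. This is a sketch, assuming `path` is any file or directory on the test mount. The message does not describe what, if anything, the kernel writes into the `long` argument. Only a kernel carrying the ext3 debug patch understands this request; elsewhere the ioctl() call is expected to fail, and wait_for_readonly() is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Request number from the message; only meaningful on the patched kernel. */
#define EXT3_WAIT_FOR_READONLY _IOR('w', 1, long)

/* Block until the device under 'path' has gone read-only.
 * Returns 0 on success, -1 if the open or the ioctl fails. */
static int wait_for_readonly(const char *path)
{
    long arg = 0;   /* _IOR(..., long) declares a long-sized argument */
    int fd;
    int ret;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, EXT3_WAIT_FOR_READONLY, &arg);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

A driver program would mount with `ro-after=NNN`, start the load, call wait_for_readonly() on the mount point, and then carry on with the kill, unmount, remount and e2fsck steps.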