Testing ext4 persistent preallocation patches for 64 bit features
I plan to test the persistent preallocation patches on a huge sparse device,
to check whether physical block numbers wider than 32 bits (up to 48 bits)
behave as expected. I have the following questions and would appreciate
suggestions:

a) What sparse device size should I use for testing? Would a size of > 8TB
   (say, 100TB) be enough? The physical (backing store) device I can have is
   up to 70GB.
b) How do I test allocation of >32 bit physical block numbers? I cannot fill
   > 8TB, since the physical storage available to me is just 70GB.
c) Do I need to put some hack in the filesystem code for the above (to
   allocate >32 bit physical block numbers)?

Any further ideas on how to test this will help. Thanks!
--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
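For question (a), the device size at which block numbers stop fitting in 32 bits depends only on the filesystem block size. The following is a minimal sketch of that arithmetic, not part of the patches; the helper names are hypothetical and the 1KB..4KB block-size range is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on physical block numbers, as stated in the question above. */
#define MAX_PHYS_BLOCK_BITS 48

/* First physical block number that does not fit in 32 bits. */
uint64_t first_block_above_32bit(void)
{
	return UINT64_C(1) << 32;
}

/*
 * Smallest device size, in bytes, on which block number `block` exists,
 * for a filesystem whose block size is (1 << blkbits) bytes.
 */
uint64_t min_device_bytes(uint64_t block, unsigned int blkbits)
{
	/* Assumption of this sketch: 1KB..4KB blocks only. */
	assert(blkbits >= 10 && blkbits <= 12);
	assert(block < (UINT64_C(1) << MAX_PHYS_BLOCK_BITS));
	return (block + 1) << blkbits;
}
```

Under this arithmetic the last 32-bit block number ends at 8TiB with 2KB blocks and at 16TiB with 4KB blocks, so a 100TB sparse device would extend past the boundary in either case.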
Re: [RFC][PATCH 2/3] Move the file data to the new blocks
Andrew Morton wrote:
> On Tue, 16 Jan 2007 21:05:20 +0900 [EMAIL PROTECTED] wrote:
> ...
>> +ext4_ext_replace_branches(struct inode *org_inode, struct inode *dest_inode,
>> +			pgoff_t from_page, pgoff_t dest_from_page,
>> +			pgoff_t count_page, unsigned long *delete_start)
>> +{
>> +	struct ext4_ext_path *org_path = NULL;
>> +	struct ext4_ext_path *dest_path = NULL;
>> +	struct ext4_extent *oext, *dext;
>> +	struct ext4_extent tmp_ext;
>> +	int err = 0;
>> +	int depth;
>> +	unsigned long from, count, dest_off, diff, replaced_count = 0;
>
> These should be sector_t, shouldn't they?

At some point should we start using blkcnt_t properly?
(block-in[-large]-file, not block-in[-large]-device?)

I think that's what it was introduced for, although it's not in wide use at
this point. I guess the type really isn't used anywhere else; just in the
inode's i_blocks. Hmm.

-Eric
Re: [RFC][PATCH 2/3] Move the file data to the new blocks
On Mon, 5 Feb 2007 14:12:04 +0100 Jan Kara <[EMAIL PROTECTED]> wrote:

> > Move the blocks on the temporary inode to the original inode
> > by a page.
> > 1. Read the file data from the old blocks to the page
> > 2. Move the block on the temporary inode to the original inode
> > 3. Write the file data on the page into the new blocks
> I have one thing - it's probably not good to use page cache for
> defragmentation.

Then it is no longer online defragmentation. The issues with maintaining
correctness and coherency with ongoing VFS activity would be truly ghastly.

If we're worried about pagecache pollution then it would be better to control
that from userspace via fadvise().
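As a rough illustration of the userspace control mentioned above, a defragmentation tool could ask the kernel to drop a file's cached pages once it has finished with them. This is a minimal sketch using posix_fadvise() with POSIX_FADV_DONTNEED; the helper name drop_cached_range() is hypothetical and nothing here is taken from the patches under review.

```c
#define _XOPEN_SOURCE 600	/* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that the cached pages of [offset, offset + len)
 * are no longer needed. A len of 0 means "to the end of the file".
 * Returns 0 on success, -1 on failure.
 */
int drop_cached_range(int fd, off_t offset, off_t len)
{
	int err;

	assert(offset >= 0 && len >= 0);

	/* Dirty pages are not dropped by the hint, so write them back first. */
	if (fsync(fd) != 0) {
		perror("fsync");
		return -1;
	}

	/* posix_fadvise() returns the error number directly, not via errno. */
	err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
	if (err != 0) {
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
		return -1;
	}
	return 0;
}
```

The call is only advisory: the kernel may keep some pages cached, so this limits pagecache pollution on a best-effort basis.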
Re: [RFC][PATCH 2/3] Move the file data to the new blocks
On Tue, 16 Jan 2007 21:05:20 +0900 [EMAIL PROTECTED] wrote:

> Move the blocks on the temporary inode to the original inode
> by a page.
> 1. Read the file data from the old blocks to the page
> 2. Move the block on the temporary inode to the original inode
> 3. Write the file data on the page into the new blocks
>
> ...
>
> +
> +/**
> + * ext4_ext_replace_branches - replace original extents with new extents.
> + * @org_inode	Original inode
> + * @dest_inode	temporary inode
> + * @from_page	Page offset
> + * @count_page	Page count to be replaced
> + * @delete_start	block offset for deletion
> + *
> + * This function returns 0 if succeed, otherwise returns error value.
> + * Replace extents for blocks from "from" to "from+count-1".
> + */
> +static int
> +ext4_ext_replace_branches(struct inode *org_inode, struct inode *dest_inode,
> +			pgoff_t from_page, pgoff_t dest_from_page,
> +			pgoff_t count_page, unsigned long *delete_start)
> +{
> +	struct ext4_ext_path *org_path = NULL;
> +	struct ext4_ext_path *dest_path = NULL;
> +	struct ext4_extent *oext, *dext;
> +	struct ext4_extent tmp_ext;
> +	int err = 0;
> +	int depth;
> +	unsigned long from, count, dest_off, diff, replaced_count = 0;

These should be sector_t, shouldn't they?

> +	handle_t *handle = NULL;
> +	unsigned jnum;
> +
> +	from = from_page << (PAGE_CACHE_SHIFT - dest_inode->i_blkbits);

In which case one needs to be very careful to avoid overflows in expressions
such as this one.

> +	wait_on_page_locked(page);
> +	lock_page(page);

The wait_on_page_locked() is unneeded here.
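The overflow concern about the page-to-block shift can be reproduced in userspace with fixed-width types. In this minimal sketch, uint32_t stands in for a 32-bit unsigned long and uint64_t for a 64-bit sector_t; both function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift performed in 32 bits: any bits shifted past bit 31 are silently
 * discarded before the result is stored anywhere.
 */
uint32_t page_to_block_narrow(uint32_t page, unsigned int shift)
{
	assert(shift < 32);
	return page << shift;
}

/*
 * Widen first, then shift: the cast must be applied to the operand, not to
 * the result of the shift, or the high bits are already gone.
 */
uint64_t page_to_block_wide(uint32_t page, unsigned int shift)
{
	assert(shift < 32);
	return (uint64_t)page << shift;
}
```

With 4KB pages and 1KB blocks the shift is 2, so page index 0x40000000 yields block 0 in the narrow version and 0x100000000 in the wide one. Changing only the type of the destination variable would therefore not be enough if the shifted operand stayed 32 bits wide.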
+ ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems.patch added to -mm tree
The patch titled
     ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems
has been added to the -mm tree.  Its filename is
     ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

--
Subject: ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems
From: Markus Rechberger <[EMAIL PROTECTED]>

Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079

signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit

1001101001001001 .. -2,082,844,739
1001101001001001 .. 2,212,122,557 <- this currently gets stored on the disk
but when converting it to a 64bit signed long value it loses its sign and
becomes positive.

Cc: Andreas Dilger <[EMAIL PROTECTED]>
Cc:
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/ext2/inode.c |    6 +++---
 fs/ext3/inode.c |    6 +++---
 fs/ext4/inode.c |    6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff -puN fs/ext2/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems fs/ext2/inode.c
--- a/fs/ext2/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems
+++ a/fs/ext2/inode.c
@@ -1077,9 +1077,9 @@ void ext2_read_inode (struct inode * ino
 	}
 	inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);
 	inode->i_size = le32_to_cpu(raw_inode->i_size);
-	inode->i_atime.tv_sec = le32_to_cpu(raw_inode->i_atime);
-	inode->i_ctime.tv_sec = le32_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->i_mtime);
+	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
+	inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
+	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
 	inode->i_atime.tv_nsec = inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 0;
 	ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
 	/* We now have enough fields to check if the inode was active or not.
diff -puN fs/ext3/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems fs/ext3/inode.c
--- a/fs/ext3/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems
+++ a/fs/ext3/inode.c
@@ -2670,9 +2670,9 @@ void ext3_read_inode(struct inode * inod
 	}
 	inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);
 	inode->i_size = le32_to_cpu(raw_inode->i_size);
-	inode->i_atime.tv_sec = le32_to_cpu(raw_inode->i_atime);
-	inode->i_ctime.tv_sec = le32_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->i_mtime);
+	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
+	inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
+	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
 	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
 
 	ei->i_state = 0;
diff -puN fs/ext4/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems fs/ext4/inode.c
--- a/fs/ext4/inode.c~ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems
+++ a/fs/ext4/inode.c
@@ -2673,9 +2673,9 @@ void ext4_read_inode(struct inode * inod
 	}
 	inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);
 	inode->i_size = le32_to_cpu(raw_inode->i_size);
-	inode->i_atime.tv_sec = le32_to_cpu(raw_inode->i_atime);
-	inode->i_ctime.tv_sec = le32_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->i_mtime);
+	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
+	inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
+	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
 	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
 
 	ei->i_state = 0;
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

ext2-3-4-fix-file-date-underflow-on-ext2-3-filesystems-on-64-bit-systems.patch
Re: [PATCH 0/1][RFC] mm: prepare_write positive return value
On Tue, 06 Feb 2007 11:33:46 +0300 Dmitriy Monakhov <[EMAIL PROTECTED]> wrote:

> Almost all read/write operations handle data in chunks (segments or pages)
> and the result has integral behaviour for the following scenario:
> for_each_chunk() {
> 	res = op();
> 	if(IS_ERROR(res))
> 		return progress ? progress : res;
> 	progress += res;
> }
> prepare_write may have integral behaviour in case of blksize < pgsize,
> for example we successfully allocated/read some blocks, but not all of them,
> and then some error happened. Currently we eliminate this progress by doing
> vmtruncate() after prepare_write has failed.
> It is good to have the ability to signal this progress. Interpret a positive
> prepare_write() return code as the number of bytes the fs is ready to handle
> at this moment. I've asked SAW, he thinks it is sane. This size is always
> less than PAGE_CACHE_SIZE so it is less than AOP_TRUNCATED_PAGE too.
>
> BTW: This approach was used in the OpenVZ 2.6.9 kernel in order to make FS
> with delayed allocation more correct, and it works well.
> I think not everybody will be happy about this, but let's discuss all
> advantages and disadvantages of this change.

That seems to be a logical change. We'd need to review all the callers and
callees to make sure that they handle this change correctly.

Your changes deviate quite a lot from standard kernel coding style. Please
fix that.

Please cc linux-ext4@vger.kernel.org on the next version of these patches.
I'm seriously running out of bandwidth over here and ext4 has other
maintainers.

Thanks.
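The chunked loop quoted above can be written out as ordinary C to make the "report progress before the error" convention concrete. This is a minimal userspace sketch, not the proposed prepare_write() change: process_chunks(), chunk_op_t and demo_op() are hypothetical names, and demo_op() only simulates a failure at a chosen chunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical per-chunk operation: bytes handled, or a negative errno. */
typedef ssize_t (*chunk_op_t)(void *ctx, size_t index);

/*
 * Run op over nr_chunks chunks. If a chunk fails after some bytes were
 * already handled, return those bytes; return the error only when there
 * was no progress at all.
 */
ssize_t process_chunks(chunk_op_t op, void *ctx, size_t nr_chunks)
{
	ssize_t progress = 0;
	size_t i;

	assert(op != NULL);
	for (i = 0; i < nr_chunks; i++) {
		ssize_t res = op(ctx, i);

		if (res < 0)
			return progress ? progress : res;
		progress += res;
	}
	return progress;
}

/* Illustration only: every chunk is 1024 bytes, chunk *fail_at fails. */
ssize_t demo_op(void *ctx, size_t index)
{
	const size_t *fail_at = ctx;

	return index == *fail_at ? -EIO : 1024;
}
```

A caller then sees a short positive count when the third of four chunks fails, and a plain -EIO only when the very first chunk fails, which is the distinction the proposal wants prepare_write() to be able to express.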
Re: [RFC] [PATCH 1/1] Nanosecond timestamps
On Fri, Feb 02, 2007 at 08:19:50PM +0530, Kalpak Shah wrote:
> Index: linux-2.6.19/fs/ext3/super.c
> ===
> --- linux-2.6.19.orig/fs/ext3/super.c
> +++ linux-2.6.19/fs/ext3/super.c
> @@ -1770,6 +1772,32 @@ static int ext3_fill_super (struct super
>  	}
>
>  	ext3_setup_super (sb, es, sb->s_flags & MS_RDONLY);
> +
> +	/* determine the minimum size of new large inodes, if present */
> +	if (sbi->s_inode_size > EXT3_GOOD_OLD_INODE_SIZE) {
> +		EXT3_SB(sb)->s_want_extra_isize = sizeof(struct ext3_inode) -
> EXT3_GOOD_OLD_INODE_SIZE;

Maybe EXT3_SB(sb)-> could be replaced by sbi-> here and in the lines below.

> +		if (EXT3_HAS_RO_COMPAT_FEATURE(sb,
> +				EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE)) {
> +			if (EXT3_SB(sb)->s_want_extra_isize <
> +			    le32_to_cpu(es->s_want_extra_isize))
                              ^^
> +				EXT3_SB(sb)->s_want_extra_isize =
> +					le32_to_cpu(es->s_want_extra_isize);
                                          ^^
> +			if (EXT3_SB(sb)->s_want_extra_isize <
> +			    le32_to_cpu(es->s_min_extra_isize))
                              ^^
> +				EXT3_SB(sb)->s_want_extra_isize =
> +					le32_to_cpu(es->s_min_extra_isize);
                                          ^^

Since es->s_{min,want}_extra_isize are both __u16 (BTW, shouldn't it be
__le16?), I think you should use le16_to_cpu() instead of le32_to_cpu().

> +		}
> +	}
> +	/* Check if enough inode space is available */
> +	if (EXT3_GOOD_OLD_INODE_SIZE + EXT3_SB(sb)->s_want_extra_isize >
> +						sbi->s_inode_size) {
> +		EXT3_SB(sb)->s_want_extra_isize = sizeof(struct ext3_inode) -
> EXT3_GOOD_OLD_INODE_SIZE;
> +		printk(KERN_INFO "EXT3-fs: required extra inode space not"
> +			"available.\n");
> +	}

If the inode size is EXT3_GOOD_OLD_INODE_SIZE, sbi->s_want_extra_isize won't
be initialized. However, it should not be an issue because the ext3_sb_info
is set to zero in ext3_fill_super().

Johann
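The le16/le32 mix-up is invisible on little-endian machines, where both conversions are no-ops, so the following minimal sketch simulates a big-endian CPU with explicit byte swaps. All function names are hypothetical stand-ins (these are not the kernel's byteorder helpers), and 0x001C is an arbitrary example value.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swab32(uint32_t v)
{
	return (v << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}

/* What a big-endian CPU sees when it loads a little-endian 16-bit field. */
uint16_t be_load_of_le16(uint16_t value)
{
	return swab16(value);
}

/* Simulated le16_to_cpu() on big-endian: swap the two bytes back. */
uint16_t sim_be_le16_to_cpu(uint16_t raw)
{
	return swab16(raw);
}

/*
 * Simulated le32_to_cpu() on big-endian applied to a 16-bit field: the
 * value is first widened to 32 bits and then all four bytes are swapped.
 */
uint32_t sim_be_le32_to_cpu(uint16_t raw)
{
	return swab32(raw);
}

/* Self-check: the 16-bit conversion recovers 0x001C, the 32-bit one does not. */
void check_example(void)
{
	uint16_t raw = be_load_of_le16(0x001C);

	assert(sim_be_le16_to_cpu(raw) == 0x001C);
	assert(sim_be_le32_to_cpu(raw) == 0x001C0000u);
}
```

Under this simulation the 32-bit conversion leaves the value shifted into the upper half of the word, which would then be compared against, or truncated into, the 16-bit s_want_extra_isize.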