random infos
Hi all I`ve followed the ext4 development since few weeks ago and I would like to know/understand some things in its development process. :) I noticed that there`re already some stuffs merged to Linus` tree and some testings in mm tree, why not have an ext4 tree? what about the proposed "Ext4 patchsets" wikipage? is there already an official or unofficial ext4 patchset announced? wouldn`t it be easier to follow/track the ext4 devepment? Thanks in advance -- Dorileo - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rfc: [patch] change attribute for ext3
On Wed, Dec 13, 2006 at 06:24:28PM -0700, Andreas Dilger wrote: > On Sep 13, 2006 14:11 -0400, Trond Myklebust wrote: > > I would really have preferred a full-blown 64-bit counter as per > > RFC3530, but I suppose we could always combine this change attribute > > with the high word from ctime in order to make up the NFSv4 change > > attribute. That should keep us safe until someone develops a ramdisk > > with < 1 nsecond access time. > > Trond, can you please elaborate on the need for a 64-bit version counter > for NFSv4? I'm not Trond, but > What kind of requirements does NFSv4 place on the version? Monotonic is > probably a good bet. The only requirement is that it be unique (assuming a file is never modified 2^64 times). Clients can't compare them except for equality. > Does it need to be global for the filesystem Nope. > or is a per-inode version sufficient? Yes. > What functionality of NFSv4 needs the version? Clients use it to revalidate their caches. --b. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rfc: [patch] change attribute for ext3
On Sep 13, 2006 14:11 -0400, Trond Myklebust wrote: > On Wed, 2006-09-13 at 18:42 +0200, Alexandre Ratchov wrote: > > here is a small patch that adds the "change attribute" for ext3 > > file-systems; > > > > the change attribute is a simple counter that is reset to zero on > > inode creation and that is incremented every time the inode data is > > modified (similarly to the "ctime" time-stamp). > > I would really have preferred a full-blown 64-bit counter as per > RFC3530, but I suppose we could always combine this change attribute > with the high word from ctime in order to make up the NFSv4 change > attribute. That should keep us safe until someone develops a ramdisk > with < 1 nsecond access time. Trond, can you please elaborate on the need for a 64-bit version counter for NFSv4? We have been looking at something similar, but ctime+nsec is not really sufficient as it is possible that the inode ctime can go backward if the clock is reset. What kind of requirements does NFSv4 place on the version? Monotonic is probably a good bet. Does it need to be global for the filesystem, or is a per-inode version sufficient? What functionality of NFSv4 needs the version? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ext4 devel interlock meeting minutes (Dec. 13 2006)
On Dec 13, 2006 15:42 -0800, AVANTIKA R. MATHUR wrote: > - Change Attribute: > -- Andreas asked the bull team why a new field in the inode is > necessary rather than using the i_version field, and also mentioned that > all code changes can be in mark_inode_dirty. > -- The ctime cannot be used for the change attribute because if the > server clock is incorrect, the ctime can go backwards in time. > -- semantics needed: nfsv4 and bull team require 32 bits, clusterfs > needs 64 bits. Minor clarification. Bull patch for NFSv4 only uses 32 bits, but apparently the NFSv4 spec calls for 64 bits. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Ext4 devel interlock meeting minutes (Dec. 13 2006)
Ext4 Developer Interlock Call: 12-13-06 Minutes Attendees: Mingming Cao, Dave Kleikamp(Shaggy), Andreas Dilger, Eric Sandeen, Ted Ts'o, Takashi Sato, Badari Pulavarty, Jean-Pierre Dion, Jean Noel Cordenner, Valérie Clément, Avantika Mathur Minutes can be accessed at: http://ext4.wiki.kernel.org/index.php/Ext4_Developer%27s_Conference_Call - The next interlock call will be in January - Delayed Allocation and Multiple Block Allocation: Alex Tomas' patches for delayed improves performance since it batches all writes to one area of the disks, reducing number of seeks. -- Online defragmention depends on delalloc and mballoc, these two should be the priority to be pushed first. -- Delalloc is not currently implemented on data=ordered mode; This is a desired feature, but adding this support should not delay merging the current patch. - Preallocation: Mingming posted comment to the preallocation patches, asking if preallocation should be implemented for block based files so ext3 users could also use it. It was decided that it will be a rare case where a user will want to add preallocation on existing block based files, but might be a nice feature to have. Since zero-ing out the blocks can take time and block applications, the ideal method would be to first convert files to extents. If this was implemented, we would want an in kernel converter utility that can convert from ext3 to have all ext4 features and back. - Backwards compatibility: Eric asked if maintaining ext3 features in ext4 and format compatibility is necessary. Ted believes that up until now it has not been too much work to maintain. With 64bit and extents it may be more work, but also might be useful for some people. In the future, if it is too much work or has a great impact on performance, then backwards compatibility may be eliminated. - Finer Timestamp: Ted will propose to the mailing list that nanosecond timestamps are only supported in large inodes. - Change Attribute: -- Andreas asked the bull team why a new field in the inode is necessary rather than using the i_version field, and also mentioned that all code changes can be in mark_inode_dirty. -- The ctime cannot be used for the change attribute because if the server clock is incorrect, the ctime can go backwards in time. -- semantics needed: nfsv4 and bull team require 32 bits, clusterfs needs 64 bits. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
Andreas Dilger wrote: > On Dec 13, 2006 08:40 -0600, Eric Sandeen wrote: >> This is probably a discrepancy that comes from reading the block device >> directly, right? Which is not necessarily in sync with the address >> space for the filesystem. > > In fact, the block device IS in sync with the filesystem address space, > but only for metadata. The file data each has its own address space and > is not coherent. Ah, thanks for the clarification. -Eric - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
On Dec 13, 2006 08:40 -0600, Eric Sandeen wrote: > This is probably a discrepancy that comes from reading the block device > directly, right? Which is not necessarily in sync with the address > space for the filesystem. In fact, the block device IS in sync with the filesystem address space, but only for metadata. The file data each has its own address space and is not coherent. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
On Dec 13, 2006 16:02 +0100, Valerie Clement wrote: > Alex Tomas wrote: > >ah, it's simple. you're looking at superblock counters. > >we don't update them every time. this was changed years > >ago. try the same with regular ext3 > > Yes, sorry, I didn't know that and I didn't notice that before. > And the per-group counters are right. It would be good, IMHO, to update the superblock counters periodically anyways (e.g. statfs already has done the calculation). Otherwise, e2fsck on a mounted-but-idle system always complains about them being incorrect. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] [patch 2/3] change attribute for ext4: ext4 specific code
Sorry for the late answer. I agree that using i_version field sounds better, I'm working on it. regards, Jean noel Andreas Dilger wrote: On Nov 29, 2006 19:54 +0100, Jean-Noel Cordenner wrote: This part of the patch concerns the ext4 code. I was looking more closely at this code, and wondering two things: - why not just use the existing inode->i_version field instead of adding a new i_change_attribute? The i_version is not used by the VFS at all, and only for detecting directory modifications in ext3 (where it has the same semantic as the new i_change_attribute anyways). This avoids bloating the VFS inode more than it already is. - why not just do an increment of i_version in ext3_do_update_inode()? That is ext3_dirty_inode->ext3_mark_inode_dirty->ext3_mark_iloc_dirty() and also handles all of the VFS locations that call notify_change(). This MUST be called anywhere that we make a persistent change to the inode in order to flush it to disk. That would reduce the patch to a few lines at most. I don't think there are any places we need to supplement this (even mmap IO or writes to a hole will update mtime). Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][Patch 1/1] Persistent preallocation in ext4
Dave Kleikamp wrote: If anything, the block-based preallocation could initialize all of the data to zero. It would be slow, but it would still provide the correct function and result in contiguous allocation. Agree. That seems goood enough. Mingming - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][Patch 1/1] Persistent preallocation in ext4
On Wed, Dec 13, 2006 at 07:36:29AM -0600, Dave Kleikamp wrote: > On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote: > > On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote: > > > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote: > > > > Supporting preallocation for extent based files seems fairly > > > straightforward. I agree we should look at this first. After get this > > > done, it probably worth re-consider whether to support preallocation for > > > non-extent based files on ext4. I could imagine user upgrade from ext3 > > > to ext4, and expecting to use preallocation on those existing files > > I disagree here. Why add the complexity for what is going to be a rare > case? In cases where a user is going to benefit from preallocation, > she'll probably also benefit from extents, and would be better off > making a copy of the file, thus converting it to extents. > > > I gave a thought on this initially. But, I was not sure how we should > > implement preallocation in a non-extent based file. Using extents we can > > mark a set of blocks as unitialized, but how will we do this for > > non-extent based files ? If we do not have a way to mark blocks > > uninitialized, when someone will try to read from a preallocated block, > > it will return junk/stale data instead of zeroes. > > If anything, the block-based preallocation could initialize all of the > data to zero. It would be slow, but it would still provide the correct > function and result in contiguous allocation. And posix_fallocate does that already ... Regards Suparna > > Shaggy > -- > David Kleikamp > IBM Linux Technology Center > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Suparna Bhattacharya ([EMAIL PROTECTED]) Linux Technology Center IBM Software Lab, India - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
Alex Tomas wrote: ah, it's simple. you're looking at superblock counters. we don't update them every time. this was changed years ago. try the same with regular ext3 Yes, sorry, I didn't know that and I didn't notice that before. And the per-group counters are right. Thanks a lot for your clarification. Valérie - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
> Valerie Clement (VC) writes: VC> Alex Tomas wrote: >>> Valerie Clement (VC) writes: >> VC> Hi Alex, VC> I did the following test: VC> # mkfs /dev/sdc1 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test VC> # dumpe2fs -h /dev/sdc1 VC> # cp linux.tar /test VC> # tar xf /test/linux.tar VC> # dumpe2fs -h /dev/sdc1 >> VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output VC> did not change after running the cp and tar commands. >> >> sync should help, I think. VC> No, I've just tried, it does not change anything. ah, it's simple. you're looking at superblock counters. we don't update them every time. this was changed years ago. try the same with regular ext3 thanks, Alex - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
> Valerie Clement (VC) writes: VC> Alex Tomas wrote: >>> Valerie Clement (VC) writes: >> VC> Hi Alex, VC> I did the following test: VC> # mkfs /dev/sdc1 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test VC> # dumpe2fs -h /dev/sdc1 VC> # cp linux.tar /test VC> # tar xf /test/linux.tar VC> # dumpe2fs -h /dev/sdc1 >> VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output VC> did not change after running the cp and tar commands. >> >> sync should help, I think. VC> No, I've just tried, it does not change anything. VC> Valérie not sure [EMAIL PROTECTED] ~]# mke2fs -m0 -j /dev/sda1 >&/dev/null [EMAIL PROTECTED] ~]# mount -t ext4dev -o extents,mballoc,delalloc /dev/sda1 /test [EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/before dumpe2fs 1.39.cfs1 (29-May-2006) [EMAIL PROTECTED] ~]# dd if=/dev/zero of=/test/f bs=1M count=20 20+0 records in 20+0 records out [EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/after1 dumpe2fs 1.39.cfs1 (29-May-2006) [EMAIL PROTECTED] ~]# sync [EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/after2 dumpe2fs 1.39.cfs1 (29-May-2006) [EMAIL PROTECTED] ~]# diff -pu /tmp/before /tmp/after1 --- /tmp/before 2006-12-13 14:39:42.541828787 +0300 +++ /tmp/after1 2006-12-13 14:39:59.602086138 +0300 @@ -45,9 +45,9 @@ Group 0: (Blocks 0-32767) Reserved GDT blocks at 2-245 Block bitmap at 246 (+246), Inode bitmap at 247 (+247) Inode table at 248-752 (+248) - 15607 free blocks, 16149 free inodes, 2 directories + 15607 free blocks, 16148 free inodes, 2 directories Free blocks: 17161-32767 - Free inodes: 12-16160 + Free inodes: 13-16160 Group 1: (Blocks 32768-65535) Backup superblock at 32768, Group descriptors at 32769-32769 Reserved GDT blocks at 32770-33013 [EMAIL PROTECTED] ~]# diff -pu /tmp/after1 /tmp/after2 --- /tmp/after1 2006-12-13 14:39:59.602086138 +0300 +++ /tmp/after2 2006-12-13 14:40:05.592176495 +0300 @@ -45,8 +45,8 @@ Group 0: (Blocks 0-32767) Reserved GDT blocks at 2-245 Block bitmap at 246 (+246), Inode bitmap at 247 (+247) Inode table at 248-752 (+248) - 15607 free blocks, 16148 free inodes, 2 directories - Free blocks: 17161-32767 + 10487 free blocks, 16148 free inodes, 2 directories + Free blocks: 17161-22527, 27648-32767 Free inodes: 13-16160 Group 1: (Blocks 32768-65535) Backup superblock at 32768, Group descriptors at 32769-32769 [EMAIL PROTECTED] ~]# try to repeat with a single file? thanks, Alex - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
Alex Tomas wrote: Valerie Clement (VC) writes: VC> Hi Alex, VC> I did the following test: VC> # mkfs /dev/sdc1 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test VC> # dumpe2fs -h /dev/sdc1 VC> # cp linux.tar /test VC> # tar xf /test/linux.tar VC> # dumpe2fs -h /dev/sdc1 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output VC> did not change after running the cp and tar commands. sync should help, I think. This is probably a discrepancy that comes from reading the block device directly, right? Which is not necessarily in sync with the address space for the filesystem. -Eric - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
Alex Tomas wrote: +/* + * entry function for inode syncing + * it's responsbility is to sort all inode out in their locality groups + */ +void ext4_lg_sync_inodes(struct super_block *sb, struct writeback_control *wbc) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_locality_group *lg; + + /* refill pending groups from s_dirty */ + spin_lock(&inode_lock); + while (!list_empty(&sb->s_dirty)) { + struct inode *inode = list_entry(sb->s_dirty.prev, + struct inode, i_list); + struct ext4_inode_info *ei = EXT4_I(inode); + + lg = ei->i_locality_group; + if (lg == NULL) { + if (S_ISDIR(inode->i_mode) || i_size_read(inode) == 0) { + if (atomic_read(&inode->i_count)) { + /* +* The inode is clean, inuse +*/ + list_move(&inode->i_list, &inode_in_use); + } else { + /* +* The inode is clean, unused +*/ + list_move(&inode->i_list, &inode_unused); + } + continue; + } + /* XXX: atime changed ? */ + list_move(&inode->i_list, &inode_in_use); + continue; + } + + /* move inode in proper locality group's dirty list */ + spin_lock(&lg->lg_lock); + list_move_tail(&inode->i_list, &lg->lg_dirty); + spin_unlock(&lg->lg_lock); + + if (!test_and_set_bit(EXT4_LG_DIRTY, &lg->lg_flags)) + list_move(&lg->lg_list, &sbi->s_locality_dirty); + } + spin_unlock(&inode_lock); + + ext4_lg_sync_groups(sb, wbc); +} Hi Alex, I did the following test: # mkfs /dev/sdc1 # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test # dumpe2fs -h /dev/sdc1 # cp linux.tar /test # tar xf /test/linux.tar # dumpe2fs -h /dev/sdc1 The "Free blocks" and "Free inodes" counters in the dumpe2fs output did not change after running the cp and tar commands. I think it misses a call to ext4_commit_super() at the end of the ext4_lg_sync_inodes function. Am I right ? Valérie - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
Alex Tomas wrote: Valerie Clement (VC) writes: VC> Hi Alex, VC> I did the following test: VC> # mkfs /dev/sdc1 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test VC> # dumpe2fs -h /dev/sdc1 VC> # cp linux.tar /test VC> # tar xf /test/linux.tar VC> # dumpe2fs -h /dev/sdc1 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output VC> did not change after running the cp and tar commands. sync should help, I think. No, I've just tried, it does not change anything. Valérie VC> I think it misses a call to ext4_commit_super() at the end of the VC> ext4_lg_sync_inodes function. VC> Am I right ? what for? allocation was delayed, no blocks were allocated. thanks, Alex - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ext4-locality-groups patch
> Valerie Clement (VC) writes: VC> Hi Alex, VC> I did the following test: VC> # mkfs /dev/sdc1 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test VC> # dumpe2fs -h /dev/sdc1 VC> # cp linux.tar /test VC> # tar xf /test/linux.tar VC> # dumpe2fs -h /dev/sdc1 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output VC> did not change after running the cp and tar commands. sync should help, I think. VC> I think it misses a call to ext4_commit_super() at the end of the VC> ext4_lg_sync_inodes function. VC> Am I right ? what for? allocation was delayed, no blocks were allocated. thanks, Alex - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][Patch 1/1] Persistent preallocation in ext4
On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote: > On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote: > > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote: > > Supporting preallocation for extent based files seems fairly > > straightforward. I agree we should look at this first. After get this > > done, it probably worth re-consider whether to support preallocation for > > non-extent based files on ext4. I could imagine user upgrade from ext3 > > to ext4, and expecting to use preallocation on those existing files I disagree here. Why add the complexity for what is going to be a rare case? In cases where a user is going to benefit from preallocation, she'll probably also benefit from extents, and would be better off making a copy of the file, thus converting it to extents. > I gave a thought on this initially. But, I was not sure how we should > implement preallocation in a non-extent based file. Using extents we can > mark a set of blocks as unitialized, but how will we do this for > non-extent based files ? If we do not have a way to mark blocks > uninitialized, when someone will try to read from a preallocated block, > it will return junk/stale data instead of zeroes. If anything, the block-based preallocation could initialize all of the data to zero. It would be slow, but it would still provide the correct function and result in contiguous allocation. Shaggy -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][Patch 1/1] Persistent preallocation in ext4
On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote: > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote: > > + > > + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)) > > + return -ENOTTY; > > > > Supporting preallocation for extent based files seems fairly > straightforward. I agree we should look at this first. After get this > done, it probably worth re-consider whether to support preallocation for > non-extent based files on ext4. I could imagine user upgrade from ext3 > to ext4, and expecting to use preallocation on those existing files I gave a thought on this initially. But, I was not sure how we should implement preallocation in a non-extent based file. Using extents we can mark a set of blocks as unitialized, but how will we do this for non-extent based files ? If we do not have a way to mark blocks uninitialized, when someone will try to read from a preallocated block, it will return junk/stale data instead of zeroes. But, if we can think of a solution here then it will be as simple as removing the check above and replacing "ext4_ext_get_blocks()" with "ext4_get_blocks_wrap()" in the while() loop. > > > + > > + block = EXT4_BLOCK_ALIGN(input.offset, blkbits) >> blkbits; > > + max_blocks = EXT4_BLOCK_ALIGN(input.len, blkbits) >> blkbits; I was wondering if I should change above lines to this : + block = input.offset >> blkbits; + max_blocks = (EXT4_BLOCK_ALIGN(input.len+input.offset, blkbits) >> blkbits) - block; Reason is that the block which contains the offset, should also be preallocated. And the max_blocks should be calculated accordingly. > > + while(ret>=0 && ret > + { > > + block = block + ret; > > + max_blocks = max_blocks - ret; > > + ret = ext4_ext_get_blocks(handle, inode, block, > > + max_blocks, &map_bh, > > + EXT4_CREATE_UNINITIALIZED_EXT, 1); > > Since the interface takes offset and number of blocks to allocate, I > assuming we are going to handle holes in preallocation, thus, we cannot > not mark the extend_size flag to 1 when calling ext4_ext_get_blocks. > > I think we should update i_size and i_disksize after preallocation. Oh, > to protect parallel updating i_size, we have to take i_mutex down. Okay. So, is this what you want to be done here : +retry: +ret = 0; +while(ret>=0 && ret 0 && test_bit(BH_New, &map_bh.b_state)) +nblocks = nblocks + ret; +} +if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, +&retries)) +goto retry; + +if(nblocks) { +mutex_lock(&inode->i_mutex); +inode->i_size = inode->i_size + (nblocks >> blkbits); +EXT4_I(inode)->i_disksize = inode->i_size; +mutex_unlock(&inode->i_mutex); +} > > > + } > > + ext4_mark_inode_dirty(handle, inode); > > + ext4_journal_stop(handle); > > + > > Error code returned by ext4_journal_stop() is being ignored here, is > this right? > Well, there are other places in ext34/ioctl.c which ignore the return > returned by ext4_journal_stop(), maybe should fix this in a separate > patch. Agreed. I think following should take care of it: + ext4_mark_inode_dirty(handle, inode); + ret2 = ext4_journal_stop(handle); + if(ret > 0) + ret = ret2; + return ret > 0 ? nblocks : ret; > > + return ret>0?0:ret; > > + } > > > > > Oh, what if we failed to allocate the full amount of blocks? i.e, the > ext4_ext_get_blocks() returns -ENOSPC error and exit the loop early. Are > we going to return error, or try something like > > if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries)) > goto retry > > I wonder it might be useful to return the actual number of blocks > preallocated back to the application. Ok. Yes, makes sense. We can return the number of "new" blocks like this: + return ret > 0 ? nblocks : ret; Please let me know if you agree with the above set of changes, and any further comments you have. I will then update and test the new patch and post it again. Thanks! Regards, Amit Arora - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html