random infos

2006-12-13 Thread dorileo
Hi all

I`ve followed the ext4 development since few weeks ago and I would like to 
know/understand some things in its development process. :)

I noticed that there`re already some stuffs merged to Linus` tree and some 
testings in mm tree, why not have an ext4 tree? what about the proposed "Ext4 
patchsets" wikipage? is there already an official or unofficial ext4 patchset 
announced? wouldn`t it be easier to follow/track the ext4 devepment?

Thanks in advance

--
Dorileo

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rfc: [patch] change attribute for ext3

2006-12-13 Thread J. Bruce Fields
On Wed, Dec 13, 2006 at 06:24:28PM -0700, Andreas Dilger wrote:
> On Sep 13, 2006  14:11 -0400, Trond Myklebust wrote:
> > I would really have preferred a full-blown 64-bit counter as per
> > RFC3530, but I suppose we could always combine this change attribute
> > with the high word from ctime in order to make up the NFSv4 change
> > attribute. That should keep us safe until someone develops a ramdisk
> > with < 1 nsecond access time.
> 
> Trond, can you please elaborate on the need for a 64-bit version counter
> for NFSv4?

I'm not Trond, but

> What kind of requirements does NFSv4 place on the version?  Monotonic is
> probably a good bet.

The only requirement is that it be unique (assuming a file is never
modified 2^64 times).  Clients can't compare them except for equality.

> Does it need to be global for the filesystem

Nope.

> or is a per-inode version sufficient?

Yes.

> What functionality of NFSv4 needs the version?

Clients use it to revalidate their caches.

--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rfc: [patch] change attribute for ext3

2006-12-13 Thread Andreas Dilger
On Sep 13, 2006  14:11 -0400, Trond Myklebust wrote:
> On Wed, 2006-09-13 at 18:42 +0200, Alexandre Ratchov wrote:
> > here is a small patch that adds the "change attribute" for ext3
> > file-systems;
> > 
> > the change attribute is a simple counter that is reset to zero on
> > inode creation and that is incremented every time the inode data is
> > modified (similarly to the "ctime" time-stamp).
> 
> I would really have preferred a full-blown 64-bit counter as per
> RFC3530, but I suppose we could always combine this change attribute
> with the high word from ctime in order to make up the NFSv4 change
> attribute. That should keep us safe until someone develops a ramdisk
> with < 1 nsecond access time.

Trond, can you please elaborate on the need for a 64-bit version counter
for NFSv4?  We have been looking at something similar, but ctime+nsec is
not really sufficient as it is possible that the inode ctime can go
backward if the clock is reset.

What kind of requirements does NFSv4 place on the version?  Monotonic is
probably a good bet.  Does it need to be global for the filesystem, or
is a per-inode version sufficient?  What functionality of NFSv4 needs
the version?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ext4 devel interlock meeting minutes (Dec. 13 2006)

2006-12-13 Thread Andreas Dilger
On Dec 13, 2006  15:42 -0800, AVANTIKA R. MATHUR wrote:
> - Change Attribute:   
>  -- Andreas asked the bull team why a new field in the inode is 
> necessary rather than using the i_version field, and also mentioned that 
> all code changes can be in mark_inode_dirty.   
>  -- The ctime cannot be used for the change attribute because if the 
> server clock is incorrect, the ctime can go backwards in time.
>  -- semantics needed: nfsv4 and bull team require 32 bits, clusterfs 
> needs 64 bits.

Minor clarification.  Bull patch for NFSv4 only uses 32 bits, but apparently
the NFSv4 spec calls for 64 bits.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ext4 devel interlock meeting minutes (Dec. 13 2006)

2006-12-13 Thread AVANTIKA R. MATHUR

Ext4 Developer Interlock Call: 12-13-06 Minutes

Attendees: Mingming Cao, Dave Kleikamp(Shaggy), Andreas Dilger, Eric 
Sandeen, Ted Ts'o, Takashi Sato, Badari Pulavarty, Jean-Pierre Dion, 
Jean Noel Cordenner, Valérie Clément, Avantika Mathur


Minutes can be accessed at: 
http://ext4.wiki.kernel.org/index.php/Ext4_Developer%27s_Conference_Call


- The next interlock call will be in January

- Delayed Allocation and Multiple Block Allocation: Alex Tomas' patches 
for delayed improves performance since it batches all writes to one area 
of the disks, reducing number of seeks.   
 -- Online defragmention depends on delalloc and mballoc, these two 
should be the priority to be pushed first.
 -- Delalloc is not currently implemented on data=ordered mode; This is 
a desired feature, but adding this support should not delay merging the 
current patch.


- Preallocation: Mingming posted comment to the preallocation patches, 
asking if preallocation should be implemented for block based files so 
ext3 users could also use it.  It was decided that it will be a rare 
case where a user will want to add preallocation on existing block based 
files, but might be a nice feature to have.  Since zero-ing out the 
blocks can take time and block applications, the ideal method would be 
to first convert files to extents. If this was implemented, we would 
want an in kernel converter utility that can convert from ext3 to have 
all ext4 features and back.


- Backwards compatibility:  Eric asked if maintaining ext3 features in 
ext4 and format compatibility is necessary.  Ted believes that up until 
now it has not been too much work to maintain.  With 64bit and extents 
it may be more work, but also might be useful for some people.  In the 
future, if it is too much work or has a great impact on performance, 
then backwards compatibility may be eliminated.


- Finer Timestamp:  Ted will propose to the mailing list that nanosecond 
timestamps are only supported in large inodes.


- Change Attribute:   
 -- Andreas asked the bull team why a new field in the inode is 
necessary rather than using the i_version field, and also mentioned that 
all code changes can be in mark_inode_dirty.   
 -- The ctime cannot be used for the change attribute because if the 
server clock is incorrect, the ctime can go backwards in time.
 -- semantics needed: nfsv4 and bull team require 32 bits, clusterfs 
needs 64 bits.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Eric Sandeen
Andreas Dilger wrote:
> On Dec 13, 2006  08:40 -0600, Eric Sandeen wrote:
>> This is probably a discrepancy that comes from reading the block device 
>> directly, right?  Which is not necessarily in sync with the address 
>> space for the filesystem.
> 
> In fact, the block device IS in sync with the filesystem address space,
> but only for metadata.  The file data each has its own address space and
> is not coherent.

Ah, thanks for the clarification.

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Andreas Dilger
On Dec 13, 2006  08:40 -0600, Eric Sandeen wrote:
> This is probably a discrepancy that comes from reading the block device 
> directly, right?  Which is not necessarily in sync with the address 
> space for the filesystem.

In fact, the block device IS in sync with the filesystem address space,
but only for metadata.  The file data each has its own address space and
is not coherent.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Andreas Dilger
On Dec 13, 2006  16:02 +0100, Valerie Clement wrote:
> Alex Tomas wrote:
> >ah, it's simple. you're looking at superblock counters.
> >we don't update them every time. this was changed years
> >ago. try the same with regular ext3
>
> Yes, sorry, I didn't know that and I didn't notice that before.
> And the per-group counters are right.

It would be good, IMHO, to update the superblock counters periodically
anyways (e.g. statfs already has done the calculation).  Otherwise,
e2fsck on a mounted-but-idle system always complains about them being
incorrect.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [patch 2/3] change attribute for ext4: ext4 specific code

2006-12-13 Thread Cordenner jean noel

Sorry for the late answer.
I agree that using i_version field sounds better, I'm working on it.

regards,
Jean noel

Andreas Dilger wrote:

On Nov 29, 2006  19:54 +0100, Jean-Noel Cordenner wrote:

This part of the patch concerns the ext4 code.


I was looking more closely at this code, and wondering two things:
- why not just use the existing inode->i_version field instead of
  adding a new i_change_attribute?  The i_version is not used by
  the VFS at all, and only for detecting directory modifications in
  ext3 (where it has the same semantic as the new i_change_attribute
  anyways).  This avoids bloating the VFS inode more than it already is.
- why not just do an increment of i_version in ext3_do_update_inode()?
  That is ext3_dirty_inode->ext3_mark_inode_dirty->ext3_mark_iloc_dirty()
  and also handles all of the VFS locations that call notify_change().
  This MUST be called anywhere that we make a persistent change to the
  inode in order to flush it to disk.  That would reduce the patch to
  a few lines at most.  I don't think there are any places we need to
  supplement this (even mmap IO or writes to a hole will update mtime).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Mingming Cao

Dave Kleikamp wrote:


If anything, the block-based preallocation could initialize all of the
data to zero.  It would be slow, but it would still provide the correct
function and result in contiguous allocation.


Agree. That seems goood enough.


Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Suparna Bhattacharya
On Wed, Dec 13, 2006 at 07:36:29AM -0600, Dave Kleikamp wrote:
> On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote:
> > On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> > > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:
> 
> > > Supporting preallocation for extent based files seems fairly
> > > straightforward.  I agree we should look at this first.  After get this
> > > done, it probably worth re-consider whether to support preallocation for
> > > non-extent based files on ext4. I could imagine user upgrade from ext3
> > > to ext4, and expecting to use preallocation on those existing files
> 
> I disagree here.  Why add the complexity for what is going to be a rare
> case?  In cases where a user is going to benefit from preallocation,
> she'll probably also benefit from extents, and would be better off
> making a copy of the file, thus converting it to extents.
> 
> > I gave a thought on this initially. But, I was not sure how we should
> > implement preallocation in a non-extent based file. Using extents we can
> > mark a set of blocks as unitialized, but how will we do this for
> > non-extent based files ? If we do not have a way to mark blocks
> > uninitialized, when someone will try to read from a preallocated block,
> > it will return junk/stale data instead of zeroes.
> 
> If anything, the block-based preallocation could initialize all of the
> data to zero.  It would be slow, but it would still provide the correct
> function and result in contiguous allocation.

And posix_fallocate does that already ... 

Regards
Suparna

> 
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Valerie Clement

Alex Tomas wrote:

ah, it's simple. you're looking at superblock counters.
we don't update them every time. this was changed years
ago. try the same with regular ext3


Yes, sorry, I didn't know that and I didn't notice that before.
And the per-group counters are right.

Thanks a lot for your clarification.
  Valérie
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Alex Tomas
> Valerie Clement (VC) writes:

 VC> Alex Tomas wrote:
 >>> Valerie Clement (VC) writes:
 >> 
 VC> Hi Alex,
 VC> I did the following test:
 VC> # mkfs /dev/sdc1
 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
 VC> # dumpe2fs -h /dev/sdc1
 VC> # cp linux.tar /test
 VC> # tar xf /test/linux.tar
 VC> # dumpe2fs -h /dev/sdc1
 >> 
 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output
 VC> did not change after running the cp and tar commands.
 >> 
 >> sync should help, I think.

 VC> No, I've just tried, it does not change anything.

ah, it's simple. you're looking at superblock counters.
we don't update them every time. this was changed years
ago. try the same with regular ext3

thanks, Alex
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Alex Tomas
> Valerie Clement (VC) writes:

 VC> Alex Tomas wrote:
 >>> Valerie Clement (VC) writes:
 >> 
 VC> Hi Alex,
 VC> I did the following test:
 VC> # mkfs /dev/sdc1
 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
 VC> # dumpe2fs -h /dev/sdc1
 VC> # cp linux.tar /test
 VC> # tar xf /test/linux.tar
 VC> # dumpe2fs -h /dev/sdc1
 >> 
 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output
 VC> did not change after running the cp and tar commands.
 >> 
 >> sync should help, I think.

 VC> No, I've just tried, it does not change anything.
 VC>   Valérie

not sure 

[EMAIL PROTECTED] ~]# mke2fs -m0 -j /dev/sda1 >&/dev/null
[EMAIL PROTECTED] ~]# mount -t ext4dev -o extents,mballoc,delalloc /dev/sda1 
/test
[EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/before
dumpe2fs 1.39.cfs1 (29-May-2006)
[EMAIL PROTECTED] ~]# dd if=/dev/zero of=/test/f bs=1M count=20
20+0 records in
20+0 records out
[EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/after1
dumpe2fs 1.39.cfs1 (29-May-2006)
[EMAIL PROTECTED] ~]# sync
[EMAIL PROTECTED] ~]# dumpe2fs /dev/sda1 >/tmp/after2
dumpe2fs 1.39.cfs1 (29-May-2006)
[EMAIL PROTECTED] ~]# diff -pu /tmp/before /tmp/after1 
--- /tmp/before 2006-12-13 14:39:42.541828787 +0300
+++ /tmp/after1 2006-12-13 14:39:59.602086138 +0300
@@ -45,9 +45,9 @@ Group 0: (Blocks 0-32767)
   Reserved GDT blocks at 2-245
   Block bitmap at 246 (+246), Inode bitmap at 247 (+247)
   Inode table at 248-752 (+248)
-  15607 free blocks, 16149 free inodes, 2 directories
+  15607 free blocks, 16148 free inodes, 2 directories
   Free blocks: 17161-32767
-  Free inodes: 12-16160
+  Free inodes: 13-16160
 Group 1: (Blocks 32768-65535)
   Backup superblock at 32768, Group descriptors at 32769-32769
   Reserved GDT blocks at 32770-33013
[EMAIL PROTECTED] ~]# diff -pu /tmp/after1 /tmp/after2 
--- /tmp/after1 2006-12-13 14:39:59.602086138 +0300
+++ /tmp/after2 2006-12-13 14:40:05.592176495 +0300
@@ -45,8 +45,8 @@ Group 0: (Blocks 0-32767)
   Reserved GDT blocks at 2-245
   Block bitmap at 246 (+246), Inode bitmap at 247 (+247)
   Inode table at 248-752 (+248)
-  15607 free blocks, 16148 free inodes, 2 directories
-  Free blocks: 17161-32767
+  10487 free blocks, 16148 free inodes, 2 directories
+  Free blocks: 17161-22527, 27648-32767
   Free inodes: 13-16160
 Group 1: (Blocks 32768-65535)
   Backup superblock at 32768, Group descriptors at 32769-32769
[EMAIL PROTECTED] ~]# 


try to repeat with a single file?

thanks, Alex
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Eric Sandeen

Alex Tomas wrote:

Valerie Clement (VC) writes:


 VC> Hi Alex,
 VC> I did the following test:
 VC> # mkfs /dev/sdc1
 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
 VC> # dumpe2fs -h /dev/sdc1
 VC> # cp linux.tar /test
 VC> # tar xf /test/linux.tar
 VC> # dumpe2fs -h /dev/sdc1

 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output
 VC> did not change after running the cp and tar commands.

sync should help, I think.


This is probably a discrepancy that comes from reading the block device 
directly, right?  Which is not necessarily in sync with the address 
space for the filesystem.


-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Valerie Clement

Alex Tomas wrote:


+/*
+ * entry function for inode syncing
+ * it's responsbility is to sort all inode out in their locality groups
+ */
+void ext4_lg_sync_inodes(struct super_block *sb, struct writeback_control *wbc)
+{
+   struct ext4_sb_info *sbi = EXT4_SB(sb);
+   struct ext4_locality_group *lg;
+
+   /* refill pending groups from s_dirty */
+   spin_lock(&inode_lock);
+   while (!list_empty(&sb->s_dirty)) {
+   struct inode *inode = list_entry(sb->s_dirty.prev,
+   struct inode, i_list);
+   struct ext4_inode_info *ei = EXT4_I(inode);
+
+   lg = ei->i_locality_group;
+   if (lg == NULL) {
+   if (S_ISDIR(inode->i_mode) || i_size_read(inode) == 0) {
+   if (atomic_read(&inode->i_count)) {
+   /*
+* The inode is clean, inuse
+*/
+   list_move(&inode->i_list, 
&inode_in_use);
+   } else {
+   /*
+* The inode is clean, unused
+*/
+   list_move(&inode->i_list, 
&inode_unused);
+   }
+   continue;
+   }
+   /* XXX: atime changed ? */
+   list_move(&inode->i_list, &inode_in_use);
+   continue;
+   }
+
+   /* move inode in proper locality group's dirty list */
+   spin_lock(&lg->lg_lock);
+   list_move_tail(&inode->i_list, &lg->lg_dirty);
+   spin_unlock(&lg->lg_lock);
+
+   if (!test_and_set_bit(EXT4_LG_DIRTY, &lg->lg_flags))
+   list_move(&lg->lg_list, &sbi->s_locality_dirty);
+   }
+   spin_unlock(&inode_lock);
+
+   ext4_lg_sync_groups(sb, wbc);
+}

Hi Alex,
I did the following test:
# mkfs /dev/sdc1
# mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
# dumpe2fs -h /dev/sdc1
# cp linux.tar /test
# tar xf /test/linux.tar
# dumpe2fs -h /dev/sdc1

The "Free blocks" and "Free inodes" counters in the dumpe2fs output did 
not change after running the cp and tar commands.


I think it misses a call to ext4_commit_super() at the end of the 
ext4_lg_sync_inodes function.

Am I right ?

   Valérie
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Valerie Clement

Alex Tomas wrote:

Valerie Clement (VC) writes:


 VC> Hi Alex,
 VC> I did the following test:
 VC> # mkfs /dev/sdc1
 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
 VC> # dumpe2fs -h /dev/sdc1
 VC> # cp linux.tar /test
 VC> # tar xf /test/linux.tar
 VC> # dumpe2fs -h /dev/sdc1

 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output
 VC> did not change after running the cp and tar commands.

sync should help, I think.


No, I've just tried, it does not change anything.
  Valérie



 VC> I think it misses a call to ext4_commit_super() at the end of the
 VC> ext4_lg_sync_inodes function.
 VC> Am I right ?

what for? allocation was delayed, no blocks were allocated.

thanks, Alex



-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] ext4-locality-groups patch

2006-12-13 Thread Alex Tomas
> Valerie Clement (VC) writes:

 VC> Hi Alex,
 VC> I did the following test:
 VC> # mkfs /dev/sdc1
 VC> # mount -t ext4dev -o extents,mballoc,delalloc /dev/sdc1 /test
 VC> # dumpe2fs -h /dev/sdc1
 VC> # cp linux.tar /test
 VC> # tar xf /test/linux.tar
 VC> # dumpe2fs -h /dev/sdc1

 VC> The "Free blocks" and "Free inodes" counters in the dumpe2fs output
 VC> did not change after running the cp and tar commands.

sync should help, I think.

 VC> I think it misses a call to ext4_commit_super() at the end of the
 VC> ext4_lg_sync_inodes function.
 VC> Am I right ?

what for? allocation was delayed, no blocks were allocated.

thanks, Alex
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Dave Kleikamp
On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote:
> On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:

> > Supporting preallocation for extent based files seems fairly
> > straightforward.  I agree we should look at this first.  After get this
> > done, it probably worth re-consider whether to support preallocation for
> > non-extent based files on ext4. I could imagine user upgrade from ext3
> > to ext4, and expecting to use preallocation on those existing files

I disagree here.  Why add the complexity for what is going to be a rare
case?  In cases where a user is going to benefit from preallocation,
she'll probably also benefit from extents, and would be better off
making a copy of the file, thus converting it to extents.

> I gave a thought on this initially. But, I was not sure how we should
> implement preallocation in a non-extent based file. Using extents we can
> mark a set of blocks as unitialized, but how will we do this for
> non-extent based files ? If we do not have a way to mark blocks
> uninitialized, when someone will try to read from a preallocated block,
> it will return junk/stale data instead of zeroes.

If anything, the block-based preallocation could initialize all of the
data to zero.  It would be slow, but it would still provide the correct
function and result in contiguous allocation.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Amit K. Arora
On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:
> > +
> > +   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> > +   return -ENOTTY;
> >
> 
> Supporting preallocation for extent based files seems fairly
> straightforward.  I agree we should look at this first.  After get this
> done, it probably worth re-consider whether to support preallocation for
> non-extent based files on ext4. I could imagine user upgrade from ext3
> to ext4, and expecting to use preallocation on those existing files

I gave a thought on this initially. But, I was not sure how we should
implement preallocation in a non-extent based file. Using extents we can
mark a set of blocks as unitialized, but how will we do this for
non-extent based files ? If we do not have a way to mark blocks
uninitialized, when someone will try to read from a preallocated block,
it will return junk/stale data instead of zeroes.

But, if we can think of a solution here then it will be as simple as
removing the check above and replacing "ext4_ext_get_blocks()" with
"ext4_get_blocks_wrap()" in the while() loop.

> 
> > +
> > +   block = EXT4_BLOCK_ALIGN(input.offset, blkbits) >> blkbits;
> > +   max_blocks = EXT4_BLOCK_ALIGN(input.len, blkbits) >> blkbits;

I was wondering if I should change above lines to this :

+   block = input.offset >> blkbits;
+   max_blocks = (EXT4_BLOCK_ALIGN(input.len+input.offset,
blkbits) >> blkbits) - block;

Reason is that the block which contains the offset, should also be
preallocated. And the max_blocks should be calculated accordingly.

> > +   while(ret>=0 && ret > +   {
> > +   block = block + ret;
> > +   max_blocks = max_blocks - ret;
> > +   ret = ext4_ext_get_blocks(handle, inode, block,
> > +   max_blocks, &map_bh,
> > +   EXT4_CREATE_UNINITIALIZED_EXT, 1);
> 
> Since the interface takes offset and number of blocks to allocate, I
> assuming we are going to handle holes in preallocation, thus, we cannot
> not mark the extend_size flag to 1 when calling ext4_ext_get_blocks.
> 
> I think we should update i_size and i_disksize after preallocation. Oh,
> to protect parallel updating i_size, we have to take i_mutex down.

Okay. So, is this what you want to be done here :

+retry:
+ret = 0;
+while(ret>=0 && ret 0 && test_bit(BH_New, &map_bh.b_state))
+nblocks = nblocks + ret;
+}
+if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb,
+&retries))
+goto retry;
+
+if(nblocks) {
+mutex_lock(&inode->i_mutex);
+inode->i_size = inode->i_size + (nblocks >> blkbits);
+EXT4_I(inode)->i_disksize = inode->i_size;
+mutex_unlock(&inode->i_mutex);
+}

> 
> > +   }
> > +   ext4_mark_inode_dirty(handle, inode);
> > +   ext4_journal_stop(handle);
> > +
> 
> Error code returned by ext4_journal_stop() is being ignored here, is
> this right?
> Well, there are other places in ext34/ioctl.c which ignore the return
> returned by ext4_journal_stop(), maybe should fix this in a separate
> patch.

Agreed. I think following should take care of it:

+   ext4_mark_inode_dirty(handle, inode);
+   ret2 = ext4_journal_stop(handle);
+   if(ret > 0)
+   ret = ret2;
+   return ret > 0 ? nblocks : ret;

> > +   return ret>0?0:ret;
> > +   }
> >
> >
> Oh, what if we failed to allocate the full amount of blocks? i.e, the
> ext4_ext_get_blocks() returns -ENOSPC error and exit the loop early. Are
> we going to return error, or try something like
> 
> if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
>   goto retry
> 
> I wonder it might be useful to return the actual number of blocks
> preallocated back to the application.

Ok. Yes, makes sense. We can return the number of "new" blocks like
this:
+   return ret > 0 ? nblocks : ret;



Please let me know if you agree with the above set of changes, and any
further comments you have. I will then update and test the new patch and
post it again. Thanks!

Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html