Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-12 Thread Aneesh Kumar K.V



Kalpak Shah wrote:

On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:

On Sun, 01 Jul 2007 03:36:56 -0400
Mingming Cao <[EMAIL PROTECTED]> wrote:


This patch is a spinoff of the old nanosecond patches.

I don't know what the "old nanosecond patches" are.  A link to a suitable
changlog for those patches would do in a pinch.  Preferable would be to
write a proper changelog for this patch.


The incremental patch contains a proper changelog describing the patch.




Instead of  putting incremental patches it would be nice if we can have 
replacement patches.
for the already existing patches with the comments addressed. For example if we have a 
review comment on the patch message ( commit log ) then adding an incremental patch doesn't help.



-aneesh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Random corruption test for e2fsck

2007-07-12 Thread Andi Kleen
On Thu, Jul 12, 2007 at 04:16:24PM -0600, Andreas Dilger wrote:
> On Jul 12, 2007  13:09 +0200, Andi Kleen wrote:
> > > "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> > > random numbers from bash.  We would also be at the mercy of the shell
> > > being identical on the user and debugger's systems.
> > 
> > With /dev/urandom you have the guarantee you'll never ever reproduce
> > it again. 
> 
> That is kind of the point of this testing - getting new test images for
> each user that runs "make check" or "make rpm".  I'm We also save the
> generated image before e2fsck touches it so that it can be used for
> debugging if needed.

If you seed a good pseudo RNG with the time (or even a few bytes from 
/dev/urandom; although the time tends to work as well) you'll also effectively 
get a new image every time.

But the advantage is if you print out the seed the image
can be easily recreated just by re-running the fuzzer
with the same seed. No need to ship potentially huge images
around.

You can essentially compress your whole image into a single
number this way.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Random corruption test for e2fsck

2007-07-12 Thread Andreas Dilger
On Jul 12, 2007  13:09 +0200, Andi Kleen wrote:
> > "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> > random numbers from bash.  We would also be at the mercy of the shell
> > being identical on the user and debugger's systems.
> 
> With /dev/urandom you have the guarantee you'll never ever reproduce
> it again. 

That is kind of the point of this testing - getting new test images for
each user that runs "make check" or "make rpm".  I'm We also save the
generated image before e2fsck touches it so that it can be used for
debugging if needed.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks

2007-07-12 Thread Andrew Morton
On Thu, 12 Jul 2007 08:57:51 -0500 Dave Kleikamp <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-07-12 at 12:38 +0100, Andy Whitcroft wrote:
> > Andrew Morton wrote:
> > >> +if (ext4_ext_check_header(inode, 
> > >> ext_block_hdr(bh),
> > >> +depth - i - 1)) 
> > >> {
> > >> +err = -EIO;
> > >> +break;
> > >> +}
> > >> +path[i+1].p_bh = bh;
> > > 
> > > Really that should have been "i + 1".  checkpatch misses this.  It seems 
> > > to
> > > be missing more stuff that it used to lately.
> > 
> > This one is difficult.  The rules up to now have been consistent spacing
> > is required on both sides of mathematics operators.  I personally like
> > spaces always, but we do tend to use them without spaces too where the
> > binding is effectivly part of the value -- the classic case is something
> > like:
> > 
> > pfn << MAX_ORDER-1
> > 
> > In allowing that sort of thing, we implictly allow the one you note
> > above.  We have tried to be overly annoying on these things, and so the
> > check is consistancy, spaces both or neither.  We could be stricter.
> 
> I personally think stricter is better.  An occasionally false-positive
> isn't going to hurt anyone.  (Well, maybe the checkpatch.pl maintainers
> will get nagged.)  It at least will cause the developer to look at the
> line of code in question and make a conscious decision to leave it as it
> is.  I'm assuming that upstream maintainers use checkpatch.pl with some
> constraint, and don't throw every patch that produces a warning back at
> the submitter.
> 

I'm in two minds.  Missing-the-spaces is pretty damn common and is sometimes
a reasonable way of saving quite a lot of horizontal space.  I spose we could
take it out again if it's causing problems.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Initial results of FLEX_BG feature.

2007-07-12 Thread Jose R. Santos
On Wed, 11 Jul 2007 18:14:25 -0400
Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> > Right now what I've done is allocate the bitmaps and inode tables at the
> > beginning of each group of 64 BG.  Still need to work on fsck since just
> > removing the restriction on were the bitmaps and inode table are
> > located still gives me errors of uninitialized inodes with dtime set.
> > Seems like fsck still expect inode information to be located at
> > specific locations within the disk.
> 
> Can you send me the patch which you were playing with?  I might be
> able to help you with this.  It should be pretty straightforward to
> remove the constraint on the inode table location.  

Here is the kernel piece.

-JRS

---
 fs/ext4/super.c |3 3 + 0 - 0 !
 include/linux/ext4_fs.h |4 3 + 1 - 0 !
 2 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/ext4/super.c
===
--- linux-2.6.orig/fs/ext4/super.c  2007-07-11 15:34:58.0 -0500
+++ linux-2.6/fs/ext4/super.c   2007-07-11 16:19:08.0 -0500
@@ -1271,6 +1271,9 @@ static int ext4_check_descriptors (struc
 
ext4_debug ("Checking group descriptors");
 
+   if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
+   return 1;
+
for (i = 0; i < sbi->s_groups_count; i++)
{
if (i == sbi->s_groups_count - 1)
Index: linux-2.6/include/linux/ext4_fs.h
===
--- linux-2.6.orig/include/linux/ext4_fs.h  2007-07-11 15:34:58.0 
-0500
+++ linux-2.6/include/linux/ext4_fs.h   2007-07-12 09:58:51.0 -0500
@@ -698,13 +698,15 @@ static inline int ext4_valid_inum(struct
 #define EXT4_FEATURE_INCOMPAT_META_BG  0x0010
 #define EXT4_FEATURE_INCOMPAT_EXTENTS  0x0040 /* extents support */
 #define EXT4_FEATURE_INCOMPAT_64BIT0x0080
+#define EXT4_FEATURE_INCOMPAT_FLEX_BG  0x0200
 
 #define EXT4_FEATURE_COMPAT_SUPP   EXT2_FEATURE_COMPAT_EXT_ATTR
 #define EXT4_FEATURE_INCOMPAT_SUPP (EXT4_FEATURE_INCOMPAT_FILETYPE| \
 EXT4_FEATURE_INCOMPAT_RECOVER| \
 EXT4_FEATURE_INCOMPAT_META_BG| \
 EXT4_FEATURE_INCOMPAT_EXTENTS| \
-EXT4_FEATURE_INCOMPAT_64BIT)
+EXT4_FEATURE_INCOMPAT_64BIT| \
+EXT4_FEATURE_INCOMPAT_FLEX_BG)
 #define EXT4_FEATURE_RO_COMPAT_SUPP(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
 EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
 EXT4_FEATURE_RO_COMPAT_DIR_NLINK | \



-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Initial results of FLEX_BG feature.

2007-07-12 Thread Jose R. Santos
On Wed, 11 Jul 2007 18:14:25 -0400
Theodore Tso <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 11, 2007 at 12:30:04AM -0500, Jose R. Santos wrote:
> > Right now what I've done is allocate the bitmaps and inode tables at the
> > beginning of each group of 64 BG.  Still need to work on fsck since just
> > removing the restriction on were the bitmaps and inode table are
> > located still gives me errors of uninitialized inodes with dtime set.
> > Seems like fsck still expect inode information to be located at
> > specific locations within the disk.
> 
> Can you send me the patch which you were playing with?  I might be
> able to help you with this.  It should be pretty straightforward to
> remove the constraint on the inode table location.  
> 
> It really should only be a check in e2fsck/super.c:check_super_block(),
> as far as I know.
> 
> If you're seeing errors of unitialized inodes with dtime set, that
> sounds like maybe something else is going on.  All of e2fsprogs should
> be referencing the inode table via fs->group_desc[group_num].bg_inode_table.  
> See lib/ext2fs/inode.c, functions ext2fs_open_inode_scan(), 
> get_next_blockgroup(), and ext2fs_read_inode_full().
> 
>   - Ted

Here is a very rough patch of the FLEX_BG feature implementation.
Still works as a prototype but there are a couple of thing that are
either broken or hard coded.  As it currently stands, it can not be use
to create filesystem without the FLEX_BG features as I have not made
ext2fs_allocate_tables() backward compatible.

The number of groups per flex group is also hard coded to 64.  Still
thinking on whether I should add this to the super block it self in
order to help recovery of the filesystem as well as possibly making
allocation algorithms in the kernel aware of the new groups
arrangements.

I create a filesystem using the following command:

mke2fs -j -O meta_bg,flex_bg  /dev/sdh

While meta_bg is not required, having block group descriptors spread
across the multiple block groups does increase the chances of
fragmenting the meta data.

-JRS

diff -Naurp e2fsprogs-1.40/e2fsck/super.c 
/home/jsantos/e2fsprogs-1.40-flex/e2fsck/super.c
--- e2fsprogs-1.40/e2fsck/super.c   2007-06-03 23:48:01.0 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/e2fsck/super.c2007-07-09 
11:27:56.0 -0500
@@ -580,27 +580,31 @@ void check_super_block(e2fsck_t ctx)
 
first_block = ext2fs_group_first_block(fs, i);
last_block = ext2fs_group_last_block(fs, i);
-
+/*
if ((gd->bg_block_bitmap < first_block) ||
(gd->bg_block_bitmap > last_block)) {
pctx.blk = gd->bg_block_bitmap;
if (fix_problem(ctx, PR_0_BB_NOT_GROUP, &pctx))
gd->bg_block_bitmap = 0;
}
+*/
if (gd->bg_block_bitmap == 0) {
ctx->invalid_block_bitmap_flag[i]++;
ctx->invalid_bitmaps++;
}
+/*
if ((gd->bg_inode_bitmap < first_block) ||
(gd->bg_inode_bitmap > last_block)) {
pctx.blk = gd->bg_inode_bitmap;
if (fix_problem(ctx, PR_0_IB_NOT_GROUP, &pctx))
gd->bg_inode_bitmap = 0;
}
+*/
if (gd->bg_inode_bitmap == 0) {
ctx->invalid_inode_bitmap_flag[i]++;
ctx->invalid_bitmaps++;
}
+/*
if ((gd->bg_inode_table < first_block) ||
((gd->bg_inode_table +
  fs->inode_blocks_per_group - 1) > last_block)) {
@@ -608,6 +612,7 @@ void check_super_block(e2fsck_t ctx)
if (fix_problem(ctx, PR_0_ITABLE_NOT_GROUP, &pctx))
gd->bg_inode_table = 0;
}
+*/
if (gd->bg_inode_table == 0) {
ctx->invalid_inode_table_flag[i]++;
ctx->invalid_bitmaps++;
diff -Naurp e2fsprogs-1.40/lib/e2p/feature.c 
/home/jsantos/e2fsprogs-1.40-flex/lib/e2p/feature.c
--- e2fsprogs-1.40/lib/e2p/feature.c2007-03-21 15:46:10.0 -0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/e2p/feature.c 2007-07-10 
15:21:25.0 -0500
@@ -67,6 +67,8 @@ static struct feature feature_list[] = {
"extent" },
{   E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_64BIT,
"64bit" },
+   {   E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FLEX_BG,
+   "flex_bg"},
{   0, 0, 0 },
 };
 
diff -Naurp e2fsprogs-1.40/lib/ext2fs/alloc_tables.c 
/home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/alloc_tables.c
--- e2fsprogs-1.40/lib/ext2fs/alloc_tables.c2006-09-12 13:56:16.0 
-0500
+++ /home/jsantos/e2fsprogs-1.40-flex/lib/ext2fs/alloc_tables.c 2007-07-12 
09:34:38.988302340 -0500

Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-12 Thread Andreas Dilger
On Jul 12, 2007  13:56 +0530, Amit K. Arora wrote:
> As you suggest, let us just have two modes for the time being:
> 
> #define FALLOC_ALLOCATE   0x1
> #define FALLOC_ALLOCATE_KEEP_SIZE 0x2
> 
> As the name suggests, when FALLOC_ALLOCATE_KEEP_SIZE mode is passed it
> will result in file size not being changed even if the preallocation is
> beyond EOF.

What does FALLOC_ALLOCATE mean vs. not passing this flag?  I have no
objection to this as long as the code remains with these as "flags"
instead of "modes"...  Essentially just dropping the FALLOC_FL_DEALLOCATE
and FALLOC_FL_DEL_DATA from the interface.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-12 Thread Amit K. Arora
On Thu, Jul 12, 2007 at 11:13:34PM +1000, David Chinner wrote:
> On Thu, Jul 12, 2007 at 12:58:13PM +0530, Suparna Bhattacharya wrote:
> > 
> > Why don't we just merge the interface for preallocation (essentially
> > enough to satisfy posix_fallocate() and the simple XFS requirement for 
> > space reservation without changing file size), which there is clear 
> > agreement
> > on (I hope :)).  After all, this was all that we set out to do when we
> > started.
> > 
> > And leave all the dealloc/punch/hsm type features for separate future 
> > patches/
> > debates, those really shouldn't hold up the basic fallocate interface.
> > I agree with Christoph that we are just diverging too much in trying to
> > club those decisions here.
> > 
> > Dave, Andreas, Ted ?
> 
> Sure. I'll just make XFS work with whatever it is that gets merged.

Great. I will post the new patches soon.

--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks

2007-07-12 Thread Dave Kleikamp
On Thu, 2007-07-12 at 12:38 +0100, Andy Whitcroft wrote:
> Andrew Morton wrote:
> >> +  if (ext4_ext_check_header(inode, ext_block_hdr(bh),
> >> +  depth - i - 1)) {
> >> +  err = -EIO;
> >> +  break;
> >> +  }
> >> +  path[i+1].p_bh = bh;
> > 
> > Really that should have been "i + 1".  checkpatch misses this.  It seems to
> > be missing more stuff that it used to lately.
> 
> This one is difficult.  The rules up to now have been consistent spacing
> is required on both sides of mathematics operators.  I personally like
> spaces always, but we do tend to use them without spaces too where the
> binding is effectivly part of the value -- the classic case is something
> like:
> 
>   pfn << MAX_ORDER-1
> 
> In allowing that sort of thing, we implictly allow the one you note
> above.  We have tried to be overly annoying on these things, and so the
> check is consistancy, spaces both or neither.  We could be stricter.

I personally think stricter is better.  An occasionally false-positive
isn't going to hurt anyone.  (Well, maybe the checkpatch.pl maintainers
will get nagged.)  It at least will cause the developer to look at the
line of code in question and make a conscious decision to leave it as it
is.  I'm assuming that upstream maintainers use checkpatch.pl with some
constraint, and don't throw every patch that produces a warning back at
the submitter.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-12 Thread David Chinner
On Thu, Jul 12, 2007 at 12:58:13PM +0530, Suparna Bhattacharya wrote:
> 
> Why don't we just merge the interface for preallocation (essentially
> enough to satisfy posix_fallocate() and the simple XFS requirement for 
> space reservation without changing file size), which there is clear agreement
> on (I hope :)).  After all, this was all that we set out to do when we
> started.
> 
> And leave all the dealloc/punch/hsm type features for separate future patches/
> debates, those really shouldn't hold up the basic fallocate interface.
> I agree with Christoph that we are just diverging too much in trying to
> club those decisions here.
> 
> Dave, Andreas, Ted ?

Sure. I'll just make XFS work with whatever it is that gets merged.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 3][PATCH 1/1] ext4 nanosecond timestamp

2007-07-12 Thread Kalpak Shah
On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
> On Sun, 01 Jul 2007 03:36:56 -0400
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > This patch is a spinoff of the old nanosecond patches.
> 
> I don't know what the "old nanosecond patches" are.  A link to a suitable
> changlog for those patches would do in a pinch.  Preferable would be to
> write a proper changelog for this patch.

The incremental patch contains a proper changelog describing the patch.

> > +static inline __le32 ext4_encode_extra_time(struct timespec *time)
> > +{
> > +   return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
> > +  time->tv_sec >> 32 : 0) |
> > +  ((time->tv_nsec << 2) & EXT4_NSEC_MASK));
> > +}
> > +
> > +static inline void ext4_decode_extra_time(struct timespec *time, __le32 
> > extra)
> > +{
> > +   if (sizeof(time->tv_sec) > 4)
> > +  time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
> > +  << 32;
> > +   time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> 2;
> > +}
> 
> Consider uninlining these functions.

Done.

This patch adds comments, few small corrections and an appropriate
changelog entry.

Thanks,
Kalpak.
This patch adds nanosecond timestamps for ext4. This involves adding
*time_extra fields to the ext4_inode to extend the timestamps to 64-bits.
Creation time is also added by this patch.

These extended fields will fit into an inode if the filesystem was formatted
with large inodes (-I 256 or larger) and there are currently no EAs
consuming all of the available space. For new inodes we always reserve
enough space for the kernel's known extended fields, but for inodes
created with an old kernel this might not have been the case. So this patch
also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag(ro-compat so that
older kernels can't create inodes with a smaller extra_isize). which indicates
if the fields fitting inside s_min_extra_isize are available or not.
If the expansion of inodes if unsuccessful then this feature will be disabled.
This feature is only enabled if requested by the sysadmin.

None of the extended inode fields is critical for correct filesystem operation.

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>
Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

Index: linux-2.6.22/include/linux/ext4_fs.h
===
--- linux-2.6.22.orig/include/linux/ext4_fs.h
+++ linux-2.6.22/include/linux/ext4_fs.h
@@ -350,20 +350,30 @@ struct ext4_inode {
 #define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
 #define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)
 
+/*
+ * Extended fields will fit into an inode if the filesystem was formatted
+ * with large inodes (-I 256 or larger) and there are not currently any EAs
+ * consuming all of the available space. For new inodes we always reserve
+ * enough space for the kernel's known extended fields, but for inodes
+ * created with an old kernel this might not have been the case. None of
+ * the extended inode fields is critical for correct filesystem operation.
+ * This macro checks if a certain field fits in the inode. Note that
+ * inode-size = GOOD_OLD_INODE_SIZE + i_extra_isize
+ */
 #define EXT4_FITS_IN_INODE(ext4_inode, einode, field)	\
 	((offsetof(typeof(*ext4_inode), field) +	\
 	  sizeof((ext4_inode)->field))			\
 	<= (EXT4_GOOD_OLD_INODE_SIZE +			\
 	(einode)->i_extra_isize))			\
 
-static inline __le32 ext4_encode_extra_time(struct timespec *time)
+static __le32 ext4_encode_extra_time(struct timespec *time)
 {
return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
 			   time->tv_sec >> 32 : 0) |
 			   ((time->tv_nsec << 2) & EXT4_NSEC_MASK));
 }
 
-static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
+static void ext4_decode_extra_time(struct timespec *time, __le32 extra)
 {
if (sizeof(time->tv_sec) > 4)
 	   time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
Index: linux-2.6.22/include/linux/ext4_fs_i.h
===
--- linux-2.6.22.orig/include/linux/ext4_fs_i.h
+++ linux-2.6.22/include/linux/ext4_fs_i.h
@@ -153,6 +153,10 @@ struct ext4_inode_info {
 
 	unsigned long i_ext_generation;
 	struct ext4_ext_cache i_cached_extent;
+	/*
+	 * File creation time. Its function is same as that of
+	 * struct timespec i_{a,c,m}time in the generic inode.
+	 */
 	struct timespec i_crtime;
 };
 


Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-12 Thread Kalpak Shah
On Tue, 2007-07-10 at 16:32 -0700, Andrew Morton wrote:
> On Sun, 01 Jul 2007 03:38:01 -0400
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > This patch is on top of the nanosecond timestamp and i_version_hi
> > patches. 
> 
> This sort of information isn't needed (or desired) when this patch hits the
> git tree.  Please ensure that things like this are cleaned up before the
> patches go to Linus.

The incremental patch I have attached contains an updated Changelog
entry which cleans this up.

> > +   !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) {
> > +   /* We need extra buffer credits since we may write into EA block
> > +* with this same handle */
> > +   if ((jbd2_journal_extend(handle,
> > +EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
> > +   ret = ext4_expand_extra_isize(inode,
> > +   EXT4_SB(inode->i_sb)->s_want_extra_isize,
> > +   iloc, handle);
> > +   if (ret) {
> > +   EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
> > +   if (!expand_message) {
> > +   ext4_warning(inode->i_sb, __FUNCTION__,
> > +   "Unable to expand inode %lu. Delete"
> > +   " some EAs or run e2fsck.",
> > +   inode->i_ino);
> > +   expand_message = 1;
> > +   }
> > +   }
> > +   }
> > +   }
> 
> Maybe that message should come out once per mount rather than once per boot
> (or once per modprobe)?

Done. Now the message gets printed the first time s_mnt_count changes,
which means once per mount.

> > +   if (free < new_extra_isize) {
> > +   if (!tried_min_extra_isize && s_min_extra_isize) {
> > +   tried_min_extra_isize++;
> > +   new_extra_isize = s_min_extra_isize;
> 
> Aren't we missing a brelse(bh) here?

I have corrected this.

> >  
> > +#define IHDR(inode, raw_inode) \
> > +   ((struct ext4_xattr_ibody_header *) \
> > +   ((void *)raw_inode + \
> > +   EXT4_GOOD_OLD_INODE_SIZE + \
> > +   EXT4_I(inode)->i_extra_isize))
> > +#define IFIRST(hdr) ((struct ext4_xattr_entry *)((hdr)+1))
> 
> Neither of these _have_ to be implemented as macros and hence they should
> not be.

These macros existed before and have just been moved. There are similar
such macros in the ext3/4 xattr code and they should probably be changed
to inlines. But that should be done in a different patch.

Thanks,
Kalpak.
We need to make sure that existing ext3 filesystems can also avail the new
fields that have been added to the ext4 inode. We use s_want_extra_isize and
s_min_extra_isize to decide by how much we should expand the inode. If
EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand the inode by
max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) -
EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question
about whether users should be able to set s_*_extra_isize smaller than
the known fields or not.

This patch also adds the functionality to expand inodes to include the
newly added fields. We start by trying to expand by s_want_extra_isize
bytes and if its fails we try to expand by s_min_extra_isize bytes. This
is done by changing the i_extra_isize if enough space is available in
the inode and no EAs are present. If EAs are present and there is enough
space in the inode then the EAs in the inode are shifted to make space.
If enough space is not available in the inode due to the EAs then 1 or
more EAs are shifted to the external EA block. In the worst case when
even the external EA block does not have enough space we inform the user
that some EA would need to be deleted or s_min_extra_isize would have to
be reduced.

This would be online expansion of inodes. expand_extra_isize option will also
be added to e2fsck which will expand all the inodes and if for any reason 
expansion fails for any inode then the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE
feature will be reset.

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Kalpak Shah <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

Index: linux-2.6.22/fs/ext4/inode.c
===
--- linux-2.6.22.orig/fs/ext4/inode.c
+++ linux-2.6.22/fs/ext4/inode.c
@@ -3130,9 +3130,8 @@ int ext4_expand_extra_isize(struct inode
 	struct ext4_xattr_ibody_header *header;
 	struct ext4_xattr_entry *entry;
 
-	if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) {
+	if (EXT4_I(inode)->i_extra_isize >= new_extra_isize)
 		return 0;
-	}
 
 	raw_inode = ext4_raw_inode(&iloc);
 
@@ -3148,9 +3147,9 @@ int ext4_expand_extra_isize(struct inode
 		return 0;
 	}
 
-	/* try to expand with EA present */
+	/* try to expand with

Re: [EXT4 set 2][PATCH 2/5] cleanups: Add extent sanity checks

2007-07-12 Thread Andy Whitcroft
Andrew Morton wrote:
> On Sun, 01 Jul 2007 03:36:22 -0400
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
>> with the patch all headers are checked. the code should become
>> more resistant to on-disk corruptions. needless BUG_ON() have
>> been removed. please, review for inclusion.
>>
>> ...
> 
>> @@ -269,6 +239,70 @@
>>  return size;
>>  }
>>  
>> +static inline int
>> +ext4_ext_max_entries(struct inode *inode, int depth)
> 
> Please remove the `inline'.
> 
> `inline' is usually wrong.  It is wrong here.  If this function has a
> single callsite (it has) then the compiler will inline it.  If the function
> later gains more callsites then the compiler will know (correctly) not to
> inline it.
> 
> We hope.  If we find the compiler still inlines a function as large as this
> one then we'd need to use noinline and complain at the gcc developers.
> 
>> +{
>> +int max;
>> +
>> +if (depth == ext_depth(inode)) {
>> +if (depth == 0)
>> +max = ext4_ext_space_root(inode);
>> +else
>> +max = ext4_ext_space_root_idx(inode);
>> +} else {
>> +if (depth == 0)
>> +max = ext4_ext_space_block(inode);
>> +else
>> +max = ext4_ext_space_block_idx(inode);
>> +}
>> +
>> +return max;
>> +}
>> +
>> +static int __ext4_ext_check_header(const char *function, struct inode 
>> *inode,
>> +struct ext4_extent_header *eh,
>> +int depth)
>> +{
>> +const char *error_msg = NULL;
> 
> Unneeded initialisation.
> 
>> +int max = 0;
>> +
>> +if (unlikely(eh->eh_magic != EXT4_EXT_MAGIC)) {
>> +error_msg = "invalid magic";
>> +goto corrupted;
>> +}
>> +if (unlikely(le16_to_cpu(eh->eh_depth) != depth)) {
>> +error_msg = "unexpected eh_depth";
>> +goto corrupted;
>> +}
>> +if (unlikely(eh->eh_max == 0)) {
>> +error_msg = "invalid eh_max";
>> +goto corrupted;
>> +}
>> +max = ext4_ext_max_entries(inode, depth);
>> +if (unlikely(le16_to_cpu(eh->eh_max) > max)) {
>> +error_msg = "too large eh_max";
>> +goto corrupted;
>> +}
>> +if (unlikely(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max))) {
>> +error_msg = "invalid eh_entries";
>> +goto corrupted;
>> +}
>> +return 0;
>> +
>> +corrupted:
>> +ext4_error(inode->i_sb, function,
>> +"bad header in inode #%lu: %s - magic %x, "
>> +"entries %u, max %u(%u), depth %u(%u)",
>> +inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
>> +le16_to_cpu(eh->eh_entries), le16_to_cpu(eh->eh_max),
>> +max, le16_to_cpu(eh->eh_depth), depth);
>> +
>> +return -EIO;
>> +}
>> +
>>
>> ...
>>
>> +i = depth = ext_depth(inode);
>>
> 
> Mulitple assignments like this are somewhat unpopular from the coding-style
> POV.  
> 
> We have a local variable called `i' which is not used as a simple counter
> and which has no explanatory comment.  This is plain bad.  Please find a
> better name for this variable.  Review the code for other such lack of
> clarity - I'm sure there's more.
> 
> 
>> -BUG_ON(le16_to_cpu(eh->eh_entries) > le16_to_cpu(eh->eh_max));
>> -BUG_ON(eh->eh_magic != EXT4_EXT_MAGIC);
> 
> Yeah, this patch improves things a lot.
> 
> How well-tested is it?  Have any of these errors actually been triggered?
> 
>>  path[i].p_hdr = ext_block_hdr(path[i].p_bh);
>> -if (ext4_ext_check_header(__FUNCTION__, inode,
>> -path[i].p_hdr)) {
>> -err = -EIO;
>> -goto out;
>> -}
>>  }
>>  
>> -BUG_ON(le16_to_cpu(path[i].p_hdr->eh_entries)
>> -   > le16_to_cpu(path[i].p_hdr->eh_max));
>> -BUG_ON(path[i].p_hdr->eh_magic != EXT4_EXT_MAGIC);
>> -
>>  if (!path[i].p_idx) {
>>  /* this level hasn't been touched yet */
>>  path[i].p_idx = EXT_LAST_INDEX(path[i].p_hdr);
>> @@ -1873,17 +1890,24 @@
>>  i, EXT_FIRST_INDEX(path[i].p_hdr),
>>  path[i].p_idx);
>>  if (ext4_ext_more_to_rm(path + i)) {
>> +struct buffer_head *bh;
>>  /* go to the next level */
>>  ext_debug("move to level %d (block %llu)\n",
>>i + 1, idx_pblock(path[i].p_idx));
>>  memset(path + i + 1, 0, sizeof(*path));
>> -path[i+1].p_bh =
>> -sb_bread(sb, idx_pblock(path[i].p_idx));
>> -if (!path[i+1].p_bh) {
>> +bh = sb_bread(sb, idx_pbloc

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-12 Thread Andy Whitcroft
The next version of checkpatch.pl (0.08) should have support for a
number of the missed sylistics you mention.  Will let them soak for a
bit to ensure we're not majorly regressing anything else.

-apw

ERROR: braces {} are not necessary for single statements
#4: FILE: Z11.c:1:
+if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) {

WARNING: declaring multiple variables together should be avoided
#9: FILE: Z11.c:6:
+unsigned int total_size, shift_bytes, temp = ~0U;

WARNING: multiple assignments should be avoided
#12: FILE: Z11.c:9:
+is->s.not_found = bs->s.not_found = -ENODATA;

WARNING: kfree(NULL) is safe this check is probabally not required
#16: FILE: Z11.c:13:
+if (bs)
+   kfree(bs);

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Random corruption test for e2fsck

2007-07-12 Thread Andi Kleen
On Wed, Jul 11, 2007 at 11:19:38PM -0600, Andreas Dilger wrote:
> On Jul 11, 2007  17:20 +0200, Andi Kleen wrote:
> > If you use a normal pseudo random number generator and print the seed
> > (e.g. create from the time) initially the image can be easily recreated 
> > later without shipping it around. /dev/urandom
> > is not really needed for this since you don't need cryptographic
> > strength randomness. Besides urandom data is precious and it's 
> > a pity to use it up needlessly.
> > 
> > bash has $RANDOM built in for this purpose.
> 
> Except it is a lot more efficient and easy to do

Ah you chose to only address one sentence in my reply.
I thought only Linus liked to to do that.

If you're worried about efficiency it's trivial to
write a C program that generates bulk pseudo random numbers using
random(3) 

> "dd if=/dev/urandom bs=1k ..." than to spin in a loop getting 16-bit
> random numbers from bash.  We would also be at the mercy of the shell
> being identical on the user and debugger's systems.

With /dev/urandom you have the guarantee you'll never ever reproduce
it again. 

Andrea A. used to rant about people who use srand(time(NULL)) 
in benchmarks and it's sad these mistakes get repeated again and again.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-12 Thread Amit K. Arora
On Thu, Jul 12, 2007 at 12:58:13PM +0530, Suparna Bhattacharya wrote:
> On Wed, Jul 11, 2007 at 10:03:12AM +0100, Christoph Hellwig wrote:
> > On Tue, Jul 03, 2007 at 05:16:50PM +0530, Amit K. Arora wrote:
> > > Well, if you see the modes proposed using above flags :
> > > 
> > > #define FA_ALLOCATE   0
> > > #define FA_DEALLOCATE FA_FL_DEALLOC
> > > #define FA_RESV_SPACE FA_FL_KEEP_SIZE
> > > #define FA_UNRESV_SPACE   (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | 
> > > FA_FL_DEL_DATA)
> > > 
> > > FA_FL_DEL_DATA is _not_ being used for preallocation. We have two modes
> > > for preallocation FA_ALLOCATE and FA_RESV_SPACE, which do not use this
> > > flag. Hence prealloction will never delete data.
> > > This mode is required only for FA_UNRESV_SPACE, which is a deallocation
> > > mode, to support any existing XFS aware applications/usage-scenarios.
> > 
> > Sorry, but this doesn't make any sense.  There is no need to put every
> > feature in the XFS ioctls in the syscalls.  The XFS ioctls will need to
> > be supported forever anyway - as I suggested before they really should
> > be moved to generic code.
> > 
> > What needs to be supported is what makes sense as an interface.
> > A punch a hole interface does make sense, but trying to hack this into
> > a preallocation system call is just madness.  We're not IRIX or windows
> > that fit things into random subcall just because there was some space
> > left to squeeze them in.
> > 
> > > > > > FA_FL_NO_MTIME  0x10 /* keep same mtime (default change on 
> > > > > > size, data change) */
> > > > > > FA_FL_NO_CTIME  0x20 /* keep same ctime (default change on 
> > > > > > size, data change) */
> > > > 
> > > > NACK to these aswell.  If i_size changes c/mtime need updates, if the 
> > > > size
> > > > doesn't chamge they don't.  No need to add more flags for this.
> > > 
> > > This requirement was from the point of view of HSM applications. Hope
> > > you saw Andreas previous post and are keeping that in mind.
> > 
> > HSMs needs this basically for every system call, which screams for an
> > open flag like O_INVISIBLE anyway.  Adding this in a generic way is
> > a good idea, but hacking bits and pieces that won't fit into the global
> > design is completely wrong.
> 
> Why don't we just merge the interface for preallocation (essentially
> enough to satisfy posix_fallocate() and the simple XFS requirement for 
> space reservation without changing file size), which there is clear agreement
> on (I hope :)).  After all, this was all that we set out to do when we
> started.

As you suggest, let us just have two modes for the time being:

#define FALLOC_ALLOCATE 0x1
#define FALLOC_ALLOCATE_KEEP_SIZE   0x2

As the name suggests, when FALLOC_ALLOCATE_KEEP_SIZE mode is passed it
will result in file size not being changed even if the preallocation is
beyond EOF.

> And leave all the dealloc/punch/hsm type features for separate future patches/
> debates, those really shouldn't hold up the basic fallocate interface.

I agree.

> I agree with Christoph that we are just diverging too much in trying to
> club those decisions here.
> 
> Dave, Andreas, Ted ?
> 
> Regards
> Suparna

--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-07-12 Thread Suparna Bhattacharya
On Wed, Jul 11, 2007 at 10:03:12AM +0100, Christoph Hellwig wrote:
> On Tue, Jul 03, 2007 at 05:16:50PM +0530, Amit K. Arora wrote:
> > Well, if you see the modes proposed using above flags :
> > 
> > #define FA_ALLOCATE 0
> > #define FA_DEALLOCATE   FA_FL_DEALLOC
> > #define FA_RESV_SPACE   FA_FL_KEEP_SIZE
> > #define FA_UNRESV_SPACE (FA_FL_DEALLOC | FA_FL_KEEP_SIZE | 
> > FA_FL_DEL_DATA)
> > 
> > FA_FL_DEL_DATA is _not_ being used for preallocation. We have two modes
> > for preallocation FA_ALLOCATE and FA_RESV_SPACE, which do not use this
> > flag. Hence prealloction will never delete data.
> > This mode is required only for FA_UNRESV_SPACE, which is a deallocation
> > mode, to support any existing XFS aware applications/usage-scenarios.
> 
> Sorry, but this doesn't make any sense.  There is no need to put every
> feature in the XFS ioctls in the syscalls.  The XFS ioctls will need to
> be supported forever anyway - as I suggested before they really should
> be moved to generic code.
> 
> What needs to be supported is what makes sense as an interface.
> A punch a hole interface does make sense, but trying to hack this into
> a preallocation system call is just madness.  We're not IRIX or windows
> that fit things into random subcall just because there was some space
> left to squeeze them in.
> 
> > > > > FA_FL_NO_MTIME0x10 /* keep same mtime (default change on 
> > > > > size, data change) */
> > > > > FA_FL_NO_CTIME0x20 /* keep same ctime (default change on 
> > > > > size, data change) */
> > > 
> > > NACK to these aswell.  If i_size changes c/mtime need updates, if the size
> > > doesn't chamge they don't.  No need to add more flags for this.
> > 
> > This requirement was from the point of view of HSM applications. Hope
> > you saw Andreas previous post and are keeping that in mind.
> 
> HSMs needs this basically for every system call, which screams for an
> open flag like O_INVISIBLE anyway.  Adding this in a generic way is
> a good idea, but hacking bits and pieces that won't fit into the global
> design is completely wrong.


Why don't we just merge the interface for preallocation (essentially
enough to satisfy posix_fallocate() and the simple XFS requirement for 
space reservation without changing file size), which there is clear agreement
on (I hope :)).  After all, this was all that we set out to do when we
started.

And leave all the dealloc/punch/hsm type features for separate future patches/
debates, those really shouldn't hold up the basic fallocate interface.
I agree with Christoph that we are just diverging too much in trying to
club those decisions here.

Dave, Andreas, Ted ?

Regards
Suparna

> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html